Fixation Index and Genetic Distance Between Human Populations

85% of the genetic variation in humans is found within populations.  According to Richard Lewontin in 1972, and every graduate student in anthropology ever since, race can therefore be of no genetic or taxonomic significance. There are several faults with this misleading statistic:

  • This same kind of genetic overlap exists between many sibling species that are nonetheless distinct in anatomy and behaviour.
  • Genes vary a lot in adaptive value.
  • Within-population variation is not comparable to between-population variation.
  • Population differences are more sharply defined if several gene variants are compared simultaneously.

Recognition of all this has become known as Lewontin’s Fallacy [expanded on in further detail here].

This ratio, as measured by the fixation index (FST), is in fact not very informative on its own:

These limitations on FST are demonstrated algebraically and in the context of analyzing dinucleotide repeat allele frequencies for a set of eight loci genotyped in eight human groups and in chimpanzees. In our analyses, estimates of FST fail to identify important variation. For example, when the analysis includes only humans, FST = 0.119, but adding the chimpanzees increases it only a little, FST = 0.183. By relaxing the underlying statistical assumptions, the results for chimpanzees become consistent with common knowledge, and we see a richer pattern of human genetic diversity. Some human groups are far more diverged than would be implied by standard computations of FST, while other groups are much less diverged.
[Human Genetic Diversity and the Nonexistence of Biological Races]

Here, using eight human groups, the average between-population FST was estimated at ~12% (0.119), yet when chimpanzees were added to the analysis, the FST across these samples increased only to ~18% (0.183). On this evidence, FST by itself provides little information about the degree to which genes contribute to between-population differences. To illustrate this further:

  • Table 4 [above]: when pairwise genetic distances are compared across various species and subspecies, the distance between modern Sub-Saharan Africans (Bantus) and Europeans (English) was estimated at ~24%. That is much greater than the distance of just ~8% between modern humans and the extinct archaic Neanderthals; greater than the distance between pairs classified as separate species, such as chimpanzees and bonobos, or Western and Eastern gorillas; and slightly greater than the distance between modern humans and Homo erectus, as determined by a mean of several estimates.

Similarly, the average FST between various Asian dog breeds is ~0.154, which is nearly identical to the average FST of 0.155 between various human populations attributed to Lewontin, whose 1972 estimate rested on a limited set of classical blood-group and protein loci rather than on whole genomes. Moreover, for dog breeds at large, about 70% of genetic variation is within breeds and 30% between breeds.
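To make the "within versus between" split concrete, here is a minimal sketch of how FST can be computed at a single biallelic locus as (HT − HS) / HT, where HS is the mean expected heterozygosity within populations and HT is the expected heterozygosity of the pooled population. The function names and the allele frequencies (0.3 and 0.7) are hypothetical and chosen only for illustration; real estimators average over many loci and correct for sample size.

```python
def heterozygosity(p):
    """Expected heterozygosity 2p(1-p) at a biallelic locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst_two_pops(p1, p2):
    """F_ST = (H_T - H_S) / H_T for two equally weighted populations.

    p1, p2 are the frequencies of the same allele in each population
    (hypothetical inputs; no sample-size correction is applied).
    """
    p_bar = (p1 + p2) / 2.0                                    # pooled allele frequency
    h_t = heterozygosity(p_bar)                                # total heterozygosity
    h_s = (heterozygosity(p1) + heterozygosity(p2)) / 2.0      # mean within-population
    if h_t == 0.0:
        return 0.0                                             # locus is monomorphic
    return (h_t - h_s) / h_t


# Illustrative frequencies only, not taken from any dataset
fst = fst_two_pops(0.3, 0.7)
print(f"between-population share: {fst:.2f}, within-population share: {1 - fst:.2f}")
```

With these made-up frequencies, HT = 0.50 and HS = 0.42, so roughly 16% of the variation at this locus lies between the two populations and 84% within them, a split of the same order as the figures quoted above.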

There is a reason for this genetic overlap between non-Africans and Neanderthals, of course: a small amount of interbreeding between Neanderthals and humans outside of Africa led to the introgression of Neanderthal genes into modern humans. Present-day people of non-African ancestry trace an average of about 2 percent of their genomes to Neanderthals, and the introgressed sequences together span about 20% of the Neanderthal genome. Taken together, these genes shared with Neanderthals don’t tell us much about the degree of genetic relatedness unless we compare several gene loci simultaneously.

Modern humans have existed longer in Africa than on any other continent. They have therefore accumulated more genetic variability. But most of this variability is of little or no adaptive value. Much of it is ‘junk.’ After humans left the tropics, by contrast, findings indicate that human evolution greatly accelerated, with at least 7% of the human genome changing over the last 40,000 years; human populations have been undergoing divergent, not convergent, evolution.

Clearly, genes vary a lot in adaptive value and intra-population genetic variance is simply not comparable to inter-population variation.

Related: The Genetic Clustering of Mankind


Human Genetic Distance - Europeans

Principal component analysis (PCA) is a tool for exploring multilocus population genetic data and extracting information from genetic markers. It’s used for representing high-dimensional data, for example, individuals or populations, in a smaller number of dimensions. It’s also used to summarise large-scale genomic surveys, by providing covariates that might correct for population structure in genomewide association studies.
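As a rough illustration of the dimension reduction described above, the following sketch simulates genotypes for two hypothetical populations and finds each individual's coordinate on the first principal component. Everything here is an assumption made for the example: the function names, the allele frequencies (0.2 and 0.8 at every locus, a far larger difference than anything discussed in this post), the sample sizes, and the use of plain power iteration in place of the eigenanalysis software used in the cited studies.

```python
import random


def simulate_genotypes(n_per_pop, n_loci, freq_a, freq_b, seed=0):
    """Return rows of 0/1/2 allele counts: first population A, then population B."""
    rng = random.Random(seed)
    rows = []
    for freq in (freq_a, freq_b):
        for _ in range(n_per_pop):
            # Each genotype is the number of copies of the allele (two draws)
            rows.append([(rng.random() < freq) + (rng.random() < freq)
                         for _ in range(n_loci)])
    return rows


def first_pc_scores(genotypes, n_iter=100):
    """Leading eigenvector of X X^T (one entry per individual), by power iteration.

    Each entry is that individual's first-PC coordinate up to a common scale.
    """
    n, m = len(genotypes), len(genotypes[0])
    # Centre every marker (column) on its mean
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    x = [[row[j] - means[j] for j in range(m)] for row in genotypes]
    # Individual-by-individual matrix of inner products
    gram = [[sum(a * b for a, b in zip(x[i], x[k])) for k in range(n)]
            for i in range(n)]
    v = [1.0] + [0.0] * (n - 1)  # arbitrary start vector
    for _ in range(n_iter):
        w = [sum(g * vi for g, vi in zip(row, v)) for row in gram]
        norm = sum(c * c for c in w) ** 0.5
        v = [c / norm for c in w]
    return v


scores = first_pc_scores(simulate_genotypes(20, 100, 0.2, 0.8))
print("population A:", [round(s, 2) for s in scores[:20]])
print("population B:", [round(s, 2) for s in scores[20:]])
```

In this toy setting the two simulated groups fall on opposite sides of zero along the first component. Real data are far noisier, which is why the studies discussed below test the significance of each eigenvalue rather than inspecting a plot.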

Population Structure and Eigenanalysis

This study tested the significance of the eigenvalues from a PCA of genetic markers to infer population stratification. According to the analysis in the paper, populations with fixation indices (FST) as low as .0001 can be resolved with current technology. This was confirmed in later studies:

Investigation of the fine structure of European populations with applications to disease association studies:

Table 1 (above this post):

FST statistics calculated between each pair of countries: Spain (Sp), France (Fr), Belgium (Be), Sweden (Sw), Norway (No), Germany (Ge), Romania (Ro), Czech (Cz), Slovakia (Sl), Hungary (Hu), Poland (Po), Russia (Ru), and the four HapMap cohorts CEU, CHB, JPT and YRI (Utah Americans largely of NW European ancestry, Chinese from Beijing, Japanese from Tokyo, and Yoruba from Nigeria, respectively).

This table shows that typical FST between northern and southern Europe is about .006, between Europe and East Asia about .1 and between Europe and Nigeria about .14. The FST between France and Spain is .0008, whereas between Nigeria and Japan it is about .19.

Another implication is that these methods are sensitive. For example, given a 100,000 marker array and a sample size of 1,000, then the BBP threshold for two equal subpopulations, each of size 500, is FST = .0001. An FST value of .001 will thus be trivial to detect. To put this into context, we note that a typical value of FST between human populations in Northern and Southern Europe is about .006 [15]. Thus, we predict: most large genetic datasets with human data will show some detectable population structure. [Population Structure and Eigenanalysis]
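The arithmetic behind the quoted threshold can be checked with a short sketch. It assumes the detection threshold for two equal subpopulations takes the form FST ≈ 1/√(n·m), with n markers and m individuals in total; that form is an assumption of this example (it reproduces the figure quoted above), and the function name is hypothetical.

```python
import math


def bbp_fst_threshold(n_markers, n_samples):
    """Approximate F_ST below which structure is undetectable.

    Assumes the threshold is 1 / sqrt(n_markers * n_samples) for two
    equal-sized subpopulations; treat this as a sketch, not the paper's code.
    """
    return 1.0 / math.sqrt(n_markers * n_samples)


threshold = bbp_fst_threshold(100_000, 1_000)
print(f"threshold F_ST: {threshold:.4f}")

# Ratio of the quoted north-south European F_ST (.006) to the threshold
print(f"ratio to threshold: {0.006 / threshold:.0f}x")
```

Under that assumption, a 100,000-marker array with 1,000 samples gives a threshold of .0001, so the north-south European value of .006 sits roughly sixty times above it.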

This prediction has been confirmed in later studies of genetic substructure within closely related populations, e.g., Europeans:

Within the European samples analysed, hundreds of statistically significant PCA vectors were identified. Even then, a single vector only accounted for a few percent of total within-European population variance.

In this PCA plot [above] European individuals are represented by individual points, the axes are two principal components in the space of genetic variation. Each colour corresponds to individuals of different European ancestry. Two populations with trivial genetic differences, such as Norwegians and Swedes, who have an FST value of .001 as established in the previous table, could be detected with 90 percent accuracy:

In conclusion, we have shown that using PCA techniques it is possible to detect fine-level genetic variation in European samples. The genetic and geographic distances between samples are highly correlated, resulting in a striking concordance between the scatter plot of the first two components from a PCA of European samples and a geographic map of sample origins. We have shown how this information can be used to predict the origin of unknown samples in a rapid, precise and robust manner, and that this prediction can be performed without requiring access to the individual genotype data on the original samples of known origin. [Investigation of the fine structure of European populations with applications to disease association studies]

Evidently, at this level of sensitivity, the chance of misidentifying a European as an African or East Asian is negligible when the data are used correctly.

Pairwise FST values (genetic distance, multiplied by 10,000) between European populations: