From ISOGG Wiki
Admixture analysis (more properly known as biogeographical ancestry analysis) is a method of inferring someone's geographical origins based on an analysis of their genetic ancestry. An admixture analysis is one of the components of an autosomal DNA test. Companies which offer such tests include 23andMe, Family Tree DNA, Ancestry.com, the Genographic Project and BritainsDNA.com.
- 1 Admixture calculations
- 2 DTC providers admixture analysis
- 3 Analysis projects: send in
- 4 Analysis projects: Do it yourself (DIY)
- 5 Commercial Analysis
- 6 Blog posts
- 7 Further reading
- 8 References
- 9 See also
Admixture calculations
Admixture calculations provide genetic ancestry analysis for individuals who have been tested on high-density single-nucleotide polymorphism (SNP) platforms. The different SNP genotyping methods (mostly SNP chips) need a substantial overlap of tested SNPs to allow meaningful comparisons. Admixture analysis usually derives ancestral components, also called clusters, by comparing the samples in a reference dataset. Both the datasets (regional, continental, worldwide) and the ancestral components (number, age) vary widely depending on the setup and the analysis method. A new sample that is not part of the dataset is normally compared with the ancestral components, and the result is reported as a percentage for each component. Additional tools can also predict the ancestral populations that best match a sample. The analysis is strongly limited by the diversity and accuracy of the dataset: for example, analysing an Asian individual with an admixture tool based on a European dataset will not give meaningful results.
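The comparison of a new sample with fixed ancestral components can be sketched in a few lines of Python. This is a minimal illustration, not the code used by any calculator mentioned on this page. It assumes each component is described by a reference-allele frequency per SNP, that the sample is coded as 0, 1 or 2 copies of the reference allele, and that proportions are fitted with a simple expectation-maximisation (EM) loop. The function name and the example frequencies are hypothetical.

```python
def estimate_proportions(genotypes, component_freqs, iterations=200):
    """Estimate ancestry proportions for one sample with a simple EM loop.

    genotypes: list of 0/1/2 counts of the reference allele, one per SNP.
    component_freqs: one list per ancestral component, holding that
        component's reference-allele frequency at each SNP (assumed to
        lie strictly between 0 and 1).
    """
    k = len(component_freqs)
    q = [1.0 / k] * k  # start from equal proportions
    for _ in range(iterations):
        totals = [0.0] * k
        for j, g in enumerate(genotypes):
            # Weight of each component for a reference / alternate allele copy
            ref = [q[c] * component_freqs[c][j] for c in range(k)]
            alt = [q[c] * (1.0 - component_freqs[c][j]) for c in range(k)]
            ref_sum, alt_sum = sum(ref), sum(alt)
            for c in range(k):
                if g > 0:
                    totals[c] += g * ref[c] / ref_sum
                if g < 2:
                    totals[c] += (2 - g) * alt[c] / alt_sum
        total = sum(totals)
        q = [t / total for t in totals]  # proportions sum to 1
    return q


# Hypothetical frequencies for two components at six SNPs
freqs = [
    [0.9, 0.8, 0.9, 0.1, 0.2, 0.1],  # component 1
    [0.1, 0.2, 0.1, 0.9, 0.8, 0.9],  # component 2
]
sample = [2, 2, 2, 0, 0, 0]  # genotypes typical of component 1
percentages = [round(100 * p, 1) for p in estimate_proportions(sample, freqs)]
print(percentages)
```

If the sample's real ancestry is not represented among the components, the loop still returns percentages that sum to 100, which is one way to see why a poorly matched dataset gives results that are not meaningful.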
Accuracy and sophistication
Most calculators use a shared subset of the up to 0.7 million SNPs provided by Family Finder, AncestryDNA, 23andMe, etc. These are compared with publicly available datasets and the companies' own proprietary datasets. As the Autosomal DNA testing comparison chart shows, accuracy and sophistication vary greatly and have not yet reached the level needed for accurate genetic genealogy research. The public dbSNP (Build 137) database contains about 45 million human SNPs, and comprehensive whole-genome sequencing (WGS) of all human populations could substantially increase that number and allow much better calculators.
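The idea of a shared SNP subset can be illustrated with a short Python sketch. It assumes the rsIDs of each test have already been read into lists. The function name, the test names and the rsIDs are placeholders, not real chip contents.

```python
def shared_snps(chips):
    """Return the SNPs common to all tests and each test's coverage.

    chips: dict mapping a test name to a non-empty iterable of rsIDs.
    """
    sets = {name: set(ids) for name, ids in chips.items()}
    shared = set.intersection(*sets.values())
    # Fraction of each test's SNPs that a shared-subset calculator can use
    coverage = {name: len(shared) / len(ids) for name, ids in sets.items()}
    return shared, coverage


# Placeholder rsIDs for two hypothetical tests
chips = {
    "test_a": ["rs1", "rs2", "rs3", "rs4"],
    "test_b": ["rs2", "rs3", "rs5"],
}
shared, coverage = shared_snps(chips)
print(sorted(shared), coverage)
```

A calculator restricted to the shared set ignores every SNP that only one test reports, so the usable number of SNPs can be well below the size of either chip.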
DTC providers admixture analysis
Admixture analysis is included for everyone who has been tested by the following companies. For further details see the Autosomal DNA testing comparison chart.
23andMe - Ancestry Composition
The Ancestry Composition feature offers a map view which displays one's ancestral components from various regions of the world as of 500 years ago, a split view for those who also have one or both parents who have been tested by 23andMe, and a breakdown by chromosome. Three settings are available: conservative, standard, and speculative. Overall accuracy is reasonably good, but predictions in Europe are still not optimal, particularly in the speculative mode. Ancestry Finder provides a breakdown of one's ancestry by country.
Family Tree DNA - Population Finder
Population Finder was the first incarnation of the admixture analysis provided with Family Tree DNA's "Family Finder" test. It was replaced by a new feature known as MyOrigins in May 2014. Population Finder used principal component analysis (PCA) to estimate biogeographical percentages of autosomal DNA. The population samples used in the analysis were continental groups (Africa, America, East Asia, Europe, Middle East, Oceania, and South Asia). The analysis did not include the X chromosome. For historical details of the test see Understanding results: Population Finder in the Internet Archive. The Population Finder analysis was relatively non-specific, particularly for people with European ancestry.
For an explanation of the workings of Population Finder and the meaning of the Middle Eastern percentages seen in many Population Finder results, see the guest blog post by Doug McDonald on biogeographical analysis.
AncestryDNA - Genetic Ethnicity
For background on the AncestryDNA Ethnicity Estimates see the AncestryDNA Ethnicity Estimates White Paper.
Genographic Project - Who Am I
Since a relatively limited number of autosomal SNPs are available in the Geno 2.0 data for analysis, the biogeographical ancestry analysis is somewhat limited relative to other similar tools, particularly relative to Ancestry Composition. The two closest reference populations are given for each person who is tested. However, these predictions, particularly the second closest reference population, are frequently inaccurate.
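The Genographic Project's own method is not described here. As a generic illustration only, picking the closest reference populations can be sketched in Python as a nearest-neighbour search over component percentages. The population names and percentages below are invented for the example.

```python
import math


def closest_populations(sample, references, top=2):
    """Rank reference populations by Euclidean distance to the sample.

    sample: list of component percentages for the tested person.
    references: dict mapping a population name to its average
        percentages for the same components, in the same order.
    """
    def distance(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    ranked = sorted(references, key=lambda name: distance(sample, references[name]))
    return ranked[:top]


# Invented reference averages for three components
references = {
    "Pop A": [80, 15, 5],
    "Pop B": [60, 30, 10],
    "Pop C": [10, 20, 70],
}
print(closest_populations([75, 18, 7], references))  # ['Pop A', 'Pop B']
```

With only a few components, several reference populations can lie at similar distances from the sample, which is one plausible reason why a second-closest match is less reliable than the first.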
BritainsDNA - All My Ancestry
This test provides a biogeographical analysis, chromosome painting, and principal components analysis. A somewhat limited number of autosomal SNPs are included in the All My Ancestry dataset, which limits the specificity of the biogeographical ancestry analysis relative to 23andMe's Ancestry Composition. However, other than 23andMe, this is the only autosomal DNA testing company that offers chromosome painting and principal components analysis.
Analysis projects: send in
McDonald's BGA project by Doug McDonald
Doug McDonald does two types of free tests. One is like 23andMe's "Advanced Global Similarity", except that he does more "dimensions". For people with ancestry outside Europe, four of these are shown. For pure Europeans his world graph is essentially identical to 23andMe's, so instead he shows a European graph, which includes (at lower right) the Adygei, a tribe living on the eastern shores of the Black Sea. The higher dimensions do not give additional information for pure Europeans, so they are not shown. The results are sent to participants as graphs in .png files. Doug also does quantitative tests. These come in three flavors: first without South Asia (represented by Pakistan) and the Middle East, second with South Asia, and finally with all three, as comparison panels. See the ISOGG Wiki page on McDonald's BGA project for the qualifying criteria.
Dodecad Ancestry Project by Dienekes Pontikos
See http://dodecad.blogspot.com for details. Also see the summary written on November 7, 2010 on his anthropology blog. This analysis is currently closed to participants, but Dienekes says that he "may or may not process data from relatives, or non-target groups that was already sent to me and that was not assigned a DOD number." Contact him directly to see if he might be willing to accept your data at some future point in time.
Eurogenes analysis by David Wesolowski
David does free analysis of raw data files from both 23andMe and FTDNA's Family Finder using the programs ADMIXTURE, BEAGLE, PLINK and ADMIXMAP. Results are distributed as Excel spreadsheets and as .png files. See http://eurogenes.blogspot.com/ and http://www.bga101.blogspot.com for background. Also see http://www.23andme.com/you/community/thread/5182. Information on how to interpret the results may be found in the archive of http://bga101.blogspot.com/2010/10/brief-guide-to-output-youre-seeing.html. If you are interested in participating in his project, contact him directly.
Anders does a free admixture analysis for people of Scandinavian ancestry who have been tested by 23andMe. Participants must have their primary ancestry from Norway, Sweden or Finland. The raw 23andMe data files are analyzed using the program ADMIXTURE and the ancestry is presented in a STRUCTURE-like graph. For additional background see http://archiver.rootsweb.ancestry.com/th/read/GENEALOGY-DNA/2010-10/1286348059 and http://archiver.rootsweb.ancestry.com/th/read/GENEALOGY-DNA/2010-11/1289143880. If you are interested in participating in his project, contact him directly.
Magnus Ducatus Lituaniae Project by Verenich, Kull
A biogeographical analysis project for the territories of the former Grand Duchy of Lithuania. Admin: Vadim Verenich. Co-admin: Leon Kull. See the Magnus Ducatus Lituaniae Project blog for further details.
Analysis projects: Do it yourself (DIY)
GEDmatch online admixture applications
This free online service was created by John Olson and Curtis Rogers at www.gedmatch.com. Large data transfers and heavy usage sometimes lead to server problems; donations are welcome to help fund the service. Various admixture (ethnicity or deep ancestry) tools are included:
- 4-Ancestors Oracle December 2012
Dienekes Pontikos published the Do-It-Yourself Dodecad tool free of charge for non-commercial use. DIYDodecad can do admixture analysis on 32-bit or 64-bit Windows or Linux machines. The analysis is based on calculator files and appropriately standardized autosomal SNP raw data. The admixture calculator reports a percentage for each population cluster.
- v1.0 July 2011: Dodecad v3 calculator included, Dodecad Oracle possible
- v2.0 August 2011: new features including by-chromosome and by-segment ancestry analysis, etc.
- v2.1 September 2011: allows incomplete genotype files to be used, so input is no longer limited to the Illumina platforms
Standardize raw data
Converting your data from the company-specific format to a common format requires the R software, which can be downloaded and installed from http://www.r-project.org/. Follow the instructions in the DIYDodecad readme.txt.
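To show what standardizing raw data involves, here is a minimal Python sketch. It is not the DIYDodecad R script and does not reproduce its output format. It assumes each raw-data row holds four fields (SNP ID, chromosome, position, genotype) separated by tabs or commas, with optional quotes, and it writes a hypothetical tab-separated common format. All names and the example rows are illustrative.

```python
def standardize_line(line):
    """Return (snp_id, chromosome, position, genotype) or None if skipped."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None  # blank line or comment
    fields = [f.strip().strip('"') for f in line.replace(",", "\t").split("\t")]
    if len(fields) != 4 or not fields[2].isdigit():
        return None  # header or malformed row
    snp_id, chrom, pos, genotype = fields
    genotype = genotype.upper()
    if not genotype or any(base not in "ACGT" for base in genotype):
        genotype = "--"  # treat no-calls and other codes as missing
    else:
        genotype = "".join(sorted(genotype))  # "GA" and "AG" become "AG"
    return snp_id, chrom, int(pos), genotype


def standardize(lines):
    """Convert raw-data lines to tab-separated rows in the common format."""
    rows = [row for row in map(standardize_line, lines) if row is not None]
    return ["\t".join(str(value) for value in row) for row in rows]


# Illustrative rows in two different layouts
raw = [
    "# rsid\tchromosome\tposition\tgenotype",
    "rs0000001\t1\t1000\tGA",
    "RSID,CHROMOSOME,POSITION,RESULT",
    '"rs0000002","1","2000","--"',
]
for row in standardize(raw):
    print(row)
```

Sorting the two alleles means the same genotype is written identically whichever order a company reports it in, which makes later comparison with calculator files simpler.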
- Geno 2.0 patch: new standardize.r and hgdp.base.txt, November 2012
Calculator files from various projects are published regularly. The number in a calculator's name usually indicates the number of population clusters (for example, K12b uses 12). Check the projects' blogs for new versions:
- Dodecad Project: Admixture, Oracle
- globe13, globe13 participant results, globe13 files, globe10, globe10 files, October 2012
- weac2 (West Eurasian cline) - weac2 files, K10a - K10a files, June 2012
- K7b, K12b, Oracle K12b - K7b files, K12b files, Oracle K12b file, January 2012
- K12a, world9, Oracle K12a, Euro-DNA, Eurasia7, Africa9, weac, BAT, Euro7, Oracle v1, 2011
- Eurogenes Project:
- MDLP Project:
Geographic Population Structure
- Geographic Population Structure: a program provided by Eran Elhaik and Tatiana Tatarinova
SPatial Ancestry analysis (SPA)
A method for predicting an individual's ancestry or geographic origin.
- SPA homepage cs.ucla.edu Version 1.13 April 2013, Eurogenes review November 2012 and March 2012
- Eurogenes SPA "model" files November 2012
A small program for viewing SNP data and seeing how the data compare with other populations and regions of the world.
- SnpMap version 1.0.4, June 2011
ADMIXTURE and PLINK
Razib Khan has provided tutorials for users who wish to perform DIY analyses on their autosomal DNA results using the software programs ADMIXTURE and PLINK:
- Eurasia ADMIXTURE supervised and unsupervised, 16 March 2011
- Analyzing ancestry with ADMIXTURE step by step, 14 March 2011
- Using your 23andMe data in PLINK, 7 January 2013
- Using your 23andMe data: exploring with MDS, 8 January 2013.
Commercial Analysis
Total Genomic Ancestry Classification
This test compares your 23andMe or deCODEme data with that of over 1600 people from all over the world. The test provides a detailed analysis of the populations and individuals to which a person is genetically related. The cost is $99 per analysis. See the archive of http://www.ethnoancestry.com/TGAC_detail.html for details. For a list of the reference populations see http://www.ethnoancestry.com/TGAC_Populations.
The commercial company DNA Tribes offers a "geographical 'deep ancestry' analysis that can be performed based on your genotype raw data from any of several SNP microarray tests". Further information can be found at www.dnatribes.com/snp.html.
See also the blog post My DNA Tribes results by Aidan Byrne, 15 March 2013.
- Making the best of what's not so good by Judy G Russell, The Legal Genealogist, 22 February 2015.
- Understanding BGA testing, DNA Genealogical Experiences and Tutorials blog, 3 November 2012.
- Racing to the wrong conclusion, Genealogy for the Everyman blog, 9 February 2013. The article provides a good summary of the problem of assigning arbitrary labels to "races".
- Understanding correlations and debunking misconceptions in DNA genealogy by Steve Handy. DNA Genealogical Experiences and Tutorials blog, 29 May 2013.
- Ethnicity results - true or not? by Roberta Estes, DNAeXplained, 4 October 2013.
- How long ago did African ancestry enter my family tree? by Henry Louis Gates Jr and Kasia Bryc. The Root, 10 July 2015.
- The rise of the genome bloggers, Nature, 15 December 2010, 468, pp 880-881.
- Figure 1 Venn diagram, Francioli et al 2014, Whole-genome sequence variation, population structure and demographic history of the Dutch population, Nature Genetics, http://dx.doi.org/10.1038/ng.3021