Admixture analyses

From ISOGG Wiki

Admixture analysis (more properly known as biogeographical analysis) is a method of inferring someone's geographical origins based on an analysis of their genetic ancestry. An admixture analysis is one of the components of an autosomal DNA test. Companies which offer such tests include 23andMe, Family Tree DNA, Ancestry.com, the Genographic Project, BritainsDNA.com and deCODE genetics.


Admixture calculations

Admixture calculations provide a genetic ancestry analysis for individuals whose high-density single-nucleotide polymorphism (SNP) data are available. Because SNPs are extracted by different methods (mostly SNP chips), the SNP sets being compared need substantial overlap for the comparison to be meaningful. Admixture analysis usually builds ancestral components, also called clusters, from a reference dataset of samples. Both the datasets used (regional, continental or worldwide) and the ancestral components (their number and age) vary widely depending on the setup and analysis method. A new sample that was not part of the reference dataset is then usually expressed as percentages of these ancestral components. Additional tools also allow ancestral populations to be predicted. The analysis is strongly limited by the diversity and accuracy of the reference dataset: for example, running an admixture tool built on a European dataset on an Asian individual will not give meaningful results.
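The step of expressing a new sample as percentages can be illustrated with a small expectation-maximization (EM) sketch. It assumes that the ancestral components are already fixed as per-cluster allele frequencies and estimates one sample's proportions by maximum likelihood. This shows the general idea only; it is not the exact algorithm of any calculator named on this page, and the function name and inputs are illustrative assumptions.

```python
import numpy as np

def estimate_admixture(genotypes, freqs, n_iter=500, tol=1e-8):
    """Estimate one sample's admixture proportions by EM (illustrative sketch).

    genotypes: length-J array of counts (0, 1 or 2) of a chosen allele per SNP.
    freqs: K x J array; freqs[k, j] is that allele's frequency in cluster k
           (assumed to be known in advance, e.g. from a reference dataset).
    Returns a length-K array of proportions that sums to 1.
    """
    g = np.asarray(genotypes, dtype=float)
    # Keep frequencies away from 0 and 1 to avoid division by zero.
    p = np.clip(np.asarray(freqs, dtype=float), 1e-6, 1 - 1e-6)
    k, j = p.shape
    q = np.full(k, 1.0 / k)  # start from equal proportions
    for _ in range(n_iter):
        # E-step: probability that a counted allele copy came from each cluster...
        a = q[:, None] * p
        a /= a.sum(axis=0, keepdims=True)
        # ...and the same for the other allele copies at each SNP.
        b = q[:, None] * (1 - p)
        b /= b.sum(axis=0, keepdims=True)
        # M-step: average cluster membership over all 2*J allele copies.
        q_new = (a @ g + b @ (2 - g)) / (2 * j)
        if np.abs(q_new - q).max() < tol:
            return q_new
        q = q_new
    return q
```

Multiplying the returned proportions by 100 gives the kind of percentage breakdown described above. The quality of the estimate depends entirely on how well the assumed cluster frequencies represent the sample's real ancestral populations.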

Accuracy and sophistication

Most calculators use a shared subset of the up to 0.7 million SNPs provided by Family Finder, AncestryDNA, 23andMe, etc. These SNPs are compared with publicly available datasets and with the companies' own proprietary datasets. As can be seen from the Autosomal DNA testing comparison chart, accuracy and sophistication vary greatly and have not yet reached the quality needed for accurate genetic genealogy research. The public dbSNP database (Build 137) contains ca. 45 million human SNPs, and comprehensive whole-genome sequencing (WGS) of all human populations could substantially increase that number and allow much better calculators.[1]
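Because each chip tests a different set of SNPs, a calculator can only use the SNPs that a sample and its reference data have in common. A minimal sketch of that check, assuming both sides are simply lists of rsIDs (the function name is hypothetical):

```python
def shared_snps(sample_snps, reference_snps):
    """Return the rsIDs present in both lists, and the fraction of the
    reference SNPs that the sample covers."""
    ref = set(reference_snps)
    shared = sorted(set(sample_snps) & ref)
    coverage = len(shared) / len(ref) if ref else 0.0
    return shared, coverage

# Usage: a toy chip that covers half of a toy reference SNP list.
shared, coverage = shared_snps(["rs1", "rs2", "rs3"], ["rs2", "rs3", "rs4", "rs5"])
print(shared, coverage)  # ['rs2', 'rs3'] 0.5
```

A low coverage value is a warning sign that the sample and the reference data overlap too little for a meaningful comparison.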

DTC providers' admixture analysis

Admixture analysis is included for everyone who has been tested by the following companies. For further details see the Autosomal DNA testing comparison chart.

23andMe - Ancestry Composition

The Ancestry Composition feature offers a map view which displays one's ancestral components from various regions of the world as of 500 years ago, a split view for those who also have one or both parents who have been tested by 23andMe, and a breakdown by chromosome. Three settings are available: conservative, standard, and speculative. Overall accuracy is reasonably good, but predictions in Europe are still not optimal, particularly in the speculative mode. Ancestry Finder provides a breakdown of one's ancestry by country.
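One way to picture the conservative, standard and speculative settings is as confidence thresholds applied to a hierarchy of regions: a DNA segment is labeled with the most specific region whose probability reaches the threshold, and is otherwise rolled up to a broader region. The sketch below is a hypothetical illustration of that idea only; the region tree, the probabilities and the threshold values are assumptions, not 23andMe's actual reference panels or cut-offs.

```python
# Hypothetical region tree (child -> parent); not 23andMe's real hierarchy.
PARENT = {
    "British & Irish": "Northwestern European",
    "Scandinavian": "Northwestern European",
    "Southern European": "European",
    "Northwestern European": "European",
    "European": None,
}

def depth(region):
    """Number of steps from a region up to the root of the tree."""
    d = 0
    while PARENT[region] is not None:
        region = PARENT[region]
        d += 1
    return d

def assign_segment(leaf_probs, threshold):
    """Label a segment with the most specific region whose summed probability
    reaches the threshold; return 'Unassigned' if none does."""
    totals = {}
    for leaf, prob in leaf_probs.items():
        region = leaf
        while region is not None:  # add the leaf's probability to every ancestor
            totals[region] = totals.get(region, 0.0) + prob
            region = PARENT[region]
    passing = [r for r, t in totals.items() if t >= threshold]
    if not passing:
        return "Unassigned"
    # Prefer the deepest (most specific) region, then the most probable one.
    return max(passing, key=lambda r: (depth(r), totals[r]))
```

With segment probabilities of 0.6 British & Irish, 0.3 Scandinavian and 0.1 Southern European, a threshold of 0.5 returns "British & Irish", 0.75 returns "Northwestern European" and 0.95 returns "European". A lower (more speculative) threshold gives a more specific but less reliable label.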

Family Tree DNA - Population Finder

Population Finder was the first incarnation of the admixture analysis provided with Family Tree DNA's "Family Finder" test. It was replaced by a new feature known as MyOrigins in May 2014. Population Finder used principal component analysis (PCA) to estimate biogeographical percentages of autosomal DNA. The population samples used in the analysis were continental groups (Africa, America, East Asia, Europe, Middle East, Oceania and South Asia). The analysis did not include the X chromosome. For historical details of the test see Understanding results: Population Finder in the Internet Archive. The Population Finder analysis was relatively non-specific, particularly for people with European ancestry.
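As a general illustration of principal component analysis on genotype data (not Family Tree DNA's actual implementation), the sketch below centers a reference matrix of allele counts, finds its principal axes with a singular value decomposition, and projects a new sample onto them. The function names and the simulated data are assumptions.

```python
import numpy as np

def fit_pca(ref_genotypes, n_components=2):
    """Fit PCA to an N x J matrix of allele counts (0/1/2) from reference samples.

    Returns the per-SNP mean and the top principal axes (n_components x J).
    """
    x = np.asarray(ref_genotypes, dtype=float)
    mean = x.mean(axis=0)
    # Center each SNP column; the rows of vt are the principal axes.
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(sample, mean, axes):
    """Coordinates of one sample (length-J allele counts) on the fitted axes."""
    return axes @ (np.asarray(sample, dtype=float) - mean)
```

A new sample lands near the reference samples it most resembles. PCA by itself yields coordinates rather than percentages, so a further step is needed to turn positions relative to the population clusters into a percentage breakdown.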

For an explanation of how Population Finder works and what the Middle Eastern percentages seen in many Population Finder results mean, see the guest blog post by Doug McDonald, biogeographical analysis.

AncestryDNA - Genetic Ethnicity

The Genetic Ethnicity Summary consistently overestimates the Central European and Scandinavian ancestral components for people whose ancestors were from the British Isles. The ancestral component from the British Isles is overestimated for people whose ancestors were from continental Europe. Overall, the European ancestry predictions tend to be inaccurate.

Genographic Project - Who Am I

Since a relatively limited number of autosomal SNPs are available in the Geno 2.0 data for analysis, the biogeographical ancestry analysis is somewhat limited relative to other similar tools, particularly relative to Ancestry Composition. The two closest reference populations are given for each person who is tested. However, these predictions, particularly the second closest reference population, are frequently inaccurate.

BritainsDNA - All My Ancestry

This test provides a biogeographical analysis, chromosome painting and principal components analysis. The All My Ancestry dataset includes a somewhat limited number of autosomal SNPs, which limits the specificity of the biogeographical ancestry analysis relative to 23andMe's Ancestry Composition. However, other than 23andMe, this is the only company doing autosomal DNA testing that offers chromosome painting and principal components analysis.

Analysis projects: send in

McDonald's BGA project by Doug McDonald

Doug McDonald does two types of free tests. One is like 23andMe's "Advanced Global Similarity", except that he uses more "dimensions". For people with ancestry outside Europe, four of these dimensions are shown. For people of purely European ancestry his world graph is essentially identical to 23andMe's, so instead he shows a European graph, which includes (at lower right) the Adygei, a tribe living on the eastern shores of the Black Sea. The higher dimensions give no additional information for people of purely European ancestry, so they are not shown. The results are sent to participants as graphs in .png files. Doug also does quantitative tests. These come in three flavors, each with its own comparison panels: the first without South Asia (represented by Pakistan) and the Middle East, the second with South Asia, and the third with all three. See the ISOGG Wiki page on McDonald's BGA project for the qualifying criteria.

Dodecad Ancestry Project by Dienekes Pontikos

See http://dodecad.blogspot.com for details, and see also the summary written on November 7, 2010 on his anthropology blog. This analysis is currently closed to participants, but Dienekes says that he "may or may not process data from relatives, or non-target groups that was already sent to me and that was not assigned a DOD number." Contact him directly to see whether he might be willing to accept your data at some future point in time.

Eurogenes analysis by David Wesolowski

David does free analysis of raw data files from both 23andMe and FTDNA's Family Finder using the programs ADMIXTURE, BEAGLE, PLINK and ADMIXMAP. Results are distributed as Excel spreadsheets and as .png files. See http://eurogenes.blogspot.com/ and http://www.bga101.blogspot.com for background. Also see http://www.23andme.com/you/community/thread/5182. Information on how to interpret the results can be found in the archived copy of http://bga101.blogspot.com/2010/10/brief-guide-to-output-youre-seeing.html. If you are interested in participating in his project, contact him directly.

Admixture analysis for Scandinavians by Anders Pålsen

Anders does a free admixture analysis for people of Scandinavian ancestry who have been tested by 23andMe. Participants must have their primary ancestry from Norway, Sweden or Finland. The raw 23andMe data files are analyzed using the program ADMIXTURE, and the ancestry is presented in a STRUCTURE-like graph. For additional background see http://archiver.rootsweb.ancestry.com/th/read/GENEALOGY-DNA/2010-10/1286348059 and http://archiver.rootsweb.ancestry.com/th/read/GENEALOGY-DNA/2010-11/1289143880. If you are interested in participating in his project, contact him directly.

Magnus Ducatus Lituaniae Project by Verenich, Kull

A biogeographical analysis project for the territories of the former Grand Duchy of Lithuania. Admin: Vadim Verenich; co-admin: Leon Kull. See the Magnus Ducatus Lituaniae Project blog for further details.

Analysis projects: Do it yourself (DIY)

GEDmatch online admixture applications

This free online service was created by John Olson and Curtis Rogers at www.gedmatch.com. The large data files that must be transferred and heavy usage sometimes lead to server problems; donations to help fund the service are welcome. Various admixture (ethnicity or deep ancestry) tools are included.

DIYDodecad

Dienekes Pontikos published the Do-It-Yourself Dodecad tool free of charge for non-commercial use. DIYDodecad can do admixture analysis on 32-bit or 64-bit Windows or Linux machines. The analysis is carried out using calculator files and autosomal SNP raw data that has been converted to a standard format. The admixture calculator gives percentages for the different population clusters.

Versions

  • v1.0 July 2011: Dodecad v3 calculator included, Dodecad Oracle possible
  • v2.0 August 2011: new features including by-chromosome and by-segment ancestry analysis, etc.
  • v2.1 September 2011: allows incomplete genotype files to be used, and not only data from Illumina platforms

Standardize raw data

Converting your data from the company-specific format to a common format requires the R software, which can be downloaded from http://www.r-project.org/. Follow the instructions in the DIYDodecad readme.txt.
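DIYDodecad's own conversion is done in R as described in its readme. Purely to show what "standardizing" raw data involves, here is a Python sketch that assumes a 23andMe-style tab-separated layout (rsid, chromosome, position, genotype; lines starting with "#" are comments; "--" marks a no-call) and turns genotypes into counts of a chosen allele per SNP. The function names and the allele map are hypothetical, and strand differences between platforms are not handled.

```python
def parse_raw(lines):
    """Parse raw data lines into {rsid: genotype}, skipping comments and no-calls.

    Assumes tab-separated columns: rsid, chromosome, position, genotype.
    """
    calls = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        rsid, _chrom, _pos, genotype = line.rstrip("\n").split("\t")[:4]
        if "-" in genotype:  # "--" is assumed to mark a no-call
            continue
        calls[rsid] = genotype
    return calls

def allele_counts(calls, counted_allele):
    """Convert genotypes to 0/1/2 counts of a chosen allele per SNP.

    counted_allele: hypothetical {rsid: allele} map, e.g. from a reference set.
    SNPs missing from the sample are left out.
    """
    return {rsid: calls[rsid].count(allele)
            for rsid, allele in counted_allele.items() if rsid in calls}

# Usage with a tiny made-up file.
raw = ["# rsid\tchromosome\tposition\tgenotype",
       "rs1\t1\t100\tAG", "rs2\t1\t200\tGG", "rs3\t2\t300\t--"]
print(allele_counts(parse_raw(raw), {"rs1": "A", "rs2": "A", "rs3": "T"}))
# {'rs1': 1, 'rs2': 0}
```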

Calculator files

Calculator files from various projects are published regularly. The number in a calculator file's name usually indicates the number of population clusters. Check the projects' blogs for new versions.

Geographic Population Structure

SPatial Ancestry analysis (SPA)

A method for predicting an individual's ancestry, that is, where the individual is from.
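To our understanding, SPA models each SNP's allele frequency as a logistic function of geographic position and places an individual where that model makes the genotypes most likely. The sketch below shows only that localization step, by grid search, with per-SNP parameters assumed to be already known; SPA's actual parameter fitting and search are not reproduced, and all names and values here are illustrative assumptions.

```python
import numpy as np

def allele_freq(slopes, intercepts, location):
    """Logistic allele-frequency model: f = 1 / (1 + exp(-(slope . x + intercept))).

    slopes: J x 2 array, intercepts: length-J array, location: length-2 array.
    """
    return 1.0 / (1.0 + np.exp(-(slopes @ location + intercepts)))

def locate(genotypes, slopes, intercepts, grid):
    """Return the grid point with the highest binomial log-likelihood
    for the sample's 0/1/2 allele counts."""
    g = np.asarray(genotypes, dtype=float)
    best, best_ll = None, -np.inf
    for loc in grid:
        f = allele_freq(slopes, intercepts, np.asarray(loc, dtype=float))
        f = np.clip(f, 1e-9, 1 - 1e-9)  # avoid log(0)
        ll = np.sum(g * np.log(f) + (2 - g) * np.log(1 - f))
        if ll > best_ll:
            best, best_ll = tuple(loc), ll
    return best
```

A grid search keeps the sketch simple and transparent; a real tool would use a finer search or continuous optimization, and would need per-SNP parameters estimated from reference samples of known origin.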

SnpMap

A small program for viewing SNP data and seeing how the data compare with other populations and regions of the world.

  • SnpMap version 1.0.4, June 2011

ADMIXTURE and PLINK

Razib Khan has provided tutorials for users who wish to perform DIY analyses of their autosomal DNA results using the software programs ADMIXTURE and PLINK.

Commercial Analysis

Total Genomic Ancestry Classification

This test compares your 23andMe or deCODEme data with that of over 1600 people from all over the world. The test provides a detailed analysis of which populations and individuals a person is genetically related to. The cost is $99 per analysis. See archive of http://www.ethnoancestry.com/TGAC_detail.html for details. For a list of the reference populations see http://www.ethnoancestry.com/TGAC_Populations.
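How TGAC ranks related populations and individuals is not described here. As a generic illustration only, one simple measure of genetic similarity between two people is mean identity-by-state (IBS) allele sharing over the SNPs both were tested for; the sketch below ranks hypothetical reference individuals by it.

```python
def ibs_similarity(a, b):
    """Mean identity-by-state between two samples, from 0 to 1.

    a, b: {rsid: allele count (0, 1 or 2)}; only SNPs present in both are used.
    At each SNP, 2 - |difference| alleles are shared (0, 1 or 2).
    """
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    return sum(2 - abs(a[s] - b[s]) for s in shared) / (2 * len(shared))

def rank_references(query, references):
    """Reference names sorted from most to least similar to the query."""
    return sorted(references,
                  key=lambda name: ibs_similarity(query, references[name]),
                  reverse=True)

# Usage with made-up genotypes.
query = {"rs1": 0, "rs2": 2}
refs = {"X": {"rs1": 0, "rs2": 2}, "Y": {"rs1": 2, "rs2": 0}, "Z": {"rs1": 1, "rs2": 2}}
print(rank_references(query, refs))  # ['X', 'Z', 'Y']
```

Raw IBS is dominated by common variants, so practical tools usually weight SNPs or use model-based methods; this sketch only conveys the idea of comparing a person against many reference individuals.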

DNA Tribes

The commercial company DNA Tribes offers a "geographical 'deep ancestry' analysis that can be performed based on your genotype raw data from any of several SNP microarray tests". Further information can be found at www.dnatribes.com/snp.html.

See also the blog post My DNA Tribes results by Aidan Byrne, 15 March 2013.

Blog posts

Further reading

References

  1. Francioli et al. (2014). Whole-genome sequence variation, population structure and demographic history of the Dutch population. Nature Genetics. Figure 1 (Venn diagram). http://dx.doi.org/10.1038/ng.3021

See also