Autosomal DNA statistics

From ISOGG Wiki

An understanding of autosomal DNA statistics is helpful when interpreting the results of an autosomal DNA test. Autosomal DNA is inherited from both parents. It is randomly shuffled in a process called recombination, and the share of autosomal DNA inherited from any one ancestor is diluted with each new generation.

Autosomal DNA tests for finding cousins and verifying relationships for genetic genealogy purposes are offered by 23andMe, AncestryDNA and Family Tree DNA (the Family Finder test). For comparisons of the different services see Tim Janzen's autosomal DNA testing comparison chart.


Simple mathematical average of sharing

There are two methods of calculating the percentages of autosomal DNA shared by two individuals. Both methods give the same results except in the cases of identical twins, full siblings, double cousins, or any two individuals who are each related to the other through both parents. Parent/child pairs are half-identical everywhere, with no fully identical regions, so both methods give 50% for them.

The autosomal DNA of two related individuals will be half-identical in regions where each has inherited the same DNA from one parent, and ultimately from one common ancestor. In the cases of siblings and double cousins, their autosomal DNA will be fully identical in regions where each has inherited the same DNA from both parents, and ultimately from two common ancestors. Full siblings are half-identical on regions where each has inherited the same DNA from exactly one parent and fully identical on regions where each has inherited the same DNA from both parents.

Method I

The first method of calculating percentages (displayed by 23andMe) expresses the aggregate length of the shared segments as a percentage of the aggregate length of the paternal and maternal autosomes. The aggregate length of the shared segments is the aggregate length of the half-identical regions, where there is one shared segment, plus twice the aggregate length of the fully identical regions, where there are two shared segments, one paternal and one maternal. Using this method, full siblings (excluding identical twins), who are expected to be half-identical on 50% of their autosomal DNA and fully identical on a further 25%, will on average appear to have 50% shared.

Method II

The second method of calculating percentages (to which those relying on FTDNA or GEDmatch must resort) expresses the aggregate length of the half-identical (or better) regions as a percentage of the aggregate length of both sets of autosomes (paternal and maternal), the same denominator as in Method I. Using this method, full siblings, who are half-identical or better on 75% of the length of one set of autosomes, will on average appear to have 37.5% shared. Whenever there are fully identical regions, the calculated percentages will be smaller than for Method I, as half-identical and fully identical regions cannot be distinguished from the available data and must be given equal weight in the calculation.
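As a rough illustration of the two methods, the sketch below computes both percentages from the lengths of the half-identical-only and fully identical regions. The function names are made up for this example, and the 3400 cM length for one set of autosomes is the round figure assumed elsewhere in this article, not a value taken from any company's software. Both functions divide by the combined length of the paternal and maternal autosomes, which is what reproduces the 50% and 37.5% figures quoted for full siblings.

```python
# Assumed round-number length of one set of autosomes, in cM.
ONE_SET_CM = 3400.0


def method_one_percent(hir_only_cm, fir_cm, one_set_cm=ONE_SET_CM):
    """Method I: fully identical regions count twice (two shared segments)."""
    shared_segments_cm = hir_only_cm + 2.0 * fir_cm
    return 100.0 * shared_segments_cm / (2.0 * one_set_cm)


def method_two_percent(hir_only_cm, fir_cm, one_set_cm=ONE_SET_CM):
    """Method II: half-identical and fully identical regions weigh the same."""
    hir_or_better_cm = hir_only_cm + fir_cm
    return 100.0 * hir_or_better_cm / (2.0 * one_set_cm)


# Average full siblings: half-identical on 1700 cM, fully identical on 850 cM.
print(method_one_percent(1700.0, 850.0))  # 50.0
print(method_two_percent(1700.0, 850.0))  # 37.5
```

With no fully identical regions (for example a parent/child pair, 3400 cM half-identical), the two functions return the same value.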

The first column in the table below shows the average percentages for different relationships and methods of calculation. The calculations assume that every child gets 50% from its mother and 50% from its father and in turn 25% from each of its four grandparents. The actual percentages vary from the average in individual cases. For example, a person might share 27% of his DNA with one nephew and only 23% with another. Because of the random way that autosomal DNA is inherited, third, fourth and more distant cousins will not necessarily have any detectable half-identical regions. According to Family Tree DNA's figures there is a 90% chance that third cousins will share enough DNA for the relationship to be detected, but there is only a 50% chance that you will share enough DNA with a fourth cousin for the relationship to be identified.

The degree of sharing is also displayed by the DNA companies in units of genetic distance known as centiMorgans (cMs), although in practice the total number of shared centiMorgans is less significant than the number and lengths of individual shared segments. The second column in the table below shows the aggregate lengths in cM of the half-identical (or better) regions shared on average by various pairs of relatives. It assumes that the aggregate length of each set of autosomal chromosomes is 3400cM, and thus that each individual inherits 6800cM of autosomal DNA, 3400cM from each parent. Different DNA companies use different conventions, so that the actual cM figures, as displayed by 23andMe's Family Inheritance: Advanced, FTDNA and GEDmatch, may be slightly different from these round numbers, even before allowing for random variation around the averages in individual cases.

The reason for the different results from Method I and Method II in the case of siblings and double cousins is that the cM lengths displayed by FTDNA and in the free GEDmatch utility (and, indeed, 23andMe's own Family Inheritance: Advanced) do not distinguish between half-identical and fully identical regions. The best place to see the distinction between half-identical regions (HIRs) and fully identical regions (FIRs) is in the optional graphical output of the one-to-one comparisons at GEDmatch.com, where FIRs are displayed in green and HIRs are displayed in yellow. It is also possible to see the fully identical regions at 23andMe by using the Family Traits chromosome browser (accessed via the Family and Friends menu).

When using Family Finder data, the percentages based on Method II can be calculated from the cM lengths by dividing the displayed Shared cM by 68.
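The division by 68 is simply the shared cM expressed as a percentage of an assumed 6800 cM total. A minimal sketch of the conversion in both directions follows; the function names are hypothetical and the 6800 cM total is the round figure used in this article, so results will differ slightly from company-specific totals.

```python
# Assumed total autosomal length: 3400 cM from each parent.
AUTOSOMAL_TOTAL_CM = 6800.0


def shared_cm_to_percent(shared_cm, total_cm=AUTOSOMAL_TOTAL_CM):
    """Method II percentage from a displayed Shared cM figure."""
    return 100.0 * shared_cm / total_cm


def percent_to_shared_cm(percent, total_cm=AUTOSOMAL_TOTAL_CM):
    """Inverse conversion: expected shared cM for a given percentage."""
    return percent * total_cm / 100.0


print(shared_cm_to_percent(850.0))   # 12.5, the first cousin average
print(percent_to_shared_cm(25.0))    # 1700.0, e.g. grandparent/grandchild
```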

Note that the FTDNA figures exclude the X-chromosome cMs but the 23andMe figures include them. Males have one X-chromosome and females have two X-chromosomes. If you want to combine the autosomal DNA (atDNA) with the X-chromosome in the calculations, divide by 68.81065 instead of 68. Note that the expected shared percentages of X-DNA depend not only on the genealogical relationship between two people, but also on the numbers of males and females in the two paths to their common ancestor.

23andMe include the X-chromosome in their calculations, so their cM figures will be higher than those provided by FTDNA. 23andMe made adjustments to the cM count in June 2013 so the number of cMs will vary slightly depending on when the test was taken.

  • For females using 23andMe data prior to June 2013, there were 7494.8 cM when combining the paternal and maternal autosomal DNA and the two X-chromosomes, per Family Inheritance: Advanced.
  • For females using 23andMe data after June 2013, there were 7438.6 cM when combining the paternal and maternal autosomal DNA and the two X-chromosomes, per Family Inheritance: Advanced.
  • There are 7074.6 autosomal cM per 23andMe.
  • For males using 23andMe data, there are 7256.8 cM when combining the atDNA with the single X-chromosome.

Note that AncestryDNA does not provide information on the lengths of half-identical (or better) regions in either centiMorgans or percentages. However, AncestryDNA customers can upload their raw data to the free GEDmatch utility in order to extract the necessary cM data for making comparisons and to check the relationship predictions. David Pike's tools can also be used.

Average autosomal DNA shared by pairs of relatives, in percentages and centiMorgans

% shared | cM half-identical (or better) | Relationship | Notes
100% (Method I)/50% (Method II) | 3400.00 | Identical twins (monozygotic twins) | Fully identical everywhere.[1]
50% | 3400.00 | Parent/child | Half-identical everywhere
50% (Method I)/37.5% (Method II) | 2550.00 | Full siblings | Half-identical on 50%/1700cM and fully identical on a further 25%/850cM.
25% | 1700.00 | Grandparent/grandchild, aunt-or-uncle/niece-or-nephew, half-siblings |
25% (Method I)/23.4375% (Method II) | 1593.75 | Double first cousins | Half-identical on 21.875%/1487.5cM and fully identical on a further 1.5625%/106.25cM
12.5% | 850.00 | Greatgrandparent/greatgrandchild, first cousins, greatuncle-or-aunt/greatnephew-or-niece, half-uncle-or-aunt/half-nephew-or-niece |
6.25% | 425.00 | First cousins once removed, half first cousins |
3.125% | 212.50 | Second cousins, first cousins twice removed |
1.563% | 106.25 | Second cousins once removed, half second cousins |
0.781% | 53.13 | Third cousins, second cousins twice removed |
0.391% | 26.56 | Third cousins once removed |
0.195% | 13.28 | Fourth cousins |
0.0977% | 6.64 | Fourth cousins once removed |
0.0488% | 3.32 | Fifth cousins |
0.0244% | 1.66 | Fifth cousins once removed |
0.0122% | 0.83 | Sixth cousins |
0.0061% | 0.42 | Sixth cousins once removed |
0.00305% | 0.21 | Seventh cousins |
0.001525% | 0.10 | Seventh cousins once removed |
0.000763% | 0.05 | Eighth cousins |
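
The halving pattern in the cousin rows of the table can be reproduced with a short calculation. The sketch below works under the same assumption as the table (exactly 50% passed on at each generation); the function name and its parameters are invented for illustration. It covers only ordinary full or half cousins, not siblings, double cousins or direct ancestors, and it gives averages only, with no indication of the random variation around them.

```python
# Assumed total autosomal length: 3400 cM from each parent.
AUTOSOMAL_TOTAL_CM = 6800.0


def expected_share_percent(degree, removed=0, half=False):
    """Average % shared by degree-th cousins, `removed` times removed."""
    # Full cousins descend from an ancestral couple (two common ancestors),
    # half cousins from a single common ancestor.
    common_ancestors = 1 if half else 2
    # Number of parent-to-child transmissions on the path that links the
    # two cousins through one common ancestor.
    transmissions = 2 * (degree + 1) + removed
    return 100.0 * common_ancestors * 0.5 ** transmissions


for degree in range(1, 5):
    percent = expected_share_percent(degree)
    shared_cm = percent / 100.0 * AUTOSOMAL_TOTAL_CM
    print(f"cousin degree {degree}: {percent}% ~ {shared_cm} cM")
```

For example, `expected_share_percent(1, removed=1)` and `expected_share_percent(1, half=True)` both return 6.25, matching the row shared by first cousins once removed and half first cousins.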

The chart below (courtesy Dimario, Wikimedia Commons) shows the average amount of autosomal DNA inherited by all close relations up to the third cousin level.

[Image: Cousin tree (with genetic kinship).png]

Ranges of sharing percentage

Figures from 23andMe's Relative Finder:

  • Parent/child: 47.54% (for father/son pairs, who do not share the X-chromosome) to ~50%
  • 1st cousins: 7.31-13.8%
  • 1st cousins once removed: 3.3-8.51%
  • 2nd cousins: 2.85-5.04%
  • 2nd cousins once removed: 0.57-2.54%
  • 3rd cousins: ca. 0.3-2.0%
  • 3rd cousins once removed: 0.11-1.32%
  • 4th and more distant cousins: 0.07-0.5%
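
Because these ranges overlap, a single percentage is often compatible with several relationships. The sketch below is a simple lookup over the figures listed above; the table and function names are hypothetical, the upper bound for parent/child is taken as exactly 50% and the approximate third cousin range is treated as exact, both assumptions made only for the example.

```python
# (relationship, low %, high %) copied from the ranges listed above.
# Assumption: "~50%" is treated as 50.0 and "ca." bounds as exact.
SHARING_RANGES = [
    ("Parent/child", 47.54, 50.0),
    ("1st cousins", 7.31, 13.8),
    ("1st cousins once removed", 3.3, 8.51),
    ("2nd cousins", 2.85, 5.04),
    ("2nd cousins once removed", 0.57, 2.54),
    ("3rd cousins", 0.3, 2.0),
    ("3rd cousins once removed", 0.11, 1.32),
    ("4th and more distant cousins", 0.07, 0.5),
]


def candidate_relationships(percent_shared):
    """Return every listed relationship whose range contains the value."""
    return [
        name
        for name, low, high in SHARING_RANGES
        if low <= percent_shared <= high
    ]


print(candidate_relationships(4.0))
# ['1st cousins once removed', '2nd cousins']
```

A value that falls in several ranges cannot be resolved from the percentage alone; documentary evidence or further testing is needed to choose between the candidates.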

Shared SNPs

Figures from 23andMe Compare Genes function (from Tim Janzen's data):

  • Parent-child pairs share between 83.94% and 84.20% of SNPs (50% of DNA in common)
  • Siblings share between 83.81% and 87.47% of SNPs (50% of DNA in common)
  • Uncle/aunt-niece/nephew pairs share between 78.48% and 79.57% of SNPs (25% of DNA in common)
  • Grandparent-grandchild pairs share between 77.96% and 80.59% of SNPs (25% of DNA in common)
  • First cousins and great uncle/great aunt-grandniece/grandnephew pairs share between 75.78% and 77.03% of SNPs (12.5% of DNA in common)
  • First cousins once removed share ca 75.5% of SNPs (6.25% of DNA in common)
  • Second cousins and first cousins twice removed share ca 75% of SNPs (3.125% of DNA in common)
  • Unrelated people of European descent share 73-74.6% of SNPs

Identical by descent segments

It is important to remember that we do not inherit DNA segments from every genealogical ancestor. At ten generations we have approximately 1024 ancestors although there is generally some overlap as a result of pedigree collapse. While all these ancestors can potentially be documented in our genealogical tree we only inherit segments of DNA from a small subset of these ancestors. Luke Jostins found that "The probability of having DNA from all of your genealogical ancestors at a particular generation becomes vanishingly small very rapidly; there is a 99.6% chance that you will have DNA from all of your 16 great-great grandparents, only a 54% [chance] of sharing DNA with all 32 of your G-G-G grandparents, and a 0.01% chance for your 64 G-G-G-G grandparents. You only have to go back 5 generations for genealogical relatives to start dropping off your DNA tree."[2]
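
The ancestor counts quoted above follow from doubling at each generation. The short sketch below shows the count and the average contribution per ancestor. It is illustrative only: the function names are invented, pedigree collapse is ignored, and the 6800 cM total is the round figure assumed in this article. It does not reproduce Jostins' probabilities, which depend on how recombination breaks the DNA into segments.

```python
# Assumed total autosomal length: 3400 cM from each parent.
AUTOSOMAL_TOTAL_CM = 6800.0


def ancestors_at_generation(generations_back):
    """Number of genealogical ancestors, ignoring pedigree collapse."""
    return 2 ** generations_back


def average_share_percent(generations_back):
    """Average % of autosomal DNA traced to one ancestor at that generation."""
    return 100.0 / ancestors_at_generation(generations_back)


for generations_back in (4, 5, 6, 10):
    count = ancestors_at_generation(generations_back)
    percent = average_share_percent(generations_back)
    average_cm = AUTOSOMAL_TOTAL_CM / count
    print(f"{generations_back} generations: {count} ancestors, "
          f"{percent:.4f}% or {average_cm:.2f} cM each on average")
```

An average of only a few cM per ancestor at ten generations helps explain why many of those ancestors contribute no detectable segment at all.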

In addition, it is important to note that not all matching segments are true matches (identical by descent). Some segments, especially the smaller ones, will be false positive matches (identical by state).

Statistics categorized by genealogical relationship

To help people who have taken an autosomal DNA test gain greater insight into the genealogical relationships implied by the resulting data, Tim Janzen has created three charts that provide statistical information in various categories. The charts provide statistics on close relatives, distant endogamous relatives and distant non-endogamous relatives. They were originally designed for use with 23andMe data but now also incorporate data from FTDNA's Family Finder test. The charts are organized by the degree of relationship, with the most closely related people (parents and children, full siblings) listed at the top and more distant cousins listed at the bottom. The statistics are based on information from real people who have been tested by 23andMe and Family Tree DNA and who have a known genealogical relationship to someone else who has also been tested by the same company. The charts also include information on the median and the average number of shared cMs for people who are related to each other from the first cousin once removed level of relationship to the 5th cousin level of relationship. The charts can be downloaded from the Anabaptist Genetic Genealogy website.

An unidentified author has also provided a spreadsheet on DNA Inheritance Statistics to which anyone can add their data.

References

  1. Tiny differences between identical twins can now be detected by next generation sequencing. See: Weber-Lehmann et al. 2014. Finding the needle in the haystack: Differentiating "identical" twins in paternity testing and forensics by ultra-deep next generation sequencing. Forensic Science International: Genetics; 9: 42-46. See also the editorial by Bruce Budowle in Investigative Genetics: Molecular genetic investigative leads to differentiate monozygotic twins.
  2. Jostins L. How many ancestors share our DNA? Genetic inference blog. 11 November 2009.
