Identical by descent
Identical by descent (IBD) is a term used in genetic genealogy to describe a matching segment of DNA shared by two or more people that has been inherited from a recent common ancestor. The segments are considered to match if all the alleles on a paternal or maternal chromosome are identical (barring rare mutations and genotyping errors). Two matching half-identical regions (HIRs) (see below) or full-identical regions (FIRs) which meet minimum threshold conditions can be considered identical by descent. Being identical by descent is contrasted with being identical by state (IBS).
IBD segments can be measured in centiMorgans (a unit of genetic distance) or in megabases (a unit of physical distance). The three major testing companies (23andMe, AncestryDNA and Family Tree DNA) all now report segment sizes in centiMorgans. 23andMe round the segment length to the nearest tenth of a centiMorgan and round the segment start and end co-ordinates to the nearest million base pairs to reflect the uncertainty in the exact locations of the segment boundaries. AncestryDNA originally used megabases for their matching algorithms but converted to centiMorgans in about January 2014. Both 23andMe and Family Tree DNA provide information on the matching segments which can be downloaded into a spreadsheet. The segment data is not currently provided by AncestryDNA.
Identity by descent can be considered on various timescales. In population genetics theory all individuals have common ancestry in the distant past. For the purposes of genetic genealogy the focus is on detecting IBD segments within a genealogical timeframe (effectively within the last ten generations) where there is a possibility of identifying the common ancestor through documentary records. Any given pair of individuals is related through many common ancestors, though many of these relationships will be too distant to result in detectable IBD segments. If the two individuals have ancestors from the same geographical region they might have many recent common ancestors, but many of the relationships will not result in IBD sharing, and there might only be one or two detectable segments inherited from a subset of the common ancestors. In a study of a European subset of the Population Reference Sample (POPRES) dataset it was estimated that for the most part IBD blocks longer than 4 cM come from 500 to 1,500 years ago, and blocks longer than 10 cM are within the last 500 years.
Consecutive SNP results for a short segment of DNA may be half-identical in two individuals when in actuality the DNA sequences are not identical (the match is only IBS) because the SNPs match on opposing chromosomes. Statistics are therefore needed to determine whether two half-identical (apparently matching) segments are likely to actually be identical (neglecting rare errors and mutations). A long consecutive string of half-identical SNP results (typically about 5 cM / 700 SNPs, depending on the test's error rate and other factors) is required before one can infer that two matching DNA segments are probably identical by descent.
It is problematic to use small segments under 5 cM for genealogical matching purposes. Durand et al. (2014) analysed phased data from 2,952 father-mother-child trios in the 23andMe dataset and identified a false positive rate of over 67% for 2–4 cM segments. The error rate is likely to be much higher in unphased data. Currently AncestryDNA are the only company to phase the data before doing the matching.
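The rule of thumb described above (roughly 5 cM and 700 consecutive half-identical SNPs) can be sketched as a simple filter. This is only an illustration: the `Segment` record, the `likely_ibd` function and the example values are hypothetical, and real testing companies apply their own thresholds and additional checks.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A half-identical region between two people (hypothetical record)."""
    chromosome: str
    length_cm: float  # genetic length in centiMorgans
    snp_count: int    # number of consecutive half-identical SNPs


# Illustrative cut-offs taken from the text ("about 5 cM / 700 SNPs").
MIN_CM = 5.0
MIN_SNPS = 700


def likely_ibd(segment, min_cm=MIN_CM, min_snps=MIN_SNPS):
    """Return True if the segment meets both minimum thresholds."""
    return segment.length_cm >= min_cm and segment.snp_count >= min_snps


# Made-up example segments, not real test results.
segments = [
    Segment("1", 12.4, 2100),  # long and SNP-dense: kept
    Segment("7", 3.1, 650),    # too short: dropped
    Segment("15", 6.0, 500),   # long enough but too few SNPs: dropped
]
kept = [s for s in segments if likely_ibd(s)]
print([s.chromosome for s in kept])
```

Passing a threshold does not prove that a segment is IBD; it only makes the inference more probable, which is why both cut-offs are left as adjustable parameters.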
The terms IBD and IBS are more relevant to the results of SNP microarray testing than to results of whole-genome sequencing, because microarray testing provides so much less information per centiMorgan of DNA. Microarray test results have an additional complexity since they report on both copies of the chromosome, but the results are not phased (that is, it is unknown which nucleotide is on which copy of the chromosome). Thus if one person's SNP result is (CC), this could be at least "Half-Identical" to either (CC) or (CT) in a second person. A homozygous mismatch such as (CC) vs. (TT) would be required before one could say the results are *not* identical.
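The comparison of unphased genotypes described above can be sketched in a few lines. The example assumes each genotype is a two-letter string such as "CT"; the function names and sample data are hypothetical, and the sketch ignores no-calls and the small mismatch allowance that real matching algorithms use.

```python
def half_identical(genotype_a, genotype_b):
    """True if two unphased genotypes share at least one allele.

    (CC) vs (CT) shares C, so it is at least half-identical;
    (CC) vs (TT) is a homozygous mismatch and shares nothing.
    """
    return bool(set(genotype_a) & set(genotype_b))


def longest_half_identical_run(person_a, person_b):
    """Length, in SNPs, of the longest run of consecutive half-identical SNPs."""
    best = current = 0
    for g_a, g_b in zip(person_a, person_b):
        if half_identical(g_a, g_b):
            current += 1
            best = max(best, current)
        else:
            current = 0  # a homozygous mismatch breaks the run
    return best


# Made-up genotypes at five consecutive SNPs for two people.
a = ["CC", "CT", "AA", "GG", "AG"]
b = ["CT", "TT", "GG", "GG", "AA"]
print(longest_half_identical_run(a, b))  # 2
```

A run counted this way is only half-identical; whether it is long enough to be treated as identical by descent depends on the thresholds discussed elsewhere in this article.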
Thresholds for matches
Thresholds for length and number of mismatches (errors or mutations) are set by each testing company; these criteria must be met before the company will report that two individuals very likely inherited their matching segments from a common ancestor. For threshold details see Family Finder versus DNA Relatives - Thresholds for relationship matches.
AncestryDNA introduced a new matching system in November 2014. Detailed FAQs and a technical White Paper can be viewed by AncestryDNA testees. AncestryDNA assigns confidence levels depending on the approximate amount of shared centiMorgans:
|Confidence score||Approximate amount of sharing||Likelihood you and your match share a recent common ancestor within 5 or 6 generations|
|Extremely high||More than 30 centiMorgans||Virtually 100%|
|Very high||20-30 centiMorgans||99%|
|Good||6-12 centiMorgans||More than 50%|
|Moderate||6 centiMorgans or less||20-50%|
Note that the AncestryDNA database is 99% American, and it is not yet known if these ranges will apply in the same way to other populations.
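As a rough illustration, the confidence table above can be written as a lookup function. The function name is hypothetical and the handling of boundary values is an assumption (exactly 6 cM is treated as "Moderate", following the "6 centiMorgans or less" wording). Amounts that the table as reproduced here does not cover return `None` rather than a guessed label.

```python
def confidence_label(shared_cm):
    """Map shared centiMorgans to the confidence label in the table above.

    Returns None for amounts that the table, as reproduced here, does not list.
    """
    if shared_cm > 30:
        return "Extremely high"
    if 20 <= shared_cm <= 30:
        return "Very high"
    if 6 < shared_cm <= 12:
        return "Good"
    if 0 < shared_cm <= 6:
        return "Moderate"
    return None


for cm in (35, 25, 8, 6, 15):
    print(cm, confidence_label(cm))
```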
Previously AncestryDNA set their threshold for matches at 5 megabases. In around January 2014 they changed to using centiMorgans and the threshold became 5 cM, but the earlier matches were not rerun. The previous thresholds for other relationships at AncestryDNA are given here.
|cM||% IBD||% IBS|
Ranges of total centiMorgans of IBD segments expected, based on family relationship
- Parent/child: 3539-3748 centiMorgans (cM)
- 1st cousins: 548-1034 cM
- 1st cousins once removed: 248-638 cM
- 2nd cousins: 101-378 cM
- 2nd cousins once removed: 43-191 cM
- 3rd cousins: 43 to about 150 cM
- 3rd cousins once removed: 11.5-99 cM
- 4th and more distant cousins: 5 to about 50 cM
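Because the ranges above overlap, a given total usually fits more than one relationship. A minimal sketch of such a lookup follows; the dictionary and function names are hypothetical, and the two upper bounds given as approximate in the list are treated as exact limits purely for illustration.

```python
# Total shared cM ranges copied from the list above.
RANGES_CM = {
    "Parent/child": (3539, 3748),
    "1st cousins": (548, 1034),
    "1st cousins once removed": (248, 638),
    "2nd cousins": (101, 378),
    "2nd cousins once removed": (43, 191),
    "3rd cousins": (43, 150),                 # upper bound is approximate
    "3rd cousins once removed": (11.5, 99),
    "4th and more distant cousins": (5, 50),  # upper bound is approximate
}


def consistent_relationships(total_cm):
    """Return every listed relationship whose range contains total_cm."""
    return [
        name
        for name, (low, high) in RANGES_CM.items()
        if low <= total_cm <= high
    ]


print(consistent_relationships(120))
# ['2nd cousins', '2nd cousins once removed', '3rd cousins']
```

A total of 120 cM is consistent with three of the listed relationships, which illustrates why the shared total alone cannot identify a relationship.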
Ranges of total number of IBD segments expected, based on family relationship
- Parent/child: 23-29
- 1st cousins: 17-32
- 1st cousins once removed: 12-23
- 2nd cousins: 10-18
- 2nd cousins once removed: 4-12
- 3rd cousins: 2-6?
- 3rd cousins once removed: 1-4
- 4th and more distant cousins: 0-2
IBD accuracy of genetic tests and analysis methods
- SNP microarray testing (23andMe, Family Finder, AncestryDNA, Chromo2, Geno 2.0, etc.: see Autosomal DNA testing comparison chart): the accuracy depends on the number and type of extracted autosomal and X-chromosome SNPs. Generally more is better.
- Whole-genome sequencing (WGS) using next generation sequencing (NGS) technology is not currently affordable for the genetic genealogy market, but is being used in academic studies: IBD tools are able to detect all 1st through 6th degree relationships and 55% of 9th through 11th degree relationships, a 5% to 15% increase in relationship detection compared to high-density microarray data.
- FTDNA definition of identical by descent. Family Tree DNA Learning Center.
- Wheaton K. atDNA matches. Lesson 9 in the series "Beginners' Guide to Genetic Genealogy", Wheaton Surname Resources website, 2013.
- Turner A. "Satiable Curiosity: Identity Crisis: Identical by State or Identical by Descent?" Journal of Genetic Genealogy Fall 2011, Volume 7.
- Wikipedia article on identity by descent
- Bettinger B. Small matching segments - examining hypotheses. The Genetic Genealogist, 8 December 2014.
- Moore C. The folly of using small segments as proof in genealogical research. Part 1. Your Genetic Genealogist, 3 December 2014.
- Bettinger B. Small matching segments - friend or foe?. The Genetic Genealogist, 2 December 2014.
- Cooper K. When is a DNA segment match a real match? IBD or IBS or IBC?. Kitty Cooper's DNA Genealogy blog, 27 October 2014.
- Bettinger B. Finding genetic cousins - separating fact from fiction. The Genetic Genealogist, 15 October 2014. A preview of the methodology to be used to reduce the number of false positive matches at AncestryDNA.
- Rose K. The ABCs of DNA - IBD vs IBS vs mIBC. DNA Genealogy blog, 30 January 2012.
- Mount S. Genetic genealogy and the single segment. On Genetics blog 19 February 2011.
- Granka J. The DNA research and matching development life cycle. An explanation of the AncestryDNA matching process and their definitions of IBD and IBS.
- Speed D, Balding DJ (2014). Relatedness in the post-genomic era. Nature Reviews Genetics. Published online 18 November 2014 (subscription required).
- Durand EY, Eriksson N, McLean CY (2014). Reducing pervasive false positive identical-by-descent segments detected by large-scale pedigree analysis. Molecular Biology and Evolution 2014 doi: 10.1093/molbev/msu151. First published online: April 30, 2014. For a layman's summary see the 23andMe blog post:
- 23andMe scientists improve methods for finding relationships 23andMe blog, 12 May 2014.
- Li H, Glusman G, Hu H (2014) Relationship estimation from whole-genome sequence data. PLoS Genetics 2014 Jan 30;10(1):e1004144. Figure 1 indicates regions which are prone to excess IBD inference from WGS data.
- Hochreiter S (2013). HapFABIA: Identification of very short segments of identity by descent characterized by rare variants in large sequencing data. Nucleic Acids Research 2013 doi: 10.1093/nar/gkt1013.
- Browning SR, Browning BL (2012). Identity by descent between distant relatives: detection and applications. Annual Review of Genetics 2012; 46: 617-33. A useful review article defining the terminology and summarising the different methodologies used to infer IBD segments (subscription required).
- Thompson EA (2013). Identity by Descent: Variation in Meiosis, Across Genomes, and in Populations. Genetics 2013; 194(2).
- Henn BM, Hon L, Macpherson JM, Eriksson N, Saxonov S, et al (2012). Cryptic distant relatives are common in both isolated and cosmopolitan genetic samples. PLoS ONE 7(4): e34267.
- Hill, WG & Weir, BS (2011). Variation in actual relationship as a consequence of Mendelian sampling and linkage. Genetics Research, vol 93, no. 1, pp. 47-64. (See in particular Figure 5 which shows the distribution of actual genome sharing for different degrees of pedigree relationship.)
- Powell JE, Visscher PM, Goddard ME (2010). Reconciling the analysis of IBD and IBS in complex trait studies. Nature Reviews Genetics 11, 800-805 (November 2010).
- Visscher PM, Medland SE, Ferreira R et al (2006). Assumption-free estimation of heritability from genome-wide identity-by-descent sharing between full siblings. PLOS Genetics 2006 Mar; 2(3): e41. Epub 2006 Mar 24.
- Ralph P, Coop G (2013). The Geography of Recent Genetic Ancestry across Europe. PLOS Biology 11(5):e1001555.
- Durand EY, Eriksson N, McLean CY. Reducing pervasive false positive identical-by-descent segments detected by large-scale pedigree analysis. Molecular Biology and Evolution advance access publication online 30 April 2014.
- Swayne A. DNA matching just got better. Ancestry blog. 19 November 2014.
- John Walden's research reported in a message to the Autosomal DNA Rootsweb list by Tim Janzen, 6 January 2012.
- See also the files on John Walden's website.
- Li et al 2014: Relationship Estimation from Whole-Genome Sequence Data, PLoS Genetics Jan 2014; 10(1): e1004144, http://dx.doi.org/10.1371%2Fjournal.pgen.1004144