Identical by descent
Identical by descent (IBD) is a term used in genetic genealogy to describe a matching segment of DNA shared by two or more people that has been inherited from a recent common ancestor. The segments are considered to match if all the alleles are identical (barring rare mutations and genotyping errors). Two matching half-identical regions (HIRs) (see below) or full-identical regions (FIRs) which meet minimum threshold conditions can be considered identical by descent. Being identical by descent is contrasted with being identical by state (IBS).
Identity by descent can be considered on various timescales. In population genetics theory all individuals have common ancestry in the distant past. For the purposes of genetic genealogy the focus is on detecting IBD segments within a genealogical timeframe (effectively within the last ten generations) where there is a possibility of identifying the common ancestor through documentary records. Any given pair of individuals is related through many common ancestors, though many of these relationships will be too distant to result in detectable IBD segments. If the two individuals have ancestors from the same geographical region they might have many recent common ancestors, but many of the relationships will not result in IBD sharing, and there might only be one or two detectable segments inherited from a subset of the common ancestors.
IBD segments can be measured in centiMorgans (a unit of genetic distance) or in megabases (a unit of physical distance). 23andMe and Family Tree DNA report segment sizes in centiMorgans. 23andMe round the segment length to the nearest tenth of a centiMorgan and round the segment start and end co-ordinates to the nearest million base pairs to reflect the uncertainty in the exact locations of the segment boundaries. AncestryDNA originally used megabases for their matching algorithms but converted to centiMorgans in about January 2014. However, they do not provide the underlying segment data.
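The rounding described above can be illustrated with a minimal Python sketch. It assumes that rounding the co-ordinates means rounding to the nearest multiple of 1,000,000 base pairs; the function name `round_segment` and the sample values are hypothetical and are not taken from any company's software.

```python
def round_segment(start_bp: int, end_bp: int, length_cm: float):
    """Round a segment the way the text describes (illustrative only)."""
    mb = 1_000_000

    def nearest_mb(bp: int) -> int:
        # Integer arithmetic: round to the nearest multiple of one million bp.
        return (bp + mb // 2) // mb * mb

    # The length is rounded to the nearest tenth of a centiMorgan.
    return nearest_mb(start_bp), nearest_mb(end_bp), round(length_cm, 1)


# Hypothetical segment: exact boundaries and length before rounding.
print(round_segment(72_498_123, 85_612_000, 14.27))
```

The rounded boundaries are deliberately coarse, so two reported segments with the same co-ordinates need not have exactly the same underlying boundaries.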
The terms IBD and IBS are more relevant to the results of SNP microarray testing than to results of DNA sequencing, because SNP testing provides so much less information per centiMorgan of DNA. SNP test results have an additional complexity since they report on both copies of the chromosome, but the results are not phased (that is, it is unknown which nucleotide is on which copy of the chromosome). Thus if one person's SNP result is (CC), this could be at least "Half-Identical" to either (CC) or (CT) in a second person. A homozygous mismatch such as (CC) vs. (TT) would be required before one could say the results are *not* identical.
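The comparison rule for unphased genotypes can be sketched in a few lines of Python. This is a minimal illustration, assuming each genotype is given as a two-letter string such as "CC" or "CT"; the function name `is_half_identical` is hypothetical, and no-calls are not handled.

```python
def is_half_identical(genotype_a: str, genotype_b: str) -> bool:
    """Return True if two unphased genotypes share at least one allele.

    Only an opposite-homozygous pair such as "CC" vs "TT" shares no
    allele, so only that kind of pair rules out a match at this SNP.
    """
    return bool(set(genotype_a) & set(genotype_b))


print(is_half_identical("CC", "CC"))  # both copies could match
print(is_half_identical("CC", "CT"))  # one copy could match
print(is_half_identical("CC", "TT"))  # no shared allele
```

A True result at a single SNP says very little on its own, since common alleles are shared by chance; it only becomes informative over a long run of consecutive SNPs.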
Consecutive SNP results for a short segment of DNA may be half-identical in two individuals when in actuality the DNA sequences are not identical. Therefore we must resort to the use of statistics to determine whether two half-identical (apparently matching) segments are likely to actually be identical (neglecting rare errors and mutations). A long consecutive string of half-identical SNP results (typically about 5 cM / 700 SNPs, depending on the test's error rate and other factors) is required before one can say that two matching DNA segments are probably identical by descent. Thresholds for length and number of mismatches (errors or mutations) are set by each testing company; these criteria must be met before the company will report that two individuals very likely inherited their matching segments from a common ancestor.
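The idea of requiring a long run of half-identical SNPs can be sketched as follows. This is a simplified illustration and not any company's algorithm: it assumes the SNPs are sorted by genetic position, uses the approximate 5 cM / 700 SNP figures quoted above as default thresholds, and ends a run at the first opposite-homozygous SNP, whereas real matching algorithms tolerate a small number of mismatches. The names `candidate_segments`, `min_cm` and `min_snps` are hypothetical.

```python
from typing import List, Tuple

# One SNP: (genetic position in cM, genotype of person A, genotype of person B)
Snp = Tuple[float, str, str]


def half_identical(genotype_a: str, genotype_b: str) -> bool:
    """True if the two unphased genotypes share at least one allele."""
    return bool(set(genotype_a) & set(genotype_b))


def candidate_segments(
    snps: List[Snp], min_cm: float = 5.0, min_snps: int = 700
) -> List[Tuple[float, float, int]]:
    """Return (start_cM, end_cM, snp_count) for each qualifying run."""
    segments = []
    run: List[float] = []  # positions of SNPs in the current run
    # A sentinel entry (position None) closes any run still open at the end.
    for pos, ga, gb in list(snps) + [(None, "", "")]:
        if pos is not None and half_identical(ga, gb):
            run.append(pos)
            continue
        # The run has ended: keep it only if it meets both thresholds.
        if run and len(run) >= min_snps and run[-1] - run[0] >= min_cm:
            segments.append((run[0], run[-1], len(run)))
        run = []
    return segments


# Hypothetical data: 1000 compatible SNPs, one mismatch, then 100 more.
data = [(i * 0.01, "CT", "CC") for i in range(1000)]
data.append((10.0, "CC", "TT"))
data += [(10.01 + i * 0.01, "CT", "CT") for i in range(100)]
print(candidate_segments(data))
```

In this example only the first run is reported; the second run is far too short to meet either threshold.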
Thresholds for matches
For threshold details see Family Finder versus Relative Finder - Thresholds for relationship matches. AncestryDNA originally set their threshold for matches at 5 megabases. In around January 2014 they changed to using centiMorgans for measuring matches and the threshold was changed to 5 cM, but the earlier matches were not rerun. The thresholds for other relationships at AncestryDNA are given here. As AncestryDNA have a much lower threshold than FTDNA and 23andMe, users will get proportionately many more matches, but a much larger percentage will be false positive matches (FPMs).
Ranges of total centiMorgans of IBD segments expected, based on family relationship
- Parent/child: 3539-3748 centiMorgans (cM)
- 1st cousins: 548-1034 cM
- 1st cousins once removed: 248-638 cM
- 2nd cousins: 101-378 cM
- 2nd cousins once removed: 43-191 cM
- 3rd cousins: 43-ca. 150 cM
- 3rd cousins once removed: 11.5-99 cM
- 4th and more distant cousins: 5-ca. 50 cM
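Because the ranges overlap, a given total usually fits more than one relationship. The following Python sketch encodes the ranges listed above and returns every relationship consistent with a total; the name `consistent_relationships` is hypothetical, and the upper bounds marked "ca" in the list are treated as exact limits purely for illustration.

```python
# (relationship, low cM, high cM) copied from the list above.
# The 150 and 50 upper bounds are approximate ("ca") in the source.
CM_RANGES = [
    ("Parent/child", 3539, 3748),
    ("1st cousins", 548, 1034),
    ("1st cousins once removed", 248, 638),
    ("2nd cousins", 101, 378),
    ("2nd cousins once removed", 43, 191),
    ("3rd cousins", 43, 150),
    ("3rd cousins once removed", 11.5, 99),
    ("4th and more distant cousins", 5, 50),
]


def consistent_relationships(total_cm: float) -> list:
    """List every relationship whose range contains the total shared cM."""
    return [name for name, low, high in CM_RANGES if low <= total_cm <= high]


print(consistent_relationships(300))
print(consistent_relationships(45))
```

A total of 300 cM fits both 1st cousins once removed and 2nd cousins, while 45 cM fits four of the listed relationships, which is why the total alone rarely pins down a distant relationship.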
Ranges of number of IBD segments expected, based on family relationship
- Parent/child: 23-29
- 1st cousins: 17-32
- 1st cousins once removed: 12-23
- 2nd cousins: 10-18
- 2nd cousins once removed: 4-12
- 3rd cousins: 2-6?
- 3rd cousins once removed: 1-4
- 4th and more distant cousins: 0-2
IBD accuracy of genetic tests and analysis methods
- SNP microarray testing (23andMe, Family Finder, AncestryDNA, Chromo2, Geno 2.0, etc.: see Autosomal DNA testing comparison chart): the accuracy depends on the number and type of extracted autosomal and X-chromosome SNPs. Generally more is better.
- Whole-genome sequencing (WGS) using next-generation sequencing (NGS) technology is not currently affordable for the genetic genealogy market but is being used in academic studies: with WGS data, IBD tools are able to detect all 1st through 6th degree relationships and 55% of 9th through 11th degree relationships, a 5% to 15% increase in relationship detection compared to high-density microarray data (Li et al 2014).
- FTDNA definition of identical by descent, Family Tree DNA Learning Center
- Wheaton K. atDNA matches. Lesson 9 in the series "Beginners' Guide to Genetic Genealogy", Wheaton Surname Resources website, 2013.
- Cooper K. The ABCs of DNA - IBD vs IBS vs mIBC Kitty Cooper's DNA Genealogy blog, 30 January 2012.
- Turner A. "Satiable Curiosity: Identity Crisis: Identical by State or Identical by Descent?" Journal of Genetic Genealogy Fall 2011, Volume 7.
- Genetic genealogy and the single segment by Steve Mount. On Genetics blog 19 February 2011.
- Granka J. The DNA research and matching development life cycle. An explanation of the AncestryDNA matching process and their definitions of IBD and IBS.
- Wikipedia article on identity by descent
- Durand EY, Eriksson N, McLean CY (2014). Reducing pervasive false positive identical-by-descent segments detected by large-scale pedigree analysis. Molecular Biology and Evolution 2014 doi: 10.1093/molbev/msu151. First published online: April 30, 2014. For a layman's summary see the 23andMe blog post:
- 23andMe scientists improve methods for finding relationships 23andMe blog, 12 May 2014.
- Li H, Glusman G, Hu H (2014) Relationship estimation from whole-genome sequence data. PLoS Genetics 2014 Jan 30;10(1):e1004144. Figure 1 indicates regions which are prone to excess IBD inference from WGS data.
- Browning SR, Browning BL (2012). Identity by descent between distant relatives: detection and applications. Annual Review of Genetics 2012; 46: 617-33. A useful review article defining the terminology and summarising the different methodologies used to infer IBD segments (subscription required).
- Hill, WG & Weir, BS (2011). Variation in actual relationship as a consequence of Mendelian sampling and linkage. Genetics Research, vol 93, no. 1, pp. 47-64. (See in particular Figure 5 which shows the distribution of actual genome sharing for different degrees of pedigree relationship.)
- Visscher PM, Medland SE, Ferreira R et al (2006). Assumption-free estimation of heritability from genome-wide identity-by-descent sharing between full siblings. PLOS Genetics 2006 Mar; 2(3): e41. Epub 2006 Mar 24.
- Li et al 2014: Relationship Estimation from Whole-Genome Sequence Data, PLoS Genetics Jan 2014; 10(1): e1004144, http://dx.doi.org/10.1371%2Fjournal.pgen.1004144