Identical by descent


Identical by descent (IBD) is a term used in genetic genealogy to describe a matching segment of DNA shared by two or more people that has been inherited from a recent common ancestor. The segments are considered to match if all the alleles on a paternal or maternal chromosome are identical (barring rare mutations and genotyping errors). Two matching half-identical regions (HIRs) (see below) or full-identical regions (FIRs) that meet minimum threshold conditions can be considered identical by descent. Being identical by descent is contrasted with being identical by state (IBS), in which the alleles match without having been inherited from a recent common ancestor.

IBD segments can be measured in centiMorgans (a unit of genetic distance) or in megabases (a unit of physical distance). The three major testing companies (23andMe, AncestryDNA and Family Tree DNA) all now report segment sizes in centiMorgans. 23andMe round the segment length to the nearest tenth of a centiMorgan and round the segment start and end co-ordinates to the nearest million base pairs to reflect the uncertainty in the exact locations of the segment boundaries. AncestryDNA originally used megabases for their matching algorithms but converted to centiMorgans in about January 2014. Both 23andMe and Family Tree DNA provide information on the matching segments which can be downloaded into a spreadsheet. The segment data is not currently provided by AncestryDNA.

Identity by descent can be considered on various timescales. In population genetics theory all individuals have common ancestry in the distant past. For the purposes of genetic genealogy the focus is on detecting IBD segments within a genealogical timeframe (effectively within the last ten generations) where there is a possibility of identifying the common ancestor through documentary records. Any given pair of individuals is related through many common ancestors, though many of these relationships will be too distant to result in detectable IBD segments. If the two individuals have ancestors from the same geographical region they might have many recent common ancestors, but many of the relationships will not result in IBD sharing, and there might only be one or two detectable segments inherited from a subset of the common ancestors. In a study of a European subset of the Population Reference Sample (POPRES) dataset it was estimated that for the most part IBD blocks longer than 4 cM come from 500 to 1,500 years ago, and blocks longer than 10 cM are within the last 500 years.[1]

Consecutive SNP results for a short segment of DNA may be half-identical in two individuals even though the underlying DNA sequences are not identical (that is, the match is IBS), because the SNPs match on opposing chromosomes. Statistics must therefore be used to determine whether two half-identical (apparently matching) segments are likely to be genuinely identical (neglecting rare errors and mutations). A long consecutive string of half-identical SNP results (typically about 5 cM / 700 SNPs, depending on the test's error rate and other factors) is required before one can infer that two matching DNA segments are probably identical by descent.
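The length requirement described above can be sketched in Python. This is only an illustration: the function name, the input layout (a position-sorted list of cM positions paired with half-identical flags for one chromosome) and the default thresholds of 5 cM and 700 SNPs are assumptions taken from the typical figures quoted in the text, not any company's actual algorithm. Unlike real matching systems, this sketch does not tolerate any mismatches inside a run.

```python
from typing import List, Tuple


def find_candidate_segments(
    snps: List[Tuple[float, bool]],
    min_cm: float = 5.0,
    min_snps: int = 700,
) -> List[Tuple[float, float, int]]:
    """Return (start_cm, end_cm, snp_count) for each qualifying run.

    `snps` holds (genetic position in cM, half-identical flag) pairs,
    sorted by position along a single chromosome.
    """
    segments = []
    run_start = None  # index of the first SNP in the current run, if any

    # A trailing non-matching sentinel closes a run that reaches the end.
    for i, (_, matches) in enumerate(snps + [(0.0, False)]):
        if matches and run_start is None:
            run_start = i
        elif not matches and run_start is not None:
            start_cm = snps[run_start][0]
            end_cm = snps[i - 1][0]
            count = i - run_start
            # Keep the run only if it is long enough on both measures.
            if end_cm - start_cm >= min_cm and count >= min_snps:
                segments.append((start_cm, end_cm, count))
            run_start = None
    return segments
```

Runs that fall below either threshold are simply discarded here, which mirrors the idea that short half-identical stretches are too likely to be IBS to be reported.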

The techniques of chromosome mapping, triangulation and phasing can be used to distinguish between IBD and IBS segments.

It is problematic to use small segments under 5 cM for genealogical matching purposes. Durand et al. (2014) analysed phased data from 2,952 father-mother-child trios in the 23andMe dataset and identified a false positive rate of over 67% for 2–4 cM segments.[2] The error rate is likely to be much higher in unphased data. Currently AncestryDNA are the only company to phase the data before doing the matching.

The terms IBD and IBS are more relevant to the results of SNP microarray testing than to results of whole-genome sequencing, because microarray testing provides so much less information per centiMorgan of DNA. Microarray test results have an additional complexity since they report on both copies of the chromosome, but the results are not phased (that is, it is unknown which nucleotide is on which copy of the chromosome). Thus if one person's SNP result is (CC), this could be at least "Half-Identical" to either (CC) or (CT) in a second person. A homozygous mismatch such as (CC) vs. (TT) would be required before one could say the results are *not* identical.
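The per-SNP comparison described above can be illustrated with a minimal Python sketch. The function names and the two-letter genotype strings are assumptions made for illustration, and no-calls and genotyping errors are ignored.

```python
def half_identical(genotype_a: str, genotype_b: str) -> bool:
    """True if two unphased genotypes share at least one allele.

    Genotypes are two-letter strings such as "CC" or "CT"; the order of
    the letters carries no meaning because the data are unphased.
    """
    return bool(set(genotype_a) & set(genotype_b))


def fully_identical(genotype_a: str, genotype_b: str) -> bool:
    """True if both alleles match, regardless of the reported order."""
    return sorted(genotype_a) == sorted(genotype_b)


# (CC) is at least half-identical to (CC) or (CT) ...
print(half_identical("CC", "CC"), half_identical("CC", "CT"))
# ... and only a homozygous mismatch such as (CC) vs (TT) rules out a match.
print(half_identical("CC", "TT"))
```

Because a heterozygous result such as (CT) is half-identical to every possible genotype at a two-allele SNP, only opposite homozygous results can break a run of apparent matches, which is why short runs are so easily produced by chance.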


Thresholds for matches

Thresholds for length and number of mismatches (errors or mutations) are set by each testing company; these criteria must be met before the company will report that two individuals very likely inherited their matching segments from a common ancestor. For threshold details see Family Finder versus DNA Relatives - Thresholds for relationship matches.

AncestryDNA introduced a new matching system in November 2014.[3] Detailed FAQs and a technical White Paper can be viewed by AncestryDNA testees. AncestryDNA assigns confidence levels depending on the approximate amount of shared centiMorgans:

Confidence score  Approximate amount of sharing  Likelihood you and your match share a recent common ancestor within 5 or 6 generations
Extremely high    More than 30 centiMorgans      Virtually 100%
Very high         20-30 centiMorgans             99%
High              12-20 centiMorgans             95%
Good              6-12 centiMorgans              More than 50%
Moderate          6 centiMorgans or less         20-50%
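As a rough illustration, the confidence bands in the table can be expressed as a simple lookup in Python. The function name is hypothetical, and the handling of values that fall exactly on a boundary (12 and 20 cM appear in two rows of the table) is an assumption: here they are assigned to the higher band, while exactly 6 cM is treated as "Moderate" and exactly 30 cM as "Very high", following the wording "6 centiMorgans or less" and "More than 30 centiMorgans".

```python
def ancestry_confidence(shared_cm: float) -> str:
    """Map an approximate amount of shared DNA (in cM) to a confidence label.

    Boundary handling is an assumption; the published table does not say
    which band a value of exactly 12 or 20 cM belongs to.
    """
    if shared_cm > 30:
        return "Extremely high"
    if shared_cm >= 20:
        return "Very high"
    if shared_cm >= 12:
        return "High"
    if shared_cm > 6:
        return "Good"
    return "Moderate"


for cm in (45, 25, 15, 8, 5):
    print(cm, ancestry_confidence(cm))
```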

Note that the AncestryDNA database is 99% American, and it is not yet known if these ranges will apply in the same way to other populations.

AncestryDNA previously set their threshold for matches at 5 megabases. In around January 2014 they changed to using centiMorgans and set the threshold at 5 cM, but the earlier matches were not rerun. The previous thresholds for other relationships at AncestryDNA are given here.

Research by the genetic genealogist John Walden can also be used as a guideline, though this research has not been subjected to peer review and the methodology is unclear.[4][5]

cM  % IBD  % IBS
10  99     1
9   80     20
8   50     50
7   30     70
6   20     80
5   5      95

Ranges of total centiMorgans of IBD segments expected, based on family relationship

  • Parent/child: 3539-3748 centiMorgans (cMs)
  • 1st cousins: 548-1034 cMs
  • 1st cousins once removed: 248-638 cMs
  • 2nd cousins: 101-378 cMs
  • 2nd cousins once removed: 43-191 cMs
  • 3rd cousins: 43-ca 150 cMs
  • 3rd cousins once removed: 11.5-99 cMs
  • 4th and more distant cousins: 5-ca 50 cMs

Ranges of the number of shared IBD segments based on family relationship

  • Parent/child: 23-29
  • 1st cousins: 17-32
  • 1st cousins once removed: 12-23
  • 2nd cousins: 10-18
  • 2nd cousins once removed: 4-12
  • 3rd cousins: 2-6?
  • 3rd cousins once removed: 1-4
  • 4th and more distant cousins: 0-2
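The two lists above can be combined into a simple consistency check, sketched here in Python. The table and function names are hypothetical, and the approximate figures ("ca 150", "ca 50" and "2-6?") are treated as hard limits purely for illustration. Because the ranges overlap, the result is usually a list of candidate relationships rather than a single answer.

```python
# (min total cM, max total cM), (min segments, max segments), copied from
# the lists above; approximate values are used as if they were exact.
RELATIONSHIP_RANGES = {
    "Parent/child": ((3539, 3748), (23, 29)),
    "1st cousins": ((548, 1034), (17, 32)),
    "1st cousins once removed": ((248, 638), (12, 23)),
    "2nd cousins": ((101, 378), (10, 18)),
    "2nd cousins once removed": ((43, 191), (4, 12)),
    "3rd cousins": ((43, 150), (2, 6)),
    "3rd cousins once removed": ((11.5, 99), (1, 4)),
    "4th and more distant cousins": ((5, 50), (0, 2)),
}


def consistent_relationships(total_cm: float, n_segments: int) -> list:
    """List the relationships whose ranges contain both observed values."""
    return [
        name
        for name, ((cm_lo, cm_hi), (seg_lo, seg_hi)) in RELATIONSHIP_RANGES.items()
        if cm_lo <= total_cm <= cm_hi and seg_lo <= n_segments <= seg_hi
    ]


print(consistent_relationships(300, 14))
# → ['1st cousins once removed', '2nd cousins']
```

Relationships not covered by the lists (for example siblings or grandparents) are outside the scope of this sketch.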

IBD accuracy of genetic tests and analysis methods

  • SNP microarray testing (23andMe, Family Finder, AncestryDNA, Chromo2, Geno 2.0, etc.: see Autosomal DNA testing comparison chart): the accuracy depends on the number and type of autosomal and X-chromosome SNPs tested. Generally, more SNPs give better accuracy.
  • Whole-genome sequencing (WGS) using next generation sequencing (NGS) technology is not currently affordable for the genetic genealogy market but is being used in academic studies. In one study, IBD tools were able to detect all 1st through 6th degree relationships and 55% of 9th through 11th degree relationships, a 5% to 15% increase in relationship detection compared to high-density microarray data.[6]

References

  1. Ralph P, Coop G (2013). The Geography of Recent Genetic Ancestry across Europe. PLOS Biology 11(5):e1001555.
  2. Durand EY, Eriksson N, McLean CY (2014). Reducing pervasive false positive identical-by-descent segments detected by large-scale pedigree analysis. Molecular Biology and Evolution, advance access publication online 30 April 2014.
  3. Swayne A. DNA matching just got better. Ancestry blog. 19 November 2014.
  4. John Walden's research reported in a message to the Autosomal DNA Rootsweb list by Tim Janzen, 6 January 2012.
  5. See also the files on John Walden's website.
  6. Li et al 2014: Relationship Estimation from Whole-Genome Sequence Data, PLoS Genetics Jan 2014; 10(1): e1004144, http://dx.doi.org/10.1371%2Fjournal.pgen.1004144
