centiMorgan

From ISOGG Wiki

In genetic genealogy, a centiMorgan (cM) or map unit (m.u.) is a unit of recombinant frequency which is used to measure genetic distance. One centiMorgan is equal to a 1% chance that a marker at one genetic locus on a chromosome will be separated from a marker at a second locus due to crossing over in a single generation. The unit is often used to infer distance along a chromosome, and it takes into account how often recombination occurs in a region: a region with few cMs undergoes relatively little recombination. The number of base pairs to which one centiMorgan corresponds varies widely across the genome, because different regions of a chromosome have different propensities towards crossover. On average, one centiMorgan corresponds to about 1 million base pairs in humans.

The genetic genealogy testing companies 23andMe, AncestryDNA, Family Tree DNA and MyHeritage DNA use centiMorgans to denote the size of matching DNA segments in autosomal DNA tests. Segments spanning a large number of centiMorgans are more likely to be significant and to indicate a common ancestor within a genealogical timeframe.

The centiMorgan was named in honor of geneticist Thomas Hunt Morgan by his student Alfred Henry Sturtevant. Note that the parent unit of the centiMorgan, the Morgan, is rarely used today.

23andMe and Family Tree DNA both use HapMap to infer their centiMorgans.

centiMorgans vs megabases

CentiMorgans are interpolated numbers that take into consideration each area of a chromosome and its propensity to recombine. This means that if two cousins share 40 cM on chromosome 1, and two different cousins share 40 cM on chromosome 5, both pairs can be predicted, statistically, to share a similar degree of relationship. Megabases measure physical length only, and the amount of recombination per megabase varies by location. In the same scenario, if both pairs shared 40 Mb, it would be more difficult to establish that they have a similar degree of relationship without further accounting for location, chromosome and other factors.[1]
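The interpolation can be illustrated with a minimal sketch. It assumes a genetic map stored as a list of (base-pair position, cumulative cM) points and uses simple linear interpolation between them. The map values, the names GENETIC_MAP, cm_at and segment_cm, and the choice of linear interpolation are all hypothetical and are not taken from HapMap or from any testing company.

```python
from bisect import bisect_right

# Hypothetical genetic map for one chromosome:
# (position in base pairs, cumulative cM). Values are invented for illustration.
GENETIC_MAP = [
    (0, 0.0),
    (10_000_000, 20.0),  # fast-recombining stretch: 2 cM per Mb
    (30_000_000, 25.0),  # slow-recombining stretch: 0.25 cM per Mb
    (40_000_000, 40.0),  # 1.5 cM per Mb
]


def cm_at(position_bp, genetic_map=GENETIC_MAP):
    """Linearly interpolate the cumulative cM value at a base-pair position."""
    positions = [bp for bp, _ in genetic_map]
    # Clamp positions that fall outside the mapped range.
    if position_bp <= positions[0]:
        return genetic_map[0][1]
    if position_bp >= positions[-1]:
        return genetic_map[-1][1]
    # Find the two map points that bracket the position.
    i = bisect_right(positions, position_bp)
    (bp0, cm0), (bp1, cm1) = genetic_map[i - 1], genetic_map[i]
    fraction = (position_bp - bp0) / (bp1 - bp0)
    return cm0 + fraction * (cm1 - cm0)


def segment_cm(start_bp, end_bp, genetic_map=GENETIC_MAP):
    """Genetic length in cM of the segment between two base-pair positions."""
    return cm_at(end_bp, genetic_map) - cm_at(start_bp, genetic_map)


# Two segments of the same physical length (10 Mb) in different regions.
fast_region = segment_cm(0, 10_000_000)
slow_region = segment_cm(15_000_000, 25_000_000)
print(fast_region, slow_region)
```

With this invented map, the two 10 Mb segments have very different lengths in cM, which is why a shared length in megabases says less about the degree of relationship than a shared length in centiMorgans.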

Ann Turner provides a useful explanation: "I think of the cM as being a unit of 'effective' distance. As an analogy, a mile is a fixed quantity (5280 feet), and so are megabases. But the probability that a person can walk a mile in 20 minutes is more fluid. If the terrain is very rough, the "effective" distance of a literal mile might be more like two miles if you're trying to arrive at a certain time. We're more interested in the probability that a segment will be passed on intact than the size of the segment in Mb".[2]

As the cM is an empirical measure, based on recombination events in a particular dataset of parents and offspring, it can vary somewhat from study to study. This set of maps for each chromosome shows that the general shape of the centiMorgan vs megabase curve is similar for two datasets, but the absolute values are not quite the same:

http://web.archive.org/web/20070113005025/http://compgen.rutgers.edu/maps/compare.pdf

cM values per chromosome

The following table compares cM values per chromosome at Family Tree DNA, GEDmatch, and 23andMe. AncestryDNA uses 3475 as the total cM according to the help screen for confidence level in a DNA match. This presumably excludes the X chromosome.

[Image: table of cM values per chromosome at Family Tree DNA, GEDmatch, and 23andMe]

Probability of crossover

The following chart shows the estimated probability that a segment will be affected by a crossover. The chart does not take into account some variables such as inversions and different recombination rates for males and females.

[Image: chart of crossover probability by segment size in centiMorgans]
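How the chart was calculated is not stated. One common simplification, used here purely as an assumption, treats crossovers as a Poisson process with an expected 0.01 crossovers per cM per meiosis and no interference between them. Like the chart, this sketch ignores inversions and the different recombination rates for males and females. The function name crossover_probability is hypothetical.

```python
import math


def crossover_probability(segment_cm, generations=1):
    """Probability that at least one crossover falls inside a segment.

    Assumption: crossovers follow a Poisson process with an expected
    segment_cm / 100 crossovers per meiosis, with no interference.
    """
    expected_crossovers = segment_cm / 100.0 * generations
    return 1.0 - math.exp(-expected_crossovers)


# Print the estimate for a few segment sizes over a single generation.
for cm in (1, 10, 50, 100):
    print(f"{cm:>3} cM: {crossover_probability(cm):.3f}")
```

For small segments the result is close to 1% per cM, in line with the definition of the unit. For larger segments the probability grows more slowly, because several crossovers can fall in the same segment.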

Converting centiMorgans into percentages

In order to get an approximate percentage of shared DNA from a Family Tree DNA Family Finder test, take all of the segments above 5 cM, add them together and then divide by 68.

The calculation works because the total genome in the Family Finder test is 6770 cM, counting both the maternal and paternal copies of each chromosome. A half-identical match (such as a parent/child) covers one copy, or 3385 cM, which is half of the total. Matt Dexter explains: "The reason the number is not 6770 or 6800, but rather 68, is that it saves an additional step doing the math to convert an answer to percent. For example, 3385 / 6770 = .5 then as a second step, .5 times 100 = 50%. Using 68 to start with saves the added math step. So (3385 / 6800) * 100 is the same thing as 3385 / 68, which results in = 50%."[3]
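The rule of thumb can be written as a short sketch. The function name percent_shared and the example segment sizes are hypothetical. "Above 5 cM" is read here as strictly greater than 5 cM, which is an assumption.

```python
def percent_shared(segments_cm, min_cm=5.0, divisor=68.0):
    """Approximate percentage of shared DNA from a list of segment sizes in cM.

    Segments at or below min_cm are ignored. The divisor of 68 follows the
    rule of thumb described above (6770 cM rounded to 6800, divided by 100).
    """
    total_cm = sum(cm for cm in segments_cm if cm > min_cm)
    return total_cm / divisor


# Hypothetical match: the 4.2 cM segment is below the threshold and is dropped.
segments = [42.0, 18.5, 7.5, 4.2]
print(f"{percent_shared(segments):.2f}% shared")

# A parent/child match of 3385 cM comes out at roughly 50%.
print(f"{percent_shared([3385.0]):.1f}% shared")
```

Because 68 is a rounded figure, a full half-identical match comes out slightly under 50% rather than exactly 50%.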

Human reference genome

The centiMorgan totals per chromosome are based on the Human Reference Genome. 23andMe and AncestryDNA use Build 37. Family Tree DNA uses Build 37 for matching but Build 36 for segment boundaries in the Chromosome Browser. Raw data files are provided in both formats. Build 37 filled in quite a few gaps, so each chromosome contains more base pairs in Build 37 than in Build 36. Consequently the cM totals per chromosome are lower for Family Finder than they are for 23andMe. GEDmatch Classic used Build 36, and converted AncestryDNA and 23andMe data from Build 37 to Build 36 for backward compatibility. The new GEDmatch (formerly known as GEDmatch Genesis) uses Build 37, but the one-to-one tool offers an option to display segment boundaries in Build 36, Build 37, or Build 38.

The latest version of the Human Reference Genome, Build 38, was released in December 2013. However, none of the companies has yet adopted Build 38, and there is a “gentleman’s agreement” in place to stick with Build 37 for the time being.

References

  1. Matt Dexter. Megabases versus centiMorgans. Message posted on the ISOGG Group Administrators' mailing list, 21 June 2014.
  2. Ann Turner. centiMorgans vs megabases. Message posted on the ISOGG Group Administrators' mailing list, 22 June 2014.
  3. Matt Dexter. Message posted on the ISOGG DNA Newbie list in a thread entitled "DNA Conference Thank You", 13 November 2013.