ChrisR/current NextGenSeq testing
From ISOGG Wiki
NGS comparison table
FGC = Full Genomes Corporation, FTDNA = Family Tree DNA. See also Y-DNA next generation sequencing, Y-DNA SNP testing chart, and Autosomal DNA testing comparison chart.
Test | FGC WGS 30× | FGC WGS 20× | FGC WGS 15× (GenomeGuide) | FGC WGS 10× | FGC WGS 4× | 1000 Genomes Ph.3 | FGC WGS 2× | FGC Y-Elite 2 | FGC Y-Elite 1 | FTDNA BigY |
---|---|---|---|---|---|---|---|---|---|---|
Introduced | Summer 2014 | Late 2015 | Early 2016 | July 2015 | July 2015 | 2011-2013 | July 2015 | May 2015 | Late 2012-2015 | November 2013 |
Price | $1250 ($42/×) | $1200 ($60/×) | $895 ($60/×) | $725 ($73/×) | $395 ($99/×) | - | $280 ($140/×) | $775 | $850-1299 | $575 |
Sequenced DNA focus | whole genome | whole genome | whole genome | whole genome | whole genome | whole genome | whole genome | Y-DNA, mtDNA | Y-DNA, mtDNA | Y-DNA (until April 2015 mtDNA) |
Read depth, read length, method | 30× 150 bp (or 10 mb)[1] | 20× 150 bp | 15× 150 bp | 10× 150 bp | 4× 150 bp | min. 4×, av. 7× > 70bp | 2× 150 bp | 30× 250 bp | 50× 100 bp | 60× 100 bp |
Upgrade options | $55 per 1× + $100 data fee[1] | price difference + $100 data fee[1] | price difference + $100 data fee | price difference + $100 data fee[2] | price difference to 10× + $100 data fee[2] | - | price difference to 4× + $100 data fee[2] | 2nd order for 60× [3] | - | - |
Y ≥1× coverage (FGC) | ~22.9 mbp [4] 92% hg19 | ~22.8 mbp [5] 92% hg19 | ? | ~21.8 mbp [6] 89% hg19 | ~17.7 mbp [4] 72% hg19 | ? mbp | ~13.8 mbp [4] 56% hg19 | >22.0 mbp [4] 89% hg19 | ~22.8 mbp [4] 92% hg19 (21.5-23 mbp)[7] | ~16 mbp 65% hg19 (14-23 mbp)[7] |
Y Callable Loci (GATK) (FGC qual.-read-length) | ~14.9 mbp [8] | ~13.9 mbp [5] | ~13.2 mbp [9] | ~8.0 mbp | ~1.1 mbp | ? | ~0.4 mbp | ~14.8 mbp [10] | ~14.1 mbp [11] | ~8.8 mbp [11] |
Y Method Analysis (YFull) | Mean/Av. 21×, Median 12×, ~22.8 mbp, ~0.3 Gb BAM, ~2900? SNPs, ?/111 STRs | Mean/Av. 10-11×, Median 4-5×, ~88% Y-cov-hg19, ~0.2 Gb BAM, ~2,762 known + ? novel SNPs, ~?/111 STRs | [12] Mean/Av. 9×, Median 4×, ~87% Y-cov-hg19, ~0.1 Gb BAM, 2,764 known + 243 novel SNPs, ~81/111 STRs | Mean/Av. ?×, Median ?×, ? bp, ~0.18 Gb BAM, ~2300? SNPs, ca. 1/3 of 111 STRs | ? | Mean/Av. ~47-72×, Median ~31-47×, 22 mbp, ~1.2 Gb BAM [4], ~2750 SNPs[6], ~107/111 STRs | Mean/Av. -76×, Median 37-39×, 22.7-25 mbp, ~3 Gb BAM, ~2800 SNPs, ~98/111 STRs | Mean/Av. -91×, Median 47-60×, ~13.9 mbp, ~0.8 Gb BAM, ~2050 SNPs, ~96/111 STRs | | |
mt Method Analysis | ~100% FMS, Mean/Av. >1000× [9] | ~100% FMS, Mean/Av. >1000× [13] | 92-100% FMS [14] | ~95% FMS, Mean/Av. ~26× [13] (75-100%)[7] | ~69% FMS, Mean/Av. ~13-41× [13] (0-100%)[7] | | | | | |
at/X Method (~3.60 mill. SNPs expected) | ~3.60 mill. SNPs (~100%)[15], ca. 22.5×?, coverage ca. 95% | ~3.52 mill. SNPs (~98%)[9] | ~3.11 mill. SNPs (~86%)[15] | ~1.75 mill. SNPs (~49%)[15] | not included | not included | not included |
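The per-× figures in the Price row appear to be the list price divided by the nominal read depth, rounded to whole dollars. A minimal Python sketch of that arithmetic follows; it uses only the FGC WGS prices from the table, and the dictionary and function names are illustrative assumptions, not anything published by FGC.

```python
import math

# Nominal read depth (x) -> list price in USD, copied from the Price row.
FGC_WGS_PRICES = {30: 1250, 20: 1200, 15: 895, 10: 725, 4: 395, 2: 280}


def price_per_x(price_usd, depth):
    """Price per 1x of nominal depth, rounded half-up to whole dollars."""
    return math.floor(price_usd / depth + 0.5)


per_x = {depth: price_per_x(price, depth) for depth, price in FGC_WGS_PRICES.items()}

for depth, value in per_x.items():
    print(f"FGC WGS {depth}x: ${FGC_WGS_PRICES[depth]} total, about ${value} per x")
```

Half-up rounding is used deliberately: Python's built-in `round` rounds 72.5 to 72, which would not reproduce the $73/× shown for the 10× test.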
Numbers of variants in the human genome / in WGS databases
The human nucleotide diversity is estimated to be 0.1% to 0.4% of base pairs. A difference of 1 to 4 in 1,000 amounts to approximately 3 to 12 million nucleotide differences, because the human genome has about 3 billion nucleotides.[16]
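The estimate above is a simple product of diversity and genome size. As a minimal sketch (the constant and function names are chosen here for illustration):

```python
GENOME_SIZE_BP = 3_000_000_000  # about 3 billion nucleotides, as stated above


def expected_differences(diversity, genome_size_bp=GENOME_SIZE_BP):
    """Expected number of differing nucleotides for a given nucleotide diversity."""
    return round(diversity * genome_size_bp)


low = expected_differences(0.001)   # 0.1% = 1 in 1,000
high = expected_differences(0.004)  # 0.4% = 4 in 1,000
print(f"{low:,} to {high:,} nucleotide differences")
```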
Breakdown of the 20.4 million single-nucleotide variants found in the whole genomes of 250 Dutch parent-offspring families from the Genome of the Netherlands (GoNL) Project, by the earlier dataset in which they also appear (the project also found 1.2 million insertions and deletions; intermediate coverage ~13×):[17]
Variant dataset | Variants (millions) | Percent |
---|---|---|
HapMap CEU 2005-2009 | 2.3 | 11% |
1000G EUR 2011-2013 | 9.1 | 45% |
1000G 2011-2013 | 1.2 | 6% |
dbSNP 1998-2013 [18] | 0.2 | 1% |
GoNL 2014 only | 7.6 | 37% |
Sum | 20.4 | 100% |
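The percent column can be re-derived from the variant counts. A short sketch, using only the numbers in the table (variable names are illustrative):

```python
# Variant counts in millions, copied from the table above.
GONL_BREAKDOWN = {
    "HapMap CEU 2005-2009": 2.3,
    "1000G EUR 2011-2013": 9.1,
    "1000G 2011-2013": 1.2,
    "dbSNP 1998-2013": 0.2,
    "GoNL 2014 only": 7.6,
}

total = sum(GONL_BREAKDOWN.values())
shares = {name: round(100 * count / total) for name, count in GONL_BREAKDOWN.items()}

print(f"Sum: {total:.1f} million")
for name, percent in shares.items():
    print(f"{name}: {percent}%")
```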
For comparison, the widely used Illumina SNP chips (23andMe, FamilyFinder, Ancestry.com, etc.) provide only a few hundred thousand SNP markers; see Autosomal DNA testing comparison chart.
Minimal read depth and coverage for variant (SNV/SNP) research
Sequencing depth is the (often average) number of nucleotides contributing to a portion of an assembly. On a genome basis, it means that, on average, each base has been sequenced a certain number of times (10×, 20×, ...). For a specific nucleotide, it is the number of reads that added information about that nucleotide. Depth varies considerably between genomic regions. As a consequence, an average sequencing depth of 30× still leaves many small portions of a genome unsequenced, while others receive many more reads.[19]
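The gap between average depth and per-base depth can be illustrated with an idealized model in which reads land uniformly at random, so that the depth at a single base follows a Poisson distribution whose mean is the average depth. This model is an assumption made here for illustration and is not taken from the cited post. Real data are more uneven because of GC bias, repeats and unmappable regions, so the figures it prints are optimistic.

```python
import math


def fraction_with_depth_at_least(k, mean_depth):
    """P(depth >= k) at a single base under a Poisson model with the given mean."""
    below = sum(
        math.exp(-mean_depth) * mean_depth ** i / math.factorial(i)
        for i in range(k)
    )
    return 1.0 - below


for mean_depth in (2, 4, 7, 10, 15, 30):
    at_least_1 = fraction_with_depth_at_least(1, mean_depth)
    at_least_10 = fraction_with_depth_at_least(10, mean_depth)
    print(f"mean {mean_depth:>2}x: >=1 read {at_least_1:.4f}, >=10 reads {at_least_10:.4f}")
```

Even under these favourable assumptions, a 10× average leaves a little under half of the bases with fewer than 10 reads.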
Low confidence: 7×
The 1000 Genomes Project sequenced the genomes of 2,504 individuals representing 26 populations to an average of 7× coverage. This dataset is widely used for variant research and offers acceptable minimal confidence for the haploid parts of the genome (mtDNA and hemizygous Y-DNA).
Medium confidence: 10× - 59×
Everything above 7× is called deep sequencing. For detecting human genome mutations, SNPs, and rearrangements, publications often recommend 10× to 30× depth of coverage, depending on the application and statistical model.[20] A 2011 study considers the SNP-calling capability at 10× sufficient for standard SNP analysis.[21]
Analysis of the first human genome sequenced with Illumina technology, published in 2008, suggests that almost all homozygous SNVs are detected at a 15× average depth, whereas an average depth of 33× is required to detect the same proportion of heterozygous SNVs.[22] A 2011 study suggests that improvements in sequencing have set the required average mapped depth at 35× for reliable calling of SNVs and small indels across 95% of the genome.[23]
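A simple binomial model shows why heterozygous SNVs need more depth than homozygous ones. At a heterozygous site each read shows the variant allele with probability 0.5, whereas at a homozygous site every read does. The model, the threshold of `min_reads` supporting reads and the function name are assumptions for illustration. The cited studies used full variant callers and real depth distributions, and this sketch ignores sequencing error.

```python
from math import comb


def prob_enough_variant_reads(depth, min_reads=3, allele_fraction=0.5):
    """P(at least min_reads of `depth` reads carry the variant allele)."""
    too_few = sum(
        comb(depth, i) * allele_fraction ** i * (1 - allele_fraction) ** (depth - i)
        for i in range(min_reads)
    )
    return 1.0 - too_few


for depth in (5, 10, 15, 20, 33):
    het = prob_enough_variant_reads(depth, allele_fraction=0.5)
    hom = prob_enough_variant_reads(depth, allele_fraction=1.0)
    print(f"{depth:>2} reads: heterozygous {het:.4f}, homozygous {hom:.4f}")
```

The depth here is the number of reads at one site, not the genome-wide average, so the output shows the trend rather than reproducing the 15× and 33× figures.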
SNP call accuracy by read depth, according to a 2014 study on single nucleotide variant detection and genotype calling (for chr20):[24]
- 5×: 90-97%
- 10×: 96-98%
- 15×: 98% (Minimum for rare variants)
- ≥20×: 99%
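For quick reference, the list above can be turned into a small lookup. In this sketch a depth between two listed values is given the figure of the next lower listed depth; that is a conservative assumption made here, not a claim from the study, and depths below 5× return `None` because the list does not cover them.

```python
# (minimum depth, reported SNP call accuracy) pairs from the list above.
ACCURACY_BY_DEPTH = [
    (5, "90-97%"),
    (10, "96-98%"),
    (15, "98%"),
    (20, "99%"),
]


def reported_accuracy(depth):
    """Return the accuracy listed for the highest tabulated depth <= depth."""
    result = None  # depths below 5x are not covered by the list
    for min_depth, accuracy in ACCURACY_BY_DEPTH:
        if depth >= min_depth:
            result = accuracy
    return result


for depth in (2, 4, 10, 15, 30):
    print(f"{depth}x: {reported_accuracy(depth)}")
```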
High confidence: 60× and higher
- Needed for high-confidence calling of exome variants (medical use)
- "We calculated that 60× WGS data from the HiSeq 2000 platform are needed to recover ~95% of INDELs, much higher than that for SNP detection. Accurate detection of heterozygous INDELs requires ~1.2-fold higher coverage than that for homozygous INDELs" [25]
Interesting features for population genetics and genetic genealogy
Enrichment and targeted designs can provide better coverage of certain genome regions. The ability to deliver the following data seems crucial for competitiveness in the market:
Y
- Detection of more than 2,000 derived Y-SNPs
- Y-STR coverage, especially for the FTDNA Y37-Y67 panels
- FASTQ files, so that the reads can be remapped
Autosomal/X
- Coverage of the main DTC-chip SNPs, useful for admixture and IBD comparisons such as those on GEDmatch: 23andMe (v1-v4), Ancestry.com, FTDNA FamilyFinder, Geno 2.0 (v1-v2), Chromo2.
- Phasing: the ability to distinguish paternal from maternal DNA in an individual without testing the parents. At 34×, phasing 96% of SNPs into haplotype blocks should already be possible.[26]
- Potential for coverage of highly informative continental and regional SNPs (rare variants) to be used in future admixture and matching services (distant genetic relations).[27][28]
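Chip-SNP coverage can be expressed as a share of the SNPs expected, as in the at/X row of the comparison table. The sketch below assumes that the ~3.60 million "expected" SNPs in that row are the combined DTC-chip set; the table does not say so explicitly. The function name is illustrative.

```python
EXPECTED_CHIP_SNPS = 3.60  # millions of SNPs expected, from the at/X row


def chip_snp_share(covered_millions, expected_millions=EXPECTED_CHIP_SNPS):
    """Percentage of the expected SNPs that were covered, as a whole number."""
    return round(100 * covered_millions / expected_millions)


# Covered SNP counts (millions) quoted in the at/X row.
for covered in (3.60, 3.52, 3.11, 1.75):
    print(f"{covered:.2f} million covered: about {chip_snp_share(covered)}%")
```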
mt
- ~100% FMS with good mean read depth (>50×)
References
- [1] $2750 pilot project long read whole genome Chromium technology, Justin Loe, FGC, 2016-09-02, Forum http://www.anthrogenica.com/showthread.php?742-Full-Y-Chromosome-Sequencing-Phase-III-Pilot&p=184229&viewfull=1#post184229
- [2] Justin Loe, FGC, 2015-11-29, Forum http://www.anthrogenica.com/showthread.php?742-Full-Y-Chromosome-Sequencing-Phase-III-Pilot&p=123473&viewfull=1#post123473
- [3] Justin Loe, FGC, 2015-12-15, http://www.anthrogenica.com/showthread.php?742-Full-Y-Chromosome-Sequencing-Phase-III-Pilot&p=126856#post126856
- [4] Justin Loe, email message, 28 Dec 2015 and AG Forum 2016-01 http://www.anthrogenica.com/showthread.php?742-Full-Y-Chromosome-Sequencing-Phase-III-Pilot&p=131470&viewfull=1#post131470
- [5] Justin Loe, AG Forum 2016-01 http://www.anthrogenica.com/showthread.php?742-Full-Y-Chromosome-Sequencing-Phase-III-Pilot&p=135412&viewfull=1#post135412
- [6] Justin Loe, AG Forum 2016-01 http://www.anthrogenica.com/showthread.php?742-Full-Y-Chromosome-Sequencing-Phase-III-Pilot&p=132860&viewfull=1#post132860
- [7] Jim Kane, Which Y-DNA NGS test to take? November 12, 2015, http://www.it2kane.org/2015/11/which-ngs-test-to-take/
- [8] Justin Loe, E-Mail 2016-06-10
- [9] Justin Loe, E-Mail 2016-02
- [10] Justin Loe, AG Forum 2016-03 http://www.anthrogenica.com/showthread.php?742-Full-Y-Chromosome-Sequencing-Phase-III-Pilot&p=146390&viewfull=1#post146390
- [11] Vince Tilroe analysis of FGC raw-data from Greg Magoon, shared by Iain McDonald, 2 Dec 2015
- [12] Based on a single sample with initial QC problems (YF05650)
- [13] Petr, Forum post: Full Y Chromosome Sequencing: Phase III Pilot, 2015-12-25, http://www.anthrogenica.com/showthread.php?742-Full-Y-Chromosome-Sequencing-Phase-III-Pilot&p=128756&viewfull=1#post128756
- [14] Justin Loe, Batch 9006, email message, 28 Dec 2015
- [15] Justin Loe, AG Forum 2016-01 http://www.anthrogenica.com/showthread.php?742-Full-Y-Chromosome-Sequencing-Phase-III-Pilot&p=134491&viewfull=1#post134491
- [16] Human genetic variation / Measures of variation, Wikipedia, 2016-01 https://en.wikipedia.org/wiki/Human_genetic_variation
- [17] Francioli, Menelaou et al. 2014: doi:10.1038/ng.3021, http://www.nature.com/ng/journal/v46/n8/abs/ng.3021.html
- [18] ca. NCBI dbSNP Build 138, Apr 2013: http://www.ncbi.nlm.nih.gov/SNP/snp_summary.cgi?view+summary=view+summary&build_id=138
- [19] Eric Normandeau, What Is The Sequencing 'Depth' ?, 2011-2012, https://www.biostars.org/p/638/#640
- [20] Illumina, Sequencing Coverage, 2016-01, http://www.illumina.com/science/education/sequencing-coverage.html
- [21] Wang, Wei et al. 2011: Next generation sequencing has lower sequence coverage and poorer SNP-detection capability in the regulatory regions, http://dx.doi.org/10.1038/srep00055
- [22] Bentley et al. 2008: Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456, 53–59.
- [23] Ajay et al. 2011: Accurate and comprehensive sequencing of personal genomes. Genome Res. 21, 1498–1505.
- [24] Cheng, Teo et al. 2014: Assessing single nucleotide variant detection and genotype calling on whole-genome sequenced individuals. Bioinformatics Volume 30, Issue 12. Interpretation by FGC (JL)
- [25] Fang, Narzisi et al. 2014: Reducing INDEL errors in whole-genome and exome sequencing, http://dx.doi.org/10.1186/s13073-014-0089-z
- [26] 10X Genomics GemCode platform, 2015-08-16, Forum http://www.anthrogenica.com/showthread.php?5178-WGS-tec-able-to-phase-96-of-SNPs-into-haplotype-blocks
- [27] Al-Khudhair, Qiu et al. 2015: Inference Of Distant Genetic Relations In Humans Using “1000 Genomes” http://dx.doi.org/10.1093/gbe/evv003
- [28] Schiffels, Haak et al. 2016: rarecoal in "Iron Age and Anglo-Saxon genomes from East England reveal British migration history" http://dx.doi.org/10.1038/ncomms10408