ChrisR/current NextGenSeq testing

From ISOGG Wiki

NGS comparison table

FGC = Full Genomes Corporation; FTDNA = Family Tree DNA. See also: Y-DNA next generation sequencing, Y-DNA SNP testing chart, and Autosomal DNA testing comparison chart.


 | FGC WGS 30× | FGC WGS 20× | FGC WGS 15× (GenomeGuide) | FGC WGS 10× | FGC WGS 4× | 1000 Genomes Ph.3 | FGC WGS 2× | FGC Y-Elite 2 | FGC Y-Elite 1 | FTDNA BigY
Introduced | Summer 2014 | Late 2015 | Early 2016 | July 2015 | July 2015 | 2011-2013 | July 2015 | May 2015 | Late 2012-2015 | November 2013
Price | $1250 ($42/×) | $1200 ($60/×) | $895 ($60/×) | $725 ($73/×) | $395 ($99/×) | - | $280 ($140/×) | $775 | $850-1299 | $575
Sequenced DNA focus | whole genome | whole genome | whole genome | whole genome | whole genome | whole genome | whole genome | Y-DNA, mtDNA | Y-DNA, mtDNA | Y-DNA (until April 2015 mtDNA)
Read depth, read length, Method | 30×, 150 bp (or 10 mb)[1] | 20×, 150 bp | 15×, 150 bp | 10×, 150 bp | 150 bp | min. 4×, av. 7×, > 70 bp | 150 bp | 30×, 250 bp | 50×, 100 bp | 60×, 100 bp
Upgrade options | $55 per 1× + $100 data fee[1] | price difference + $100 data fee[1] | price difference + $100 data fee | price difference + $100 data fee[2] | price difference to 10× + $100 data fee[2] | - | price difference to 4× + $100 data fee[2] | 2nd order for 60× [3] | - | -
Y ≥1× coverage (FGC) | ~22.9 mbp [4], 92% hg19 | ~22.8 mbp [5], 92% hg19 | ~ ? | ~21.8 mbp [6], 89% hg19 | ~17.7 mbp [4], 72% hg19 | ? mbp | ~13.8 mbp [4], 56% hg19 | >22.0 mbp [4], 89% hg19 | ~22.8 mbp [4], 92% hg19 (21.5-23 mbp)[7] | ~16 mbp, 65% hg19 (14-23 mbp)[7]
Y Callable Loci (GATK) (FGC qual.-read-length) | ~14.9 mbp [8] | ~13.9 mbp [5] | ~13.2 mbp [9] | ~8.0 mbp | ~1.1 mbp | ? | ~0.4 mbp | ~14.8 mbp [10] | ~14.1 mbp [11] | ~8.8 mbp [11]
Y Method Analysis (YFull)
Mean/Av. 21×, Median 12×, ~22.8 mbp, ~0.3 Gb BAM, ~2900? SNPs, ?/111 STRs
Mean/Av. 10-11×, Median 4-5×, ~88% Y-cov-hg19, ~0.2 Gb BAM, ~2,762 known + ? novel SNPs, ~?/111 STRs
[12] Mean/Av. 9×, Median 4×, ~87% Y-cov-hg19, ~0.1 Gb BAM, 2,764 known + 243 novel SNPs, ~81/111 STRs
Mean/Av. ?×, Median ?×, ? bp, ~0.18 Gb BAM, ~2300? SNPs, ca. 1/3 of 111 STRs
?
Mean/Av. ~47-72×, Median ~31-47×, 22 mbp, ~1.2 Gb BAM [4], ~2750 SNPs[6], ~107/111 STRs
Mean/Av. -76×, Median 37-39×, 22.7-25 mbp, ~3 Gb BAM, ~2800 SNPs, ~98/111 STRs
Mean/Av. -91×, Median 47-60×, ~13.9 mbp, ~0.8 Gb BAM, ~2050 SNPs, ~96/111 STRs
mt Method Analysis

~100% FMS, Mean/Av. >1000× [9]
~100% FMS, Mean/Av. >1000× [13]



92-100% FMS [14]
~95% FMS, Mean/Av. ~26× [13] (75-100%)[7]
~69% FMS, Mean/Av. ~13-41× [13] (0-100%)[7]
at/X Method | ~3.60 mill. SNPs expected | ~3.60 mill. SNPs (~100%)[15], ca. 22.5×?, coverage ca. 95% | ~3.52 mill. SNPs (~98%)[9] | ~3.11 mill. SNPs (~86%)[15] | ~1.75 mill. SNPs (~49%)[15] | | | not included | not included | not included
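
The per-× figures in the Price row appear to be the list price divided by the nominal read depth, rounded to the nearest dollar. The following is a minimal Python sketch of that arithmetic using the FGC WGS prices from the table; the names `WGS_PRICES` and `price_per_x` are illustrative only, and the round-half-up rule is an assumption chosen because it reproduces the $73/× figure for the 10× test.

```python
# Price per 1× of nominal read depth for the FGC WGS tests in the table.
# Depths and prices are copied from the table; all names are illustrative.
WGS_PRICES = {30: 1250, 20: 1200, 15: 895, 10: 725, 4: 395, 2: 280}


def price_per_x(price_usd: float, depth_x: float) -> int:
    """Return the price per 1× of depth, rounded half up to whole dollars."""
    # int(x + 0.5) rounds half up for positive values; Python's round()
    # would round 72.5 down to 72 (banker's rounding).
    return int(price_usd / depth_x + 0.5)


for depth, price in WGS_PRICES.items():
    print(f"{depth}x: ${price} total, about ${price_per_x(price, depth)} per 1x")
```

Read this way, the price per × rises steadily as the ordered depth falls, from $42/× at 30× to $140/× at 2×.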

Numbers of variants in the human genome / in WGS databases

The human nucleotide diversity is estimated to be 0.1% to 0.4% of base pairs. A difference of 1 to 4 in 1,000 amounts to approximately 3 to 12 million nucleotide differences, because the human genome has about 3 billion nucleotides.[16]
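
The estimate above is a simple multiplication, sketched here in Python. The genome size of 3 billion nucleotides is the rounded figure used in the text, and the function name `expected_differences` is hypothetical.

```python
# Expected number of nucleotide differences for a given diversity.
# Integer arithmetic is used so the result is exact.
GENOME_SIZE_BP = 3_000_000_000  # approximate human genome size from the text


def expected_differences(per_thousand: int, genome_size: int = GENOME_SIZE_BP) -> int:
    """Differences for a diversity of `per_thousand` differing bases per 1,000 bp."""
    return genome_size * per_thousand // 1000


low = expected_differences(1)   # 0.1% diversity
high = expected_differences(4)  # 0.4% diversity
print(f"{low:,} to {high:,} nucleotide differences")
```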

SNVs and structural variation of some WGS databases (Francioli, Menelaou et al 2014)

Variants shared by whole genomes of 250 Dutch parent-offspring families from Genome of the Netherlands (GoNL) Project (20.4 million single-nucleotide variants and 1.2 million insertions and deletions, intermediate coverage ~13×)[17]

Variant dataset | Variants (millions) | Percent
HapMap CEU 2005-2009 | 2.3 | 11%
1000G EUR 2011-2013 | 9.1 | 45%
1000G 2011-2013 | 1.2 | 6%
dbSNP 1998-2013 [18] | 0.2 | 1%
GoNL 2014 only | 7.6 | 37%
Sum | 20.4 | 100%
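
As a quick consistency check, the percent column can be recomputed from the variant counts. This sketch copies the values from the table above; the dictionary name `GONL_VARIANTS` is illustrative.

```python
import math

# Variant counts in millions, copied from the table above.
GONL_VARIANTS = {
    "HapMap CEU 2005-2009": 2.3,
    "1000G EUR 2011-2013": 9.1,
    "1000G 2011-2013": 1.2,
    "dbSNP 1998-2013": 0.2,
    "GoNL 2014 only": 7.6,
}

total = sum(GONL_VARIANTS.values())
# Share of each dataset in the total, rounded to whole percent.
shares = {name: round(100 * count / total) for name, count in GONL_VARIANTS.items()}

print(f"total: {total:.1f} million variants")
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

The recomputed shares agree with the table, and about 37% of the GoNL single-nucleotide variants were not present in the other listed datasets.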

For comparison, the widely used Illumina SNP chips (23andMe, FamilyFinder, Ancestry.com, etc.) provide a few hundred thousand SNP markers; see the Autosomal DNA testing comparison chart.

[Figure: Relationship between read depth and coverage in next-generation sequencing (Wang, Wei et al 2011)]

Minimal read depth and coverage for variant (SNV/SNP) research

Sequencing depth represents the (often average) number of nucleotides contributing to a portion of an assembly. On a genome basis, it means that, on average, each base has been sequenced a certain number of times (10×, 20×, ...). For a specific nucleotide, it represents the number of sequences that added information about that nucleotide. Such depth varies considerably depending on the genomic region. As a consequence, an average sequencing depth of 30× leaves many small portions of a genome unsequenced while others receive far more sequences.[19]
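
To get a rough feel for how average depth relates to the fraction of bases actually covered, one common idealization treats the depth at each base as Poisson-distributed around the average. This is an assumption, not something stated in the sources above: it presumes reads land uniformly at random, whereas real data are more uneven, so the uncovered fractions it gives are optimistic. The function name is hypothetical.

```python
import math


def prob_depth_at_least(mean_depth: float, k: int) -> float:
    """P(depth >= k) at one base, assuming depth ~ Poisson(mean_depth)."""
    below = sum(
        math.exp(-mean_depth) * mean_depth**i / math.factorial(i)
        for i in range(k)
    )
    return 1.0 - below


for mean in (2, 4, 7, 10, 30):
    uncovered = 1.0 - prob_depth_at_least(mean, 1)  # equals exp(-mean)
    deep = prob_depth_at_least(mean, 10)
    print(f"{mean:>2}x average: {uncovered:.2e} uncovered, P(depth >= 10) = {deep:.3f}")
```

Under this model even a 2× average leaves about 14% of bases without any read, which illustrates why low-depth tests report much smaller covered regions.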

Low confidence: 7×

The 1000 Genomes Project sequenced genomes of 2,504 individuals representing 26 populations to an average of 7× coverage. This dataset is used by many for variant research and has acceptable minimal confidence for haploid genome parts (mtDNA and hemizygous Y-DNA).

[Figure: Accuracy of variant calling at various coverage depths (filtered on chr20, Cheng, Teo et al 2014)]

Medium confidence: 10× - 59×

Everything above 7× is called deep sequencing. For detecting human genome mutations, SNPs, and rearrangements, publications often recommend a depth of coverage from 10× to 30×, depending on the application and statistical model.[20] A 2011 study considers the SNP-calling capability at 10× sufficient for standard SNP analysis.[21]

Analysis of the first sequenced human genome in 2008 suggests that homozygous SNVs are detected at a 15× average depth and an average depth of 33× is required to detect the same proportion of heterozygous SNVs.[22] A 2011 study suggests improvements in sequencing set the required average mapped depth to 35× for reliable calling of SNVs and small indels across 95% of the genome.[23]
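
One way to see why heterozygous SNVs need more depth than homozygous ones is a simple binomial model: at a heterozygous site each read carries the variant allele with probability 0.5, so fewer reads support it. The sketch below is only an illustration under stated assumptions, not the method used in the cited studies. It ignores sequencing and mapping errors, and the threshold of three supporting reads is an arbitrary choice; the function name is hypothetical.

```python
from math import comb


def prob_alt_seen(depth: int, min_alt_reads: int, alt_fraction: float = 0.5) -> float:
    """P(at least `min_alt_reads` of `depth` reads carry the variant allele).

    alt_fraction is 0.5 for a heterozygous site and 1.0 for a homozygous one.
    """
    return sum(
        comb(depth, k) * alt_fraction**k * (1 - alt_fraction) ** (depth - k)
        for k in range(min_alt_reads, depth + 1)
    )


MIN_READS = 3  # assumed evidence threshold, for illustration only
for depth in (4, 7, 10, 15, 33):
    het = prob_alt_seen(depth, MIN_READS, 0.5)
    hom = prob_alt_seen(depth, MIN_READS, 1.0)
    print(f"depth {depth:>2}: heterozygous {het:.3f}, homozygous {hom:.3f}")
```

In this toy model a homozygous variant is supported by every read, while a heterozygous one reaches comparable support only at a noticeably higher depth.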

SNP call accuracy according to a 2014 study on single nucleotide variant detection and genotype calling (for chr20)[24]

  • 5×: 90-97%
  • 10×: 96-98%
  • 15×: 98% (Minimum for rare variants)
  • ≥20×: 99%

[Figure: Minimum depth for correct genotype call (Meynert, Ansari et al 2014)]

High confidence: 60× and higher

  • For high-confidence calling of exome variants (medical use)
  • "We calculated that 60× WGS data from the HiSeq 2000 platform are needed to recover ~95% of INDELs, much higher than that for SNP detection. Accurate detection of heterozygous INDELs requires ~1.2-fold higher coverage than that for homozygous INDELs" [25]

Interesting features for population genetics and genetic genealogy

Enrichment / target designs can help to provide better coverage for certain genome areas. The ability to deliver the following data seems crucial for competitiveness in the market:

Y

  • Detection of more than 2,000 derived Y-SNPs
  • Y-STR coverage, especially for the FTDNA Y37-Y67 panels
  • FASTQ files, so that reads can be remapped

Autosomal/X

  • Coverage for the main DTC-chip SNPs useful for admixture and IBD comparisons like on Gedmatch: 23andMe (v1-v4), Ancestry.com, FTDNA FamilyFinder, Geno 2.0 (v1-v2), Chromo2;
  • Phasing: the ability to distinguish paternal and maternal DNA in an individual without testing the parents. At 34×, phasing 96% of SNPs into haplotype blocks should already be possible.[26]
  • Potential for coverage of highly informative continental and regional SNPs (rare variants) to be used in future admixture and matching services (distant genetic relations).[27][28]

mt

  • ~100% FMS with good mean read depth (>50×)

References

  1. $2750 pilot project long read whole genome Chromium technology, Justin Loe, FGC, 2016-09-02, Forum http://www.anthrogenica.com/showthread.php?742-Full-Y-Chromosome-Sequencing-Phase-III-Pilot&p=184229&viewfull=1#post184229
  2. Justin Loe, FGC, 2015-11-29, Forum http://www.anthrogenica.com/showthread.php?742-Full-Y-Chromosome-Sequencing-Phase-III-Pilot&p=123473&viewfull=1#post123473
  3. Justin Loe, FGC, 2015-12-15, http://www.anthrogenica.com/showthread.php?742-Full-Y-Chromosome-Sequencing-Phase-III-Pilot&p=126856#post126856
  4. Justin Loe, email message, 28 Dec 2015 and AG Forum 2016-01 http://www.anthrogenica.com/showthread.php?742-Full-Y-Chromosome-Sequencing-Phase-III-Pilot&p=131470&viewfull=1#post131470
  5. Justin Loe, AG Forum 2016-01 http://www.anthrogenica.com/showthread.php?742-Full-Y-Chromosome-Sequencing-Phase-III-Pilot&p=135412&viewfull=1#post135412
  6. Justin Loe, AG Forum 2016-01 http://www.anthrogenica.com/showthread.php?742-Full-Y-Chromosome-Sequencing-Phase-III-Pilot&p=132860&viewfull=1#post132860
  7. Jim Kane, Which Y-DNA NGS test to take? November 12, 2015, http://www.it2kane.org/2015/11/which-ngs-test-to-take/
  8. Justin Loe, E-Mail 2016-06-10
  9. Justin Loe, E-Mail 2016-02
  10. Justin Loe, AG Forum 2016-03 http://www.anthrogenica.com/showthread.php?742-Full-Y-Chromosome-Sequencing-Phase-III-Pilot&p=146390&viewfull=1#post146390
  11. Vince Tilroe analysis of FGC raw-data from Greg Magoon, shared by Iain McDonald, 2 Dec 2015
  12. Based on a single sample with initial QC problems (YF05650)
  13. Petr, Forum post: Full Y Chromosome Sequencing: Phase III Pilot, 2015-12-25, http://www.anthrogenica.com/showthread.php?742-Full-Y-Chromosome-Sequencing-Phase-III-Pilot&p=128756&viewfull=1#post128756
  14. Justin Loe, Batch 9006, email message, 28 Dec 2015
  15. Justin Loe, AG Forum 2016-01 http://www.anthrogenica.com/showthread.php?742-Full-Y-Chromosome-Sequencing-Phase-III-Pilot&p=134491&viewfull=1#post134491
  16. Human genetic variation / Measures of variation, Wikipedia, 2016-01 https://en.wikipedia.org/wiki/Human_genetic_variation
  17. Francioli, Menelaou et al 2014: doi:10.1038/ng.3021, http://www.nature.com/ng/journal/v46/n8/abs/ng.3021.html
  18. ca. NCBI dbSNP Build 138, Apr 2013: http://www.ncbi.nlm.nih.gov/SNP/snp_summary.cgi?view+summary=view+summary&build_id=138
  19. Eric Normandeau, What Is The Sequencing 'Depth' ?, 2011-2012, https://www.biostars.org/p/638/#640
  20. Illumina, Sequencing Coverage, 2016-01, http://www.illumina.com/science/education/sequencing-coverage.html
  21. Wang, Wei et al 2011, Next generation sequencing has lower sequence coverage and poorer SNP-detection capability in the regulatory regions, http://dx.doi.org/10.1038/srep00055
  22. Bentley et al 2008: Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456, 53–59.
  23. Ajay et al 2011: Accurate and comprehensive sequencing of personal genomes. Genome Res. 21, 1498–1505.
  24. Cheng, Teo et al 2014: Assessing single nucleotide variant detection and genotype calling on whole-genome sequenced individuals. Bioinformatics, Volume 30, Issue 12. Interpretation by FGC (JL)
  25. Fang, Narzisi et al 2014: Reducing INDEL errors in whole-genome and exome sequencing, http://dx.doi.org/10.1186/s13073-014-0089-z
  26. 10X Genomics GemCode platform, 2015-08-16, Forum http://www.anthrogenica.com/showthread.php?5178-WGS-tec-able-to-phase-96-of-SNPs-into-haplotype-blocks
  27. Al-Khudhair, Qiu et al 2015: Inference Of Distant Genetic Relations In Humans Using “1000 Genomes” http://dx.doi.org/10.1093/gbe/evv003
  28. Schiffels, Haak et al 2016: rarecoal in "Iron Age and Anglo-Saxon genomes from East England reveal British migration history" http://dx.doi.org/10.1038/ncomms10408