Listing Criteria for SNP Inclusion
into the ISOGG Y-DNA Haplogroup Tree - 2016

The entire work is identified by the Version Number and date given on the Main Page.   Directions for citing the document are given at the bottom of the Main Page.
Version History     Last revision date for this specific page: 13 January 2016


Introduction

These recommendations are intended to ensure that a uniform set of criteria is used for accepting new mutations for inclusion on the ISOGG Y-DNA haplogroup tree.

Because of the abundance of alternatives now available, only single nucleotide polymorphisms (SNPs), and not insertions or deletions (indels), are being accepted as new additions. In exceptional cases other variants may be considered for inclusion on a case-by-case basis if they can be clearly demonstrated to have properties equivalent to SNPs, but the burden of proof required will be much higher and at the discretion of the committee.

The quality guidelines for chromosome positions and reads at the bottom of this page are in addition to other requirements.

Special Coding for Interpreting SNP status


General Requirements for SNP Validation

The requirements in this General Requirements section apply to validating SNPs under each of the testing types discussed in the next section, Requirements for Specific Type of Testing.

  1. Inserting a SNP Creating a Non-Terminal Branch to the ISOGG Tree
    The supporting information provided by the proposer should demonstrate that the new SNP is downstream of an established tree mutation. The proposer must also show that the SNP was tested in individuals from all parallel subgroups on the tree. In cases where relevant existing tree subgroups are from rare populations and based solely on old research listing only one sample proving the existence of the SNP, an exception may be granted for testing of the old subgroup. The mutations of the existing subgroup will then be listed temporarily as position undetermined.

       Example: Suppose that a new subgroup is being added with name of Q18.
          Fictional example:
             G-L140
                G-L13
                G-L1266
                G-Q18
                   G-L1268

    Then the evidence for Q18 must show that a man is derived for both Q18 and L140. At the same time, one man each from L1266 and L13 must be ancestral for Q18. In addition, one man derived for Q18 must be derived for L1268, and a second Q18 man must be ancestral for L1268. Derived means the mutation is present; ancestral means it is absent.
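    The evidence pattern in the fictional Q18 example can be sketched as a small check. This is only an illustration, not an ISOGG tool: the function name, the representation of each tested man as a dictionary, and the strings "derived" and "ancestral" are all assumptions made for this sketch.

```python
def supports_non_terminal(samples, new_snp, parent, parallel, child):
    """Check the evidence pattern for inserting a non-terminal SNP.

    samples  -- list of dicts mapping SNP name -> "derived" or "ancestral"
                (hypothetical representation; untested SNPs are simply absent)
    parallel -- names of the existing parallel subgroups (e.g. L13, L1266)
    child    -- the existing subgroup that moves below the new SNP (e.g. L1268)
    """
    carriers = [s for s in samples if s.get(new_snp) == "derived"]

    # A man derived for both the new SNP and the upstream SNP.
    upstream_ok = any(s.get(parent) == "derived" for s in carriers)

    # One man from each parallel subgroup who is ancestral for the new SNP.
    parallel_ok = all(
        any(s.get(p) == "derived" and s.get(new_snp) == "ancestral" for s in samples)
        for p in parallel
    )

    # One carrier derived for the child SNP and a second carrier ancestral for it.
    # A man cannot be both, so two different men are required.
    child_derived = any(s.get(child) == "derived" for s in carriers)
    child_ancestral = any(s.get(child) == "ancestral" for s in carriers)

    return upstream_ok and parallel_ok and child_derived and child_ancestral


evidence = [
    {"L140": "derived", "Q18": "derived", "L1268": "derived"},
    {"Q18": "derived", "L1268": "ancestral"},
    {"L13": "derived", "Q18": "ancestral"},
    {"L1266": "derived", "Q18": "ancestral"},
]
print(supports_non_terminal(evidence, "Q18", "L140", ["L13", "L1266"], "L1268"))
```

    Dropping the L1266 man from the list would make the check fail, because one parallel subgroup would then be untested for Q18.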
     
  2. Adding a SNP Representing a New Terminal Branch to the ISOGG Tree
    Where the new SNP defines a new terminal branch of an existing branch:
    1. at least one individual who has the new SNP must also have a SNP defining the immediate upstream subgroup.
    2. at least one individual from any parallel subgroup to the new subgroup must lack the submitted SNP.

       Example: Suppose that a new subgroup is being added with name QQ12.
          Fictional example:
             G-L5432
                G-P343
                G-QQ12

    Then the evidence for QQ12 must show that two men are derived for QQ12. At the same time, one man from P343 must be ancestral for QQ12. Also, one of the QQ12 men must be derived for L5432.
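    As a rough illustration of the QQ12 example, the sketch below lists which pieces of evidence are still missing for a proposed terminal branch. The function name and the dictionary representation of test results are assumptions made for this sketch and are not part of any ISOGG procedure.

```python
def missing_terminal_evidence(samples, new_snp, parent, parallel):
    """Return a list describing the evidence still missing (empty if complete).

    samples  -- list of dicts mapping SNP name -> "derived" or "ancestral"
                (hypothetical representation of each tested man)
    parallel -- names of subgroups parallel to the new one (e.g. P343)
    """
    carriers = [s for s in samples if s.get(new_snp) == "derived"]
    missing = []

    # The example asks for two men derived for the new SNP.
    if len(carriers) < 2:
        missing.append(f"two men derived for {new_snp}")

    # One of them must also be derived for the immediate upstream SNP.
    if not any(s.get(parent) == "derived" for s in carriers):
        missing.append(f"a {new_snp} man derived for {parent}")

    # At least one man from any parallel subgroup must lack the new SNP.
    if not any(
        s.get(new_snp) == "ancestral" and any(s.get(p) == "derived" for p in parallel)
        for s in samples
    ):
        missing.append(f"a man from a parallel subgroup ancestral for {new_snp}")

    return missing


evidence = [
    {"L5432": "derived", "QQ12": "derived"},
    {"QQ12": "derived"},
    {"P343": "derived", "QQ12": "ancestral"},
]
print(missing_terminal_evidence(evidence, "QQ12", "L5432", ["P343"]))
```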

Requirements for Specific Type of Testing

Reference giving details about Y-DNA SNP testing companies:
   Y-DNA SNP testing chart
   YSEQ
Reference giving details about Y-DNA STR testing companies:
   Y-DNA STR testing chart
  1. Sanger Sequencing
    Examples of Sanger sequencing are the tests at the company YSEQ and the Advanced Tests (SNP) at Family Tree DNA. STR testing is available, for instance, at Genebase and Family Tree DNA. Acceptable testing for this category consists of Sanger sequencing which targets a short segment of Y-DNA.

    The objective of the ISOGG Tree at this time is to include all SNPs that arose prior to about the year 1500 C.E. This guideline may be measured through STR diversity or alternative evidence.

    Where a new terminal subgroup is being added, STR marker results or other evidence described below for two men with the new SNP are needed.

    STR Diversity
    To be accepted, the SNP must be observed in at least two individuals and must meet the STR diversity requirement. A SNP that does not meet this requirement will be classified as a Private SNP (see definition above).

    The STR diversity requirement is met if the following conditions are satisfied:
    1. If the SNP is a Non-Terminal Branch SNP, no further proof of diversity is required.
    2. Genetic distance is calculated using the Infinite Alleles Model (IAM). A marker for which there is a null value in one sample must be discarded from the calculations. Apart from this treatment of null values, the calculation is the IAM calculation that most laboratories use.
    3. All markers tested by both individuals must be compared.
    4. If 74 markers (or fewer) are compared, the minimum genetic distance to meet the diversity requirement is 5.
    5. If 75 (or more) markers are compared, the diversity requirement is a minimum of 7%, computed by dividing the genetic distance by the number of markers compared and rounding the resulting percentage to the nearest integer value.
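    The arithmetic in conditions 2 to 5 can be sketched as follows. This is a minimal illustration under stated assumptions: each man's STR results are held in a dictionary of marker name to repeat value, a null value is represented as None, and "rounding to the nearest integer" is read as rounding the percentage half up. None of these choices come from the criteria themselves.

```python
from decimal import Decimal, ROUND_HALF_UP


def iam_genetic_distance(a, b):
    """Return (genetic_distance, markers_compared) under the Infinite Alleles Model.

    Under the IAM every marker that differs counts as one step, however large
    the difference. Markers with a null value (None here, an assumption of this
    sketch) in either sample are discarded.
    """
    distance = 0
    compared = 0
    for marker in a.keys() & b.keys():  # only markers tested by both men
        va, vb = a[marker], b[marker]
        if va is None or vb is None:
            continue
        compared += 1
        if va != vb:
            distance += 1
    return distance, compared


def meets_str_diversity(a, b):
    distance, compared = iam_genetic_distance(a, b)
    if compared == 0:
        return False
    if compared <= 74:
        return distance >= 5
    # 75 or more markers: at least 7% after rounding to the nearest integer.
    percent = Decimal(100 * distance) / Decimal(compared)
    return percent.quantize(Decimal("1"), rounding=ROUND_HALF_UP) >= 7


man_1 = {f"marker_{i}": 12 for i in range(111)}
man_2 = dict(man_1)
for i in range(8):
    man_2[f"marker_{i}"] = 13  # eight markers differ: 8 / 111 is about 7.2%
print(meets_str_diversity(man_1, man_2))
```

    With seven differing markers out of 111 the percentage is about 6.3%, which rounds to 6% and would not meet the requirement under this reading.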

    Alternative Evidence
    If the submitter can otherwise provide evidence that the common ancestor of the two samples can be reasonably expected to have lived more than 500 years ago, this will also be considered.
     
  2. Next Generation Sequencing
    Next generation sequencing is available to the genealogical community from Full Genomes Corporation and through Family Tree DNA's Big Y test. Next generation sequencing has the largest coverage of any type of SNP testing currently available.

    1. The committee recognizes there are a wide variety of ways in which sequencing information is available. Because of this, no specific criteria for sequencing information are provided here except the new, tentative quality guidelines in the next section. At one extreme, the goal of the reviewers of the sequencing submissions will be to easily accept quality SNPs from old, root branches found in many samples within all the downstream branches. At the opposite extreme, it is unlikely reviewers will accept SNPs near or in terminal branches whose positions depend on the results from one sample.
    2. The submitter can use raw data report(s) pertaining to the sequencing when they provide the needed information. Two examples of raw data reports are (a) a VCF file showing the usual quality scores, DP scores for depth of reads, etc. for the involved sample and pertinent additional ones, including ones from other haplogroups, or (b) the so-called “haplogroup compare report” from Full Genomes Corp. Results from Sanger sequencing or from microarray products, such as Geno 2.0 or Chromo 2.0, might be acceptable comparative information in certain cases. Having a large number of pertinent comparative samples on a VCF report can improve the scoring information.
    3. The reviewer will have to take into consideration the coverage of the next generation sequencing, varied quality scorings, position of the site on the chromosome, the percentage of samples with clean reads at the site in question, possible indel relationships to the SNP, geographical separation of the samples, non-next generation sequencing testing, results for the SNP site in other reports, and other factors in making a complex judgment as to whether the submitted SNP is almost certain to show the same results in next generation sequencing of new comparable samples.
    4. More precise criteria for next generation sequencing submissions may be provided as evidence accumulates. Addendum: these are now included in the tentative quality guidelines below.
    5. When a new SNP creating a new terminal branch is being added to the tree, at least two of the submitted samples must each have an average of 3 unique (singleton) SNPs per 10 million base pairs of sequencing coverage. Reviewers will determine uniqueness according to comparisons to all available sequencing results rather than samples tested at a particular laboratory.
    6. If the evidence for the SNP is based solely on next generation sequencing, the SNP will appear in italics on the tree.
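    The singleton requirement in item 5 is a simple rate calculation, sketched below. The function names are hypothetical, and the sketch assumes that "an average of 3" is meant as a minimum and that coverage is expressed as the number of base pairs covered by the sequencing; the criteria do not spell out either point.

```python
def singletons_per_10_mbp(singleton_count, covered_bp):
    """Unique (singleton) SNPs per 10 million base pairs of coverage."""
    if covered_bp <= 0:
        raise ValueError("covered_bp must be positive")
    return singleton_count * 10_000_000 / covered_bp


def enough_singleton_samples(samples, minimum_rate=3.0, samples_needed=2):
    """samples is a list of (singleton_count, covered_bp) pairs.

    Returns True when at least `samples_needed` samples reach `minimum_rate`.
    Both defaults reflect this sketch's reading of item 5.
    """
    qualifying = [s for s in samples if singletons_per_10_mbp(*s) >= minimum_rate]
    return len(qualifying) >= samples_needed


# Hypothetical samples: 4 singletons over 10 Mbp and 3 singletons over 12 Mbp.
print(singletons_per_10_mbp(4, 10_000_000))   # 4.0
print(singletons_per_10_mbp(3, 12_000_000))   # 2.5
print(enough_singleton_samples([(4, 10_000_000), (3, 12_000_000)]))  # False
```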

  3. Microarray Chip-based Genotyping
    Examples of microarray chip-based genotyping are the Geno 2.0 or Geno 2.0 Next-Generation tests, 23andMe, Chromo 2.0 and Family Tree DNA's Deep Clade panels. Microarray chips target a selected group of SNPs.

    1. Novel SNPs found in microarray products without a presence also in other qualifying sources - such as Sanger sequencing or next generation sequencing - cannot be submitted. However, chip-based genotyping results can be used in combination with Sanger sequencing and/or next generation sequencing results as validating evidence for one of the samples. If chip-based genotyping is part of the evidence, the approved SNP will be listed in regular type, rather than italics, even if the other evidence is from next generation sequencing.
    2. Samples from chip-based genotyping used to prove a new terminal branch must meet the criteria for STR diversity described in the Sanger sequencing section.

Tentative Quality Guidelines

Recognizing that some guidelines are needed, these are presented here tentatively. These are approximations of the border between reliable Y chromosome sites or reads and those unreliable or inconclusive. The guidelines are described as tentative because they are not based on scientific studies but rather on imprecise approximations from experience working with results. These guidelines will be amended as better information is developed. All guidelines must be met.

For situations where the mutation or mutations being submitted to the ISOGG tree are based only on next-generation sequencing, the mutation site and its results must meet the following criteria pertaining to the findings in the individual or individuals who have the mutation in each sample:
1. The total number of reads for that site in a sample must be at least four.

2. The number of reads allowed to diverge from the mutation depends on the total number of reads at the site. With fewer than 21 reads, 100% of the reads must show the mutation. With 21-40 reads, one divergent read is allowed. With 41-50 reads, two divergent reads are allowed. With more than 50 reads, at least 95% of the reads must show the mutation. Any reads with a mapping quality score less than 10 can be ignored in meeting the criteria of this paragraph.

3. The total number of reads cannot exceed four times the coverage of the testing. For example, for 50x coverage, the total number of reads at the site cannot exceed 200. If the laboratory is providing different coverage than advertised, the total number of reads should be adjusted accordingly.

4. The mutation site may already be listed at no more than three locations on the ISOGG tree.

5. The mapping quality for the site must average at least 10. The percentage of reads with mapping quality less than 10 at the site must not exceed 10% of the total reads.

6. When 500 adjacent base pairs are viewed with the mutation site in the center, the same sequence cannot appear at another chromosome site where 95.5% or more of the base pairs are in the same sequence. This applies only to those displayed comparisons where the number of base pairs compared is 500 or almost 500, and not to smaller numbers.

7. There must be no additional called mutations for that individual within 20 base pairs of the submitted mutation site.

8. The mutation site must not be part of a series of repeated alleles. If it is part of a segment where the same allele is repeated, this segment must not exceed 6 alleles of the same type.
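The read-count criteria (items 1, 2, 3 and 5) lend themselves to a short sketch. It is an illustration only: each read is represented as a hypothetical (shows_mutation, mapping_quality) pair, and the sketch assumes that the tiers in item 2 are applied to the reads left after low-quality reads are set aside, which the guidelines do not state explicitly.

```python
def allowed_divergent_reads(read_count):
    """Number of reads allowed to disagree with the mutation (item 2)."""
    if read_count < 21:
        return 0
    if read_count <= 40:
        return 1
    if read_count <= 50:
        return 2
    # More than 50 reads: at least 95% must show the mutation,
    # so at most 5% may diverge (integer arithmetic avoids float rounding).
    return read_count * 5 // 100


def passes_read_criteria(reads, coverage):
    """reads: list of (shows_mutation: bool, mapping_quality: int) pairs.
    coverage: the laboratory's actual coverage, e.g. 50 for 50x.
    """
    total = len(reads)
    if total < 4:                      # item 1: at least four reads
        return False
    if total > 4 * coverage:           # item 3: no more than 4x coverage
        return False

    low_quality = [r for r in reads if r[1] < 10]
    if len(low_quality) * 10 > total:  # item 5: at most 10% low-quality reads
        return False
    if sum(q for _, q in reads) / total < 10:  # item 5: average quality >= 10
        return False

    # Item 2: reads with mapping quality below 10 can be ignored (assumed to
    # mean they are removed before the tiers are applied).
    counted = [r for r in reads if r[1] >= 10]
    divergent = sum(1 for shows, _ in counted if not shows)
    return divergent <= allowed_divergent_reads(len(counted))


# Hypothetical site: 21 good reads, one of which does not show the mutation.
site = [(True, 60)] * 20 + [(False, 60)]
print(passes_read_criteria(site, coverage=30))
```

With only 20 reads in total, the same single divergent read would cause the site to fail, because 100% agreement is required below 21 reads.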

If Sanger sequencing results show next-generation sequencing information to be incorrect, an item submitted under this section is to be removed from the tree.

If all criteria under this section are met except for minimum number of reads in one of the samples, the mutation may be added to the tree, but ^ is to follow the name of the mutation and this will be defined on the page as not yet having minimum number of reads.

For situations where the mutation or mutations being submitted to the ISOGG tree are based on Sanger sequencing or qualifying microarray testing, the mutation site and its results must meet the following criteria:
1. The mutation site may already be listed at no more than three locations on the ISOGG tree.

2. When the site in BAM files is viewed in samples with different coverage and from different labs, the mapping quality for the site must average at least 10. The percentage of reads with mapping quality less than 10 at the site must not exceed 10% of the total reads. An exception to using samples with different coverage and from different labs would be using the next-generation sequencing BAM file for the same individual who had the Sanger sequencing or microarray testing of this site. In this latter case, it is preferred that just the individual’s BAM results meet the criteria of this paragraph.

3. When 500 adjacent base pairs are viewed with the mutation site in the center, the same sequence cannot appear at another chromosome site where 95.5% or more of the base pairs are in the same sequence. This applies only to those displayed comparisons where the number of base pairs compared is 500 or almost 500, and not to smaller numbers.

4. The mutation site must not be part of a series of repeated alleles. If it is part of a segment where the same allele is repeated, this segment must not exceed 6 alleles of the same type.
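The limit of 6 repeated alleles of the same type can be checked on a stretch of reference sequence, as in the sketch below. The function name and the use of a plain string of bases with a 0-based index are assumptions of this sketch. It covers only runs of a single repeated base at the site, not longer repeat patterns.

```python
def same_base_run_length(sequence, index):
    """Length of the run of identical bases that contains sequence[index].

    sequence -- reference bases around the site, e.g. "ACGTTTTTTTAC"
    index    -- 0-based position of the mutation site within `sequence`
    """
    base = sequence[index]
    start = index
    while start > 0 and sequence[start - 1] == base:
        start -= 1
    end = index
    while end < len(sequence) - 1 and sequence[end + 1] == base:
        end += 1
    return end - start + 1


def within_repeat_limit(sequence, index, limit=6):
    return same_base_run_length(sequence, index) <= limit


# Hypothetical sequences: the first site sits in a run of seven T's,
# the second in a run of three A's.
print(within_repeat_limit("ACGTTTTTTTAC", 5))  # False
print(within_repeat_limit("ACGAAAT", 4))       # True
```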

The supplement of the 2015 Y-DNA study by Karmin et al. provides a listing of the highest quality Y-DNA chromosome sites, but there are acceptable additional sites not included. Karmin, M. et al. "A recent bottleneck of Y chromosome diversity coincides with a global change in culture." Genome Research, 25: 1-8, 2015.

Submissions of haplogroup A0 and A00 items must be handled on a case-by-case basis because these haplogroups were not known when the Y reference samples were chosen.

Where a shared variant fails to meet the quality guidelines above, it can still be added to the tree followed by two ^^ symbols and the following note in exceptional circumstances where evidence suggests it occurs in at least 5 persons, is stable, does not occur elsewhere in the tree and serves an important purpose:
^^does not meet quality guidelines but may be a helpful identifier.

Resources pertaining to the guidelines:
At YBrowse one can use the chromosome number to search. Then in Scroll/Zoom there, choose 500bp and hit the Go button. This produces the 500 base pairs. Copy these and paste them into BLAT. The default settings of Human, Feb 2009, BLAT's guess, query/score and hyperlink are retained. Then Submit. The list of possible duplications or near duplications appears.

At the Broad Institute site one can download the free IGV software for reading BAM files. Several small files are needed for use and can be provided upon request to Ray Banks.

Reading BAM files requires an index file which ends in .bai. If this is not available, the free BamView software (BAM Indexer) will index the file.

BAM files are large and may require unzipping depending on the source.

Acceptance Process for Placing a SNP on the ISOGG Y-DNA Haplotree

The discoverer of the SNP (or a knowledgeable third party) can email the Contact Person listed on the appropriate haplogroup page and describe where the new SNP fits in the tree. The haplogroup experts will evaluate the evidence for inclusion on the tree. If the information on tree placement is insufficient, it will be listed as investigational in the section under the tree. If the Contact Person is not available, contact Ray Banks.

Corrections/Additions made since 1 January 2016:


Copyright 2016. International Society of Genetic Genealogy. All Rights Reserved.

   