Question: Next-generation DNA sequencing: How to tell systematic errors from real SNVs?
robles.daniela (United Kingdom) wrote, 3.6 years ago:

Hi all,

I hope you're well. I have one question regarding variant calling in NGS data.

We have sequenced a number of human germline exomes, captured with the Agilent SureSelect 50Mb All Exon kit, on the Illumina GAII platform (paired-end, 75bp reads). We aligned with BWA and called variants with Samtools mpileup.

I was looking at one particular variant, located at GRCh37 7:124499165. It is called as heterozygous with acceptable SNP quality (QUAL 20-41) in 6/41 exomes, and with low quality in a further 16 exomes. In the exomes where it is called with acceptable quality, sequencing depth ranges from 61 to 139 HQ reads. However, although the reference allele (A) is supported by reads in both directions (in most cases with 2-4 fold more reads in the forward direction), the alternative allele is supported in all cases only by reads in the reverse direction (17 to 45 per exome). These calls will of course fail the strand-bias test, so they will be filtered out during analysis.
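To make the strand-bias argument concrete, here is a minimal sketch of a Fisher's exact test on the allele-by-strand read counts, using only the Python standard library. The function names (`fisher_exact_two_sided`, `strand_bias`) are hypothetical, and the read counts are illustrative values chosen to resemble the pattern described above, not the real per-exome counts. Variant callers compute their own strand-bias statistics, which may differ in detail from this.

```python
from math import comb, log10


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def strand_bias(ref_fwd, ref_rev, alt_fwd, alt_rev):
    """Return (p_value, phred_score) for allele-by-strand read counts."""
    p = fisher_exact_two_sided(ref_fwd, ref_rev, alt_fwd, alt_rev)
    # Clamp p so log10 never sees zero.
    return p, -10 * log10(max(p, 1e-300))


# Illustrative counts only, NOT taken from the real exomes:
# reference reads on both strands, alternative reads on the reverse strand only.
p, phred = strand_bias(ref_fwd=60, ref_rev=20, alt_fwd=0, alt_rev=30)
print(f"p = {p:.3g}, Phred-scaled = {phred:.1f}")
```

A very small p-value (large Phred-scaled score) means the alternative allele is distributed across strands very differently from the reference allele, which is the pattern a strand-bias filter rejects.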

However, this variant is present in the ExAC dataset with a high AF:, which makes me think that maybe it is not being dropped by exome post-filtering. I believe this variant might be a systematic mapping/calling error for two reasons: 1. the reads supporting the variant are in only one direction in all cases (at least in one batch of data), suggesting a sequence-specific error, and 2. other people seem to have observed this variant, which apparently passed all their filters but then failed when they attempted Sequenom validation (Bass AJ et al. Genomic sequencing of colorectal adenocarcinomas identifies a recurrent VTI1A-TCF7L2 fusion. Nat Genet 43, 964–968 (2011)).

So the question becomes: is there any way to tell a (possibly) systematic error of this kind from a real variant? I'm interested in this because we are doing a new experiment in which we are sequencing over this region with Fluidigm technology, which gives reads in only one direction, so we lose the ability to do strand-bias testing.

Thank you very much,



If you suspect something is a mapping/calling artifact, I suggest you try a different aligner (such as BBMap) and possibly a different variant caller (such as GATK) to see if the variant remains. You could even try a different reference (say, GRCh38, sometimes informally called HG20).
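As a rough illustration of that comparison, the sketch below checks whether a given site is present in the VCF output of each caller. The helper names (`called_sites`, `compare_callers`), the caller labels and the toy VCF records are all assumptions made for the example; in particular the ALT allele is a placeholder, since it is not stated in the question.

```python
def called_sites(vcf_lines):
    """Return {(chrom, pos): (ref, alt, filter)} for the records in a VCF."""
    sites = {}
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt, _qual, filt = fields[:7]
        sites[(chrom, int(pos))] = (ref, alt, filt)
    return sites


def compare_callers(site, calls_by_caller):
    """Report, per caller, whether the (chrom, pos) site was called at all."""
    return {name: site in sites for name, sites in calls_by_caller.items()}


# Toy records for illustration only; the ALT allele "G" is a placeholder.
caller_a_vcf = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "7\t124499165\t.\tA\tG\t30\t.\t.",
]
caller_b_vcf = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "7\t124500000\t.\tC\tT\t50\tPASS\t.",
]

calls = {
    "caller_a": called_sites(caller_a_vcf),
    "caller_b": called_sites(caller_b_vcf),
}
result = compare_callers(("7", 124499165), calls)
print(result)
```

In practice the lines would come from the real VCF files (for example via `open(path)`), and a site that only one aligner/caller combination reports is a stronger candidate for an artifact.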

Reply written 3.6 years ago by Brian Bushnell

Thanks very much for your suggestion, Brian. I used a different caller (GATK HaplotypeCaller) and indeed the variant is not being called. However, I'm a bit puzzled as to why it is so common in the ExAC data, since I suspect loads of those exomes were called with GATK. Maybe they used GATK UnifiedGenotyper, which does call it. I will ask over there. Thanks again! :)

Reply written 3.5 years ago by robles.daniela
Christian (Cambridge, US) wrote, 3.6 years ago:
Check whether the variant overlaps any known repetitive element, segmental duplication, or CNV. If it does, it is less likely to be real. You can get this information from, for example, the UCSC tracks.
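If the tracks are downloaded as BED files, the overlap check can be scripted. The following is a minimal sketch: the function names and the toy intervals are hypothetical, and it assumes standard BED coordinates (0-based, half-open) against a 1-based variant position, plus UCSC-style "chr7" versus Ensembl-style "7" chromosome names.

```python
def normalize_chrom(name):
    """Strip a leading 'chr' so 'chr7' and '7' compare equal."""
    return name[3:] if name.lower().startswith("chr") else name


def overlapping_features(chrom, pos, bed_lines):
    """Return BED features covering a 1-based position.

    BED intervals are 0-based and half-open, so a 1-based position p
    lies inside [start, end) exactly when start < p <= end.
    """
    hits = []
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment and UCSC header lines
        fields = line.split()
        b_chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        label = fields[3] if len(fields) > 3 else "."
        if normalize_chrom(b_chrom) == normalize_chrom(chrom) and start < pos <= end:
            hits.append((label, start, end))
    return hits


# Toy intervals for illustration only, not real UCSC annotations.
toy_bed = [
    "track name=toy_repeats",
    "chr7\t124499000\t124499100\ttoy_repeat_1",
    "chr7\t124500000\t124500300\ttoy_segdup_1",
]

hits = overlapping_features("7", 124499165, toy_bed)
print(hits if hits else "no overlapping feature in the supplied tracks")
```

An empty result only says that the supplied tracks do not cover the site; it does not rule out a paralogous sequence that is missing from the annotation.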

Thank you for your useful suggestion, Christian. I have activated all repeat tracks in the UCSC Genome Browser for this variant but none seem to overlap it... will keep looking for the solution.

Reply written 3.5 years ago by robles.daniela