Question: Why is a true variant not getting called by HaplotypeCaller?
1
AshishKS (10), Norway, wrote 3.0 years ago:

I am using HaplotypeCaller to call variants in a targeted sequencing panel of 120 genes. In the gene PMS2 there is a variant at a position with coverage 21, where 5 reads (24%) support the alternate allele. The mapping quality of the reads is mostly 0, I guess because of the very similar pseudogene PMS2CL (mapping was done against hg19 using BWA-MEM). This variant is not called by HaplotypeCaller, but it is a true variant (confirmed by Sanger sequencing). I compared the BAM file with the bamout file, and the two are similar. I also tried mapping the reads to a custom reference sequence (restricted to the target regions). When I used that BAM file for variant calling, the variant was called, though the coverage depth also increased and the number of variants increased many fold, and those additional variants are false positives.

I wonder what the possible explanation for this is. What cutoff criteria is HaplotypeCaller using in this case? Why is the variant not called in the first place?

Here is a link to a screenshot of the variant in the PMS2 gene, at a position with coverage 21 (also attached as a file): https://drive.google.com/file/d/0Bwibh75M75p_bGJrNlpyRTVSNHVZRDMzUFB0UDFOV2gyM2Rj/view?usp=sharing

modified 9 months ago by Biostar (20) • written 3.0 years ago by AshishKS (10)
1

I'm not seeing 21 reads in that screenshot. However, the most likely reason is the one you mention yourself: "The mapping quality of the reads are mostly 0". These reads are probably filtered out before they are used for SNP calling because we cannot be confident about what they are telling us.

I don't know what else to say. If you want to call SNPs, you're going to need more high-quality data - or even better, more individuals known to have the same interesting genotype. Good luck! :)
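To make the mapping-quality point concrete, here is a minimal sketch of what a minimum-MAPQ read filter does to the depth at a site. The pileup below is made up to resemble the question (21 reads, 5 alternate, mostly MAPQ 0), the bases and the helper names `usable_reads` and `alt_fraction` are hypothetical, and the threshold of 20 is an assumption for illustration; check the documented default for your GATK version.

```python
from typing import List, Optional, Tuple

# A read is represented as (base_at_site, mapping_quality).
Read = Tuple[str, int]


def usable_reads(reads: List[Read], min_mapq: int = 20) -> List[Read]:
    """Keep only reads whose mapping quality is at least min_mapq (assumed threshold)."""
    return [read for read in reads if read[1] >= min_mapq]


def alt_fraction(reads: List[Read], alt_base: str) -> Optional[float]:
    """Fraction of reads carrying alt_base, or None if no reads are left."""
    if not reads:
        return None
    return sum(1 for base, _ in reads if base == alt_base) / len(reads)


# Hypothetical pileup: 5 reads with the alternate base "T", 16 with the
# reference base "C"; 18 of the 21 reads have MAPQ 0.
pileup: List[Read] = (
    [("T", 0)] * 4 + [("T", 60)] + [("C", 0)] * 14 + [("C", 60)] * 2
)

raw_depth = len(pileup)
kept = usable_reads(pileup)

print(raw_depth, len(kept))  # prints: 21 3
```

In this toy example only 3 of the 21 reads survive the filter, so the caller would have very little evidence to work with even though a genome browser shows all 21.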

modified 3.0 years ago • written 3.0 years ago by John (12k)

http://gatkforums.broadinstitute.org/gatk/discussion/3131/determination-of-heterozygous-and-homozygous-calls

written 3.0 years ago by Floris Brenk (880)

I'm sorry, I don't understand. What did you want me to read here?

written 3.0 years ago by John (12k)
4
lh3 (31k), United States, wrote 3.0 years ago:

Because "the mapping quality of the reads are mostly 0". An allele balance of 24% is also very bad. You can tune parameters to call the variant anyway, but you are likely to end up with lots of false positives elsewhere. You are hitting the limits of the data. You have to choose between a low false-negative (FN) rate and a low false-positive (FP) rate. You can hardly have both.
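As a rough illustration of why a 24% allele balance looks bad, the sketch below asks how often a clean heterozygous site would show 5 or fewer alternate reads out of 21. It assumes a plain binomial model with an expected alternate fraction of 0.5, which is a simplification: it ignores reference bias and mis-mapped pseudogene reads, and it is not how HaplotypeCaller computes genotype likelihoods. The function name `binom_cdf` is just a helper for this example.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float = 0.5) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Numbers from the question: 5 alternate reads out of 21.
p_low = binom_cdf(5, 21)
print(f"{p_low:.4f}")  # prints: 0.0133
```

Under these assumptions, a true heterozygote would give so few alternate reads only about 1.3% of the time, so on these numbers alone the site looks more like noise than like a heterozygous call.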

written 3.0 years ago by lh3 (31k)
1

And it also explains why everything improves when a custom reference is used. Remove the pseudogene from consideration and there is no competition for mapping. So your depth of coverage goes up, and you get better mapping qualities and perhaps better allele balance. Since this is targeted sequencing anyway, this approach may be valid. However, Ashish, depending on the enrichment strategy, you may want to confirm that off-target enrichment from the pseudogene isn't expected. If it is amplicon based, what is the probability that the corresponding region of the pseudogene is also amplified?

written 3.0 years ago by Dan Gaston (7.1k)

Yes, it is amplicon based, and there is approximately a 100% probability that the pseudogene is also amplified, because PMS2 and PMS2CL are almost identical. Maybe I should use a custom version of the hg19 reference that excludes ONLY the pseudogene PMS2CL, as I am having trouble only with this gene.

written 3.0 years ago by AshishKS (10)
2

If your primer pairs would definitely amplify both sequences then this isn't a good idea. You'll artificially be placing all reads amplified from the pseudogene on PMS2. Any conclusions you make about variants, genotypes, and frequencies at that point will be wrong.
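A back-of-the-envelope sketch of how the frequencies go wrong: if reads from both loci pile up on PMS2, the observed allele fraction is diluted by the pseudogene copies. This assumes two copies each of PMS2 and PMS2CL, equal amplification efficiency, and that every pseudogene read lands on PMS2; all three are assumptions for illustration, and `expected_alt_fraction` is a hypothetical helper.

```python
from fractions import Fraction


def expected_alt_fraction(
    alt_copies: int, gene_copies: int = 2, pseudogene_copies: int = 2
) -> Fraction:
    """Expected alternate-allele fraction when reads from the gene and the
    pseudogene are all placed on the gene (assumes equal amplification)."""
    return Fraction(alt_copies, gene_copies + pseudogene_copies)


# Heterozygous variant in PMS2, pseudogene reads collapsed onto PMS2.
het_collapsed = expected_alt_fraction(1)
# Homozygous variant in PMS2, pseudogene reads collapsed onto PMS2.
hom_collapsed = expected_alt_fraction(2)
# Heterozygous variant with no pseudogene reads mixed in.
het_clean = expected_alt_fraction(1, pseudogene_copies=0)

print(het_collapsed, hom_collapsed, het_clean)  # prints: 1/4 1/2 1/2
```

Under these assumptions a heterozygous PMS2 variant shows up at about 25%, and a homozygous one looks heterozygous, which is why genotypes called from the collapsed alignment cannot be taken at face value.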

written 3.0 years ago by Dan Gaston (7.1k)

Yes, I totally agree with you. The custom reference I used previously does the same thing: it places the pseudogene reads on PMS2.

written 3.0 years ago by AshishKS (10)