Question: Variant with AD 14,3 is heterozygote?
gravatar for Sharon
2.6 years ago by
Sharon490 wrote:

Hi everyone

I would like to take a second opinion regarding a variant with DP =17. The variant is in a very high GC content area so the low coverage is reasonable.

I do hard filtering because I have few samples for VQSR, I thinks it is okay according to GATK hard filtering, the values are as follows:

SOR=2.795       GT:AD:DP:GQ:PL  0/1:14,3:17:76:76,0,506

I am worried because AD is 14,3 which means only 3 reads only support the ALT? is it safe to accept it as a het variant. The variant is very interesting based on our data, so I don't wanna discard without a strong reason.

Any comments, how do you think? with many many thanks

vcf exome • 758 views
ADD COMMENTlink modified 2.6 years ago by Gabriel R.2.8k • written 2.6 years ago by Sharon490
gravatar for Gabriel R.
2.6 years ago by
Gabriel R.2.8k
Danmarks Tekniske Universitet
Gabriel R.2.8k wrote:

I cannot provide a detailed answer given the information but a few comments.

There are 2 most likely models: 1) The site is homozygous reference

Therefore, the 3 reads are potentially a) mismapped, b) weird duplicates or c) sequencing errors.

a) Did you use a mappability filter? i recommend Heng Li's method b) Did you use rmdup? c) if both a) and b) are satisfied, then they are sequencing errors, assuming a base quality of 38, the probability of 3 seq. errors occurring independently is:

(10^((-1*38)/10))^3 = 3.981072e-12

so one chance in 250 billion.

2) The site is heterozygous reference

It is possible that this is a genuine het site and you haven't sampled the other base, this can happen with probability:

dbinom(3,17,prob=0.5) = 0.005187988

so 1 chance in ~200 (assuming no mismappings or errors).

This is reflected in your PL field where the homo ref model has a score of 76, the het model a score of 0 (most likely) and 506 for the homo alt. While the hetero is more likely that the homo ref, it is not astronomically more likely. However, the homo alt is so unlikely that it can be safely discarded.

ADD COMMENTlink written 2.6 years ago by Gabriel R.2.8k

Nice answer! I wonder if the low number of ALT reads could be because ALT reads have a mismatch with the reference. This mismatch could make some reads to map equally well to multiple locations hence their MAPQ is 0 hence they are filtered out somewhere in the analysis.

Maybe useful to inspect the unfiltered bam files to look for the reads mapped at the incriminated locus.

ADD REPLYlink modified 2.6 years ago • written 2.6 years ago by dariober11k

Thanks Gabriel for the nice answer. I did not use a nor b, I will give them a try. Just used GATK pipeline. Thanks dariober, that's another good point I did not think of, I will check other reads . Thanks

ADD REPLYlink modified 2.6 years ago • written 2.6 years ago by Sharon490

good comment. my personal experience with this is if I have to choose between filtering by mapping quality or mappability, pick mappability, it gives greater noise reduction: see figure on page 46

ADD REPLYlink written 2.6 years ago by Gabriel R.2.8k

Interesting figure. Never thought about this. So I check if the reads are mappable, if yes, that's a sign of less likely for errors, which means trust the call and trust the variant? Do I understand right?

ADD REPLYlink written 2.6 years ago by Sharon490

The reads are not mappable, the genomic region is (or isn't). Mapping quality is defined on a per read basis but mappability is defined per genomic locus. Both aim at reducing the probability that there is a CNV or repeat or a read so badly damaged due to sequencing errors that it could end up in multiple places or a read so short that could have come from multiple genomic loci.

If the genomic region is highly mappable and you have good mapping quality, it likely means that you can discard the "read is mismapped" possibility. now, it is still possible to have sequencing errors or bizarre strand effects due to Illumina even if you are correctly mapped.

ADD REPLYlink written 2.6 years ago by Gabriel R.2.8k

Great, got it now. Thanks so much for the clear explanation. Much appreciated Gabriel :)

ADD REPLYlink written 2.6 years ago by Sharon490
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1107 users visited in the last hour