Is this a possible mosaic / effect of downsampling?
6.1 years ago
Sharon ▴ 600

Hi Everyone

I am trying to check whether this low-frequency variant is interesting, i.e. whether it could be mosaic.

AC=1;AF=0.5;AN=2;DP=910; 0/1:99:910:0.095:99:549,0,17114:824,86  and QD=0.57;SF=8f.

The region around it has similar coverage. Some SNPs there have a low-frequency AD, like 2/910, 6/906 or 0/910; this is what I see in IGV. The region has a GC content of 60%, which is high.

What I can't understand is this black bar, which seems to indicate downsampling. Could this downsampling be the reason for the low ALT allele count?
Hovering over it says that in this interval [-] 356 reads have been removed.

My goal is to decide whether to discard this variant or accept it as a possible mosaic. I also don't know whether the orange area has another meaning besides being the region targeted by downsampling. Image here:

https://ibb.co/fraAyS

Thanks

exome sequencing VCF Downsampling • 1.3k views
Hi Sharon, good to see you again. It's difficult to answer. Is mosaicism expected in this case? Looking at the VCF data, it looks like the A allele is actually ~10% of the reads, but I cannot see your other fields.
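
As a rough check of that ~10% figure, here is a minimal Python sketch. It assumes the last sample field in the record above (`824,86`) is AD, i.e. reference and alternate read counts; that is a guess from the numbers, since the FORMAT column was not posted. It computes the ALT allele fraction and an exact lower-tail binomial probability of seeing so few ALT reads if the site were an ordinary heterozygous call with an expected fraction of 0.5. The function names are made up for this example.

```python
from math import exp, lgamma, log


def allele_fraction(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads supporting the ALT allele."""
    return alt_reads / (ref_reads + alt_reads)


def log_binom_pmf(k: int, n: int, p: float) -> float:
    """Natural log of P(X = k) for X ~ Binomial(n, p), with 0 < p < 1."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))


def binom_lower_tail(k: int, n: int, p: float = 0.5) -> float:
    """P(X <= k): chance of observing k or fewer ALT reads out of n."""
    return sum(exp(log_binom_pmf(i, n, p)) for i in range(k + 1))


# Assumed AD values taken from the posted record: 824 REF reads, 86 ALT reads.
ref_reads, alt_reads = 824, 86
depth = ref_reads + alt_reads

print(f"ALT fraction: {allele_fraction(ref_reads, alt_reads):.4f}")
print(f"P(<= {alt_reads} ALT reads | het, p=0.5): "
      f"{binom_lower_tail(alt_reads, depth):.3g}")
```

A very small tail probability only says the counts do not look like a balanced heterozygous site. It does not by itself separate mosaicism from artefacts such as mapping errors or strand bias, so the other annotations still matter.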

Yes, due to downsampling, you may not see many of the reads with the A allele. To view all of the reads and not just the downsampled ones, go to View --> Preferences --> Alignments and then change the maximum coverage depth.
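
To get a feel for what purely random downsampling would do to the ALT count, a small simulation can help. This is only a sketch under stated assumptions: the pile of 824 REF and 86 ALT reads is taken from the posted record, while `keep=500` is a hypothetical number of reads retained and not what IGV or any caller actually used. If the downsampling were unbiased, the ALT fraction should stay close to the original on average, with only the absolute count shrinking.

```python
import random


def simulate_alt_counts(ref_reads: int, alt_reads: int, keep: int,
                        trials: int = 2000, seed: int = 0) -> list[int]:
    """Randomly keep `keep` reads without replacement, `trials` times,
    and return the number of ALT reads retained in each trial."""
    rng = random.Random(seed)
    pile = [0] * ref_reads + [1] * alt_reads  # 0 = REF read, 1 = ALT read
    return [sum(rng.sample(pile, keep)) for _ in range(trials)]


keep = 500  # hypothetical number of reads kept after downsampling
counts = simulate_alt_counts(824, 86, keep=keep)
fractions = [c / keep for c in counts]
mean_fraction = sum(fractions) / len(fractions)

print(f"original ALT fraction:        {86 / 910:.4f}")
print(f"mean ALT fraction after keep: {mean_fraction:.4f}")
print(f"ALT count range after keep:   {min(counts)} to {max(counts)}")
```

If the fraction you observe with and without downsampling differs by much more than the spread seen here, that would hint at something other than random read removal.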

Thanks Kevin. What if the downsampling is biased? I am thinking of repeating the analysis without downsampling to see whether the counts change.

Hi Sharon, that's an interesting view to take and I believe that you are correct (i.e., the downsampling is biased). As far as I know, when a variant caller looks over each position to determine whether a variant is present, it just takes reads sequentially, and when it reaches a certain number of reads, it stops looking further.

This directly relates to the finding that my colleagues and I made in a children's hospital in the UK, whereby we split our BAM files into 4 different files, representing 100%, 75%, 50%, and 25% 'random' reads. In many situations, a Sanger-confirmed variant was observed in one of the lower read subsets and missed in the full (100%) set. This is possibly related to downsampling. That pipeline and methodology are on my GitHub page: https://github.com/kevinblighe/ClinicalGradeDNAseq
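
For anyone who wants to experiment with the subset idea, one possible way to pick reads for the 100%, 75%, 50% and 25% sets is to hash each read name to a number between 0 and 1 and keep the read if that number falls below the target fraction. This is a minimal sketch and not necessarily how the linked pipeline does it. The read names, the `salt` string and the function names are all hypothetical. Hashing on the read name keeps both mates of a pair in the same subset. It also makes the subsets nested (every read in the 25% set is also in the 50% set), which is a design choice; use a different salt per subset if independent draws are wanted.

```python
import hashlib


def read_fraction(read_name: str, salt: str = "subset-v1") -> float:
    """Map a read name to a reproducible pseudo-random value in [0, 1)."""
    digest = hashlib.sha256((salt + read_name).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def in_subset(read_name: str, fraction: float, salt: str = "subset-v1") -> bool:
    """Decide whether a read belongs to the subset keeping `fraction` of reads."""
    if fraction >= 1.0:
        return True
    return read_fraction(read_name, salt) < fraction


# Hypothetical read names standing in for the QNAME column of a BAM file.
read_names = [f"read{i}" for i in range(10000)]

subsets = {
    fraction: {name for name in read_names if in_subset(name, fraction)}
    for fraction in (1.0, 0.75, 0.5, 0.25)
}

for fraction, kept in subsets.items():
    print(f"target {fraction:.0%}: kept {len(kept)} of {len(read_names)} reads")
```

Running a caller on each subset and comparing the calls at the position of interest would then show whether the variant appears or disappears as depth changes.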

Great Kevin. Thanks so much, very much appreciated. Many thanks for also sharing this pipeline with me, will go through. Thanks :)
