Question: Is it a possible mosaic / effect of downsampling?
15 days ago by
Sharon wrote:

Hi Everyone

I am trying to check whether this low-frequency variant is interesting, i.e. whether it could be mosaic.

AC=1;AF=0.5;AN=2;DP=910; 0/1:99:910:0.095:99:549,0,17114:824,86  and QD=0.57;SF=8f.

The region around it has similar coverage. Some SNPs there have a low-frequency AD, like 2/910, 6/906 or 0/910; this is what I see in IGV. The region has a GC content of 60%, which is high.

What I can't understand is this black bar, which seems to indicate downsampling. Could this downsampling be the reason behind the low ALT allele counts?
Hovering over it says that in this interval [-] 356 reads have been removed.

My goal is to be sure whether to discard this variant or accept it as a possible mosaic. I also don't know whether the orange area has another meaning besides being the region targeted by downsampling. Image here:


modified 15 days ago • written 15 days ago by Sharon

Hi Sharon, good to see you again. It's difficult to answer. Is mosaicism expected in this case? Looking at the VCF data, it looks like the A allele is actually ~10% of the reads, but I cannot see your other fields.
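As a rough check of that ~10% figure, here is a minimal sketch that recomputes the ALT fraction from the sample column quoted in the question. It assumes that the last colon-separated entry (`824,86`) is the AD field in `ref,alt` order; the FORMAT line was not posted, so that is a guess, and `alt_fraction` is just a hypothetical helper name.

```python
# Sample column copied from the question.
# Assumption: the last colon-separated entry is AD, given as "ref,alt" read depths.
sample = "0/1:99:910:0.095:99:549,0,17114:824,86"


def alt_fraction(sample_field: str) -> float:
    """Return ALT / (REF + ALT) from a trailing 'ref,alt' AD entry."""
    ad_entry = sample_field.split(":")[-1]
    ref_depth, alt_depth = (int(x) for x in ad_entry.split(","))
    return alt_depth / (ref_depth + alt_depth)


print(round(alt_fraction(sample), 3))  # 86 / 910 -> 0.095
```

Under that assumption the result agrees with the 0.095 value already present in the sample column.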

Yes, due to downsampling, you may not see many of the reads with the A allele. To view all of the reads and not just the downsampled ones, in IGV you can go to View --> Preferences --> Alignments and then change the maximum coverage depth.

written 15 days ago by Kevin Blighe

Thanks Kevin. What if the downsampling is biased? I am thinking of repeating the analysis without downsampling to see if the counts change.

written 14 days ago by Sharon

Hi Sharon, that's an interesting view to take, and I believe that you are correct (i.e., the downsampling is biased). As far as I know, when a variant caller looks over each position to determine whether a variant is present, it just takes reads sequentially, and when it reaches a certain number of reads, it stops looking further.

This directly relates to the finding that my colleagues and I made in a children's hospital in the UK, whereby we split our BAM files into 4 different files, representing 100%, 75%, 50%, and 25% of the reads, selected 'randomly'. In many situations, a Sanger-confirmed variant was observed in one of the lower read subsets and missed in the full (100%) set. This is possibly related to downsampling. That pipeline and methodology are on my GitHub page:
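To illustrate the idea of the read subsets (this is not the pipeline mentioned above, only a toy sketch), the snippet below draws random subsets of reads at the same four fractions and reports the ALT count in each. It works on read counts rather than on a BAM file; the 824 REF / 86 ALT counts are taken from the question, and `subsample_alt_count` and the fixed seed are assumptions made for the example. On real data the equivalent step would be subsampling the BAM itself, for example with `samtools view -s`.

```python
import random


def subsample_alt_count(ref_reads: int, alt_reads: int, fraction: float, seed: int = 0):
    """Randomly keep `fraction` of the reads; return (ALT reads kept, total reads kept)."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    reads = ["ref"] * ref_reads + ["alt"] * alt_reads
    n_keep = round(len(reads) * fraction)
    kept = rng.sample(reads, n_keep)  # sampling without replacement
    return kept.count("alt"), n_keep


# Counts from the question: 824 REF reads and 86 ALT reads.
for fraction in (1.0, 0.75, 0.5, 0.25):
    alt, depth = subsample_alt_count(824, 86, fraction)
    print(f"{fraction:.0%}: {alt}/{depth} ALT reads ({alt / depth:.3f})")
```

With a truly random draw, the ALT fraction should stay close to the full-depth value in every subset; a large shift between subsets would hint that the selection of reads is not random.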

written 14 days ago by Kevin Blighe

Great, Kevin. Thanks so much, very much appreciated. Many thanks also for sharing this pipeline with me; I will go through it. Thanks :)

modified 14 days ago • written 14 days ago by Sharon