Question: Is this a possible mosaic variant or an effect of downsampling?
Sharon wrote (9 months ago):

Hi Everyone

I am trying to check whether this low-frequency variant is interesting, i.e. whether it could be mosaic.

AC=1;AF=0.5;AN=2;DP=910; 0/1:99:910:0.095:99:549,0,17114:824,86  and QD=0.57;SF=8f.

The region around it has similar coverage. Some nearby SNPs have a low alternate-allele depth (AD), such as 2/910, 6/906 or 0/910; this is what I see in IGV. The region has a GC content of 60%, which is high.

What I can't understand is this black bar, which seems to indicate downsampling. Could this downsampling be the reason behind the low ALT allele counts?
Hovering over it says that in this interval [-] 356 reads have been removed.

My goal is to decide whether to discard this variant or accept it as a possible mosaic. I also don't know whether the orange area means anything other than being the region targeted by downsampling. Image here:

https://ibb.co/fraAyS

Thanks


Hi Sharon, good to see you again. It's difficult to answer. Is mosaicism expected in this case? Looking at the VCF data, it looks like the A allele is actually ~10% of the reads, but I cannot see your other fields.
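The ~10% figure can be reproduced from the sample string in the question. Here is a minimal Python sketch, assuming that the last colon-separated entry (824,86) is the allelic depth (AD) in REF,ALT order; the function name is only illustrative.

```python
def alt_allele_fraction(sample_field: str) -> float:
    """Return ALT / (REF + ALT) from a VCF sample string.

    Assumption: the last colon-separated entry is AD written as 'ref,alt'.
    """
    ad = sample_field.split(":")[-1]
    ref_count, alt_count = (int(x) for x in ad.split(","))
    return alt_count / (ref_count + alt_count)


# Sample string copied from the question.
sample = "0/1:99:910:0.095:99:549,0,17114:824,86"
vaf = alt_allele_fraction(sample)
print(f"ALT allele fraction: {vaf:.3f}")  # prints "ALT allele fraction: 0.095"
```

That is 86 of 910 reads, well below the roughly 50% one would expect for a germline heterozygous call.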

Yes, due to downsampling, you may not see many of the reads with the A allele. To view all of the reads and not just the downsampled ones, go to View --> Preferences --> Alignments and then change the maximum coverage depth.

Reply written 9 months ago by Kevin Blighe

Thanks Kevin. What if the downsampling is biased? I am thinking of repeating the analysis without downsampling to see whether the counts change.

Reply written 9 months ago by Sharon

Hi Sharon, that's an interesting view to take and I believe that you are correct (i.e., the downsampling is biased). As far as I know, when a variant caller looks over each position to determine whether a variant is present, it just takes reads sequentially, and once it reaches a certain number of reads, it stops looking further.

This directly relates to the finding that my colleagues and I made in a children's hospital in the UK, whereby we split our BAM files into 4 different files, representing 100%, 75%, 50%, and 25% 'random' reads. In many situations, a Sanger-confirmed variant was observed in one of the lower read subsets and missed in the full (100%) set. This is possibly related to downsampling. That pipeline and methodology are on my GitHub page: https://github.com/kevinblighe/ClinicalGradeDNAseq
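To get a feel for how much the ALT fraction can move under purely random subsampling, a small simulation may help. The sketch below assumes the read counts from the question (824 REF, 86 ALT) and unbiased sampling without replacement at the 100%, 75%, 50% and 25% levels. It is only an illustration: it is not the ClinicalGradeDNAseq pipeline, and it does not model how IGV or any variant caller actually downsamples.

```python
import random


def subsample_alt_fraction(ref, alt, keep_fraction, rng):
    """Randomly keep a fraction of the reads and return the ALT fraction."""
    reads = ["REF"] * ref + ["ALT"] * alt
    n_keep = round(len(reads) * keep_fraction)
    kept = rng.sample(reads, n_keep)  # sampling without replacement
    return kept.count("ALT") / n_keep


rng = random.Random(0)  # fixed seed so the run is repeatable
results = {}
for frac in (1.0, 0.75, 0.5, 0.25):
    # Repeat the subsampling 1000 times at each level.
    fractions = [subsample_alt_fraction(824, 86, frac, rng) for _ in range(1000)]
    mean = sum(fractions) / len(fractions)
    results[frac] = (min(fractions), mean, max(fractions))
    print(f"keep {frac:.0%}: min={min(fractions):.3f} "
          f"mean={mean:.3f} max={max(fractions):.3f}")
```

If the downsampling is unbiased, the mean stays close to 86/910 at every level and only the spread grows as fewer reads are kept. ALT counts that fall well outside that spread would point to a biased (for example, sequential) selection of reads rather than random chance.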

Reply written 9 months ago by Kevin Blighe

Great Kevin. Thanks so much, very much appreciated. Many thanks for also sharing this pipeline with me, will go through. Thanks :)

Reply written 9 months ago by Sharon