Mutation filtering based on Lanes
0
1
Entering edit mode
18 months ago
Gama313 ▴ 120

I am working on raw vcf from c.a. 200 samples run in a 4lanes Novaseq. For variant calling I used Lofreq since it is precise and consistent for low vaf snv. Since going low adds noise, I need to manually filter some variants. I am interested in snv observed at different proportions on different lanes. Since from literature it is known that some artifact are lane-specific, I am interested testing whethere some mutations appear prevalently in specific lanes. However, I tested also reference observations and it appeared that there is an umbalancing also for reference positions (for example: L1:260,L2:276,L3:186,L4:200). The point is that I was supposing that both reference and mutations should be balanced or, at least shoul be reference observations. My question is: how is possible that I observed such an unbalancing in many positions?

ngs calling Variant lanes • 1.1k views
ADD COMMENT
0
Entering edit mode

Since from literature it is known that some artifact are lane-specific

Which literature are you referring to? If the same pool ran on all lanes then the sequence should be equivalent.

ADD REPLY
0
Entering edit mode

See https://sites.google.com/a/broadinstitute.org/legacy-gatk-forum-discussions/2016-08-11-2016-04-07/7635-NextSeq-500-Lanes-and-BaseRecalibrator . It has been long observed that the 'empirical' quality scores (measured by non-dbSNP mismatches) vary between lanes of the same flowcell.

ADD REPLY
0
Entering edit mode

As far as I know, GATK package considers the read group (and usually the RG tag is done from lanes) for building the model so it assumes differences per lane. Moreover, it is also plausible that, if the sample is run on multiple lanes, the mutations shoul be equally represented in each lane. However, in a 10-20% of positions tested I observed the unbalancing. Most strickingly, i observed a 5-10% of positions in which the mutation is equally represented in all lanes while the reference bases are not.

ADD REPLY
0
Entering edit mode

Is the complaint here that the depth of coverage at a position is different between the lanes? This is entirely expected from a negative binomial process...

ADD REPLY
0
Entering edit mode

The compliant is that coverage is lane-dependent and also the mutational vaf. By reasoning I thought that both should be comparable and if not, it could be a 'call' of artifact. I mean, it is expected to observe a situation like L1:260,L2:276,L3:186,L4:200 where most of the positions seem balanced?

ADD REPLY
1
Entering edit mode

You should expect per-site coverage to be variable by lane. The appropriate metric is the allelic fraction of the minor allele, which you have not posted.

ADD REPLY
0
Entering edit mode

As an example, this position had a vaf for the mutatoon like L1:8,L2:6,L3:8,L4:7

ADD REPLY
0
Entering edit mode

Update: I am working on c.a 50.000 mutations from 200 FFPE samples run on Novaseq6000. I calculate AFs for all of them for each lane. I observed c.a 15.000 snv having AF diiferences of 5-10% ± 5% among lanes. I obseved no correlation with RNA degradation, GC content and duplicates I had a smaller dataset of FF samples prepared with a different protocol, and I observed c.a 5% of unbalanced snv.

ADD REPLY
1
Entering edit mode

You can compute a simple p-value for your observed difference by running your table of:

            Lane1      Lane2    Lane3    Lane4
Ref            x1         x2       x3       x4         
Alt            y1         y2       y3       y4

through a Pearson chisq test python R. My guess is only a small number of the SNVs will have a p-value of < 0.1/50000 (bonferroni)

ADD REPLY

Login before adding your answer.

Traffic: 1369 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6