Question: fpfilter - All variants fail
Asked 23 days ago by ar.satkhol18:

Hi, I used fpfilter to filter my variant VCF from VarScan2 using the following command:

java -jar /home/art2407/miniconda3/pkgs/varscan-2.4.3-2/share/varscan-2.4.3-2/VarScan.jar fpfilter myvar.snp mybam.readcount -dream3-settings 1 > myvar.fpfilt.vcf

And I got the following output:

Loading readcounts from mybam.readcount...
Parsing variants from myvar.snp...
672 variants in input file
672 had a bam-readcount result
663 had reads1>=2
0 passed filters
672 failed filters
    0 failed because no readcounts were returned
    0 failed minimim variant count < 3
    81 failed minimum variant freq < 0.05
    0 failed minimum strandedness < 0.0
    663 failed minimum reference readpos < 0.2
    672 failed minimum variant readpos < 0.15
    663 failed minimum reference dist3 < 0.2
    672 failed minimum variant dist3 < 0.15
    0 failed maximum reference MMQS > 50
    0 failed maximum variant MMQS > 100
    0 failed maximum MMQS diff (var - ref) > 50
    0 failed maximum mapqual diff (ref - var) > 10
    0 failed minimim ref mapqual < 20
    0 failed minimim var mapqual < 30
    9 failed minimim ref basequal < 15
    672 failed minimim var basequal < 30
    0 failed maximum RL diff (ref - var) > 0.05

All variants fail at "minimum readpos" and "minimum dist3". Can anyone tell me what these parameters mean and how to adjust them?

Tags: fpfilter, varscan2

I think it is the proximity of the altered base to the 3' end of the read. You can see that those bases are of poor(er) quality (672 failed minimim var basequal < 30) and therefore more likely to be artifacts. This is a heuristic (experience-based) filter, which I would not change unless you have expert knowledge.

Reply written 22 days ago by ATpoint (modified 22 days ago)
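For anyone who wants to look at the numbers behind the readpos and dist3 counts, here is a minimal Python sketch that pulls the two position metrics for one allele out of a bam-readcount line. It assumes the readcount file uses the 14-field per-base layout described in the bam-readcount README, and that fpfilter's readpos and dist3 checks correspond to `avg_pos_as_fraction` and `avg_distance_to_effective_3p_end`; that mapping is an assumption, not something confirmed in this thread. The example line is made up for illustration, and `position_metrics` is a hypothetical helper name.

```python
# Assumed field order of each per-base block in bam-readcount output
# (taken from the bam-readcount README; adjust if your version differs).
FIELDS = [
    "base", "count", "avg_mapping_quality", "avg_basequality",
    "avg_se_mapping_quality", "num_plus_strand", "num_minus_strand",
    "avg_pos_as_fraction", "avg_num_mismatches_as_fraction",
    "avg_sum_mismatch_qualities", "num_q2_containing_reads",
    "avg_distance_to_q2_start_in_q2_reads", "avg_clipped_length",
    "avg_distance_to_effective_3p_end",
]


def parse_readcount_line(line):
    """Split one tab-separated readcount line into site info and per-base metrics."""
    cols = line.rstrip("\n").split("\t")
    chrom, pos, ref, depth = cols[0], int(cols[1]), cols[2], int(cols[3])
    per_base = {}
    for block in cols[4:]:
        values = block.split(":")
        if len(values) != len(FIELDS):
            continue  # skip blocks that do not match the assumed layout
        metrics = dict(zip(FIELDS, values))
        per_base[metrics["base"]] = metrics
    return chrom, pos, ref, depth, per_base


def position_metrics(line, allele):
    """Return (average relative read position, average relative distance to 3' end)."""
    per_base = parse_readcount_line(line)[4]
    m = per_base[allele]
    return (float(m["avg_pos_as_fraction"]),
            float(m["avg_distance_to_effective_3p_end"]))


# Made-up example line: reference A (15 reads), variant G (5 reads).
example_line = "\t".join([
    "chr1", "12345", "A", "20",
    "=:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00",
    "A:15:60.00:35.00:0.00:8:7:0.45:0.01:12.00:0:0.00:148.00:0.48",
    "G:5:60.00:33.00:0.00:3:2:0.08:0.02:20.00:0:0.00:149.00:0.07",
])

readpos, dist3 = position_metrics(example_line, "G")
print(f"variant readpos={readpos}, dist3={dist3}")
```

In this made-up line the variant allele has values below 0.1 for both metrics, which is the kind of site the readpos and dist3 thresholds shown in the summaries would reject. If the values in a real readcount file look implausible for every site (for example, all zero), that would point to a problem with the readcount file rather than with the variants themselves.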

Thanks for your reply. I ran the same command on another tumour DNA dataset and got similar results, except that no variants failed basequal:

Parsing variants from tumor.vcf...
49 variants in input file
49 had a bam-readcount result
49 had reads1>=2
0 passed filters
49 failed filters
    0 failed because no readcounts were returned
    1 failed minimim variant count < 4
    0 failed minimum variant freq < 0.05
    8 failed minimum strandedness < 0.01
    49 failed minimum reference readpos < 0.1
    49 failed minimum variant readpos < 0.1
    49 failed minimum reference dist3 < 0.1
    49 failed minimum variant dist3 < 0.1
    0 failed maximum reference MMQS > 100
    0 failed maximum variant MMQS > 100
    0 failed maximum MMQS diff (var - ref) > 50
    0 failed maximum mapqual diff (ref - var) > 50
    0 failed minimim ref mapqual < 15
    1 failed minimim var mapqual < 15
    0 failed minimim ref basequal < 15
    0 failed minimim var basequal < 15
    0 failed maximum RL diff (ref - var) > 0.25

I tried the same command on two more data files and they gave the same results. Is there any way I can improve these results? Or is there another tool I can use to filter out false positives?

Reply written 20 days ago by ar.satkhol18

What kind of data is this? Some targeted capture approach? What is the read length? It is indeed odd that all variants fail these specific filters, and it indicates a systematic bias.

Reply written 20 days ago by ATpoint

The data was obtained using targeted sequencing on an Illumina platform. The read length is 150 bp.

Reply written 19 days ago by ar.satkhol18

Targeted in terms of capture beads or amplicons?

Reply written 19 days ago by ATpoint

It is capture-bead based.

Reply written 19 days ago by ar.satkhol18 (modified 19 days ago)

Hi, an update about this error. I tried setting the readpos and dist3 parameters to zero to see what the filter gives. The result is the same as above: although none of the variants now fail readpos or dist3, they seem to be failing on some criterion that is not listed in the result summary. The summary I got this time is below:

446 variants in input file
441 had a bam-readcount result
418 had reads1>=2
0 passed filters
446 failed filters
    5 failed because no readcounts were returned
    14 failed minimim variant count < 3
    19 failed minimum variant freq < 0.05
    0 failed minimum strandedness < 0.0
    0 failed minimum reference readpos < 0.0
    0 failed minimum variant readpos < 0.0
    0 failed minimum reference dist3 < 0.0
    0 failed minimum variant dist3 < 0.0
    0 failed maximum reference MMQS > 50
    0 failed maximum variant MMQS > 100
    0 failed maximum MMQS diff (var - ref) > 50
    3 failed maximum mapqual diff (ref - var) > 10
    0 failed minimim ref mapqual < 20
    2 failed minimim var mapqual < 30
    8 failed minimim ref basequal < 15
    43 failed minimim var basequal < 20
    0 failed maximum RL diff (ref - var) > 0.05

Why are variants near the 3' end filtered out? Is there some other parameter that can be relaxed? How do I use this filter to remove only false positives? Please help.

Reply written 8 days ago by ar.satkhol18 (modified 8 days ago)
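The observation that variants fail for a reason missing from the summary can be checked with simple arithmetic on the last summary. The Python sketch below parses the summary text as posted and adds up the per-reason counts. Because one variant may be counted under several reasons, the sum is only an upper bound on how many failures the listed reasons can account for. The function name `parse_summary` is hypothetical, and the parsing assumes the line format shown in this thread.

```python
import re

# Summary text copied from the run with readpos and dist3 set to zero.
SUMMARY = """\
446 variants in input file
441 had a bam-readcount result
418 had reads1>=2
0 passed filters
446 failed filters
    5 failed because no readcounts were returned
    14 failed minimim variant count < 3
    19 failed minimum variant freq < 0.05
    0 failed minimum strandedness < 0.0
    0 failed minimum reference readpos < 0.0
    0 failed minimum variant readpos < 0.0
    0 failed minimum reference dist3 < 0.0
    0 failed minimum variant dist3 < 0.0
    0 failed maximum reference MMQS > 50
    0 failed maximum variant MMQS > 100
    0 failed maximum MMQS diff (var - ref) > 50
    3 failed maximum mapqual diff (ref - var) > 10
    0 failed minimim ref mapqual < 20
    2 failed minimim var mapqual < 30
    8 failed minimim ref basequal < 15
    43 failed minimim var basequal < 20
    0 failed maximum RL diff (ref - var) > 0.05
"""


def parse_summary(text):
    """Return (total failed, {reason: count}) from an fpfilter-style summary."""
    total_failed = None
    reasons = {}
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"(\d+) failed filters$", line)
        if m:
            total_failed = int(m.group(1))
            continue
        m = re.match(r"(\d+) failed (.+)$", line)
        if m:
            reasons[m.group(2)] = int(m.group(1))
    return total_failed, reasons


total_failed, reasons = parse_summary(SUMMARY)
# Upper bound: a variant can appear under more than one reason.
explained_at_most = sum(reasons.values())
unexplained_at_least = total_failed - explained_at_most

print(f"failed: {total_failed}")
print(f"explained by listed reasons (at most): {explained_at_most}")
print(f"not explained by any listed reason (at least): {unexplained_at_least}")
```

With the numbers posted above, the listed reasons cover at most 94 of the 446 failed variants, so at least 352 are rejected without any listed reason. That supports the suspicion that something other than the tunable thresholds is causing the failures.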