Question

indel preprocessing prior to bam-readcount

0

Entering edit mode

7.6 years ago

Jokhe ▴ 140

What kind of preprocessing do you need to use bam-readcount and fpfilter.pl for indels?

This question has been already asked couple of times but so far I haven't found any clear answer (scripts or other examples) for them. So, I have a set of VCF files generated by VarScan2. These VCF files contain somatic indels occuring in tumor samples. I am trying to filter these indels by using bam-readcount and fpfilte.pl. The format of VCF file is following;

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  NORMAL  TUMOR
 chr12  46236017    .   CA  C   .   PASS    DP=177;SS=1;SSC=0;GPV=6.0385E-14;SPV=8.3811E-1  GT:GQ:DP:RD:AD:FREQ:DP4 0/1:.:66:44:17:27.87%:44,0,17,0 0/1:.:111:80:23:22.33%:80,0,23,0

My script is presented below;

# Prepare input file by selecting data from VCF input file.
sed '/^#/ d' $file | awk '{print $1,$2,$2}' > $file.positions

# Calculate read features by using bam-readcount tool.
$BAM_READCOUNT -w1 -f $REFERENCE ${file%.vcf}.bam -l $file.positions > $file.readcounts
rm $file.positions

# Filter variants by using fpfilter.pl.
perl $FPFILTER --var-file $file --readcount-file $file.readcounts --output-file $file.output
rm $file.readcounts

 # Select variants that pass all filters.
grep "PASS" $file.output | awk '{print $1,$2-2,$2+1}' > $file.pass
rm $file.output

# Filter original VCF file so that only variants passing all filters are included.
vcftools --vcf $file --positions $file.pass --recode --out ${file%.vcf}.filtered.vcf
rm *.pass
rm *.log
rm $file

However, this approad filters all indels from my data. The problem seems to be in readcounts, because the prompt indicates that;

Filtering $file for false-positives...
4 variants
0 had a position near the ends of most supporting reads (position < 0.1)
0 had strandedness < 0.01 (most supporting reads are in the same direction)
0 had var_count < 3 (not enough supporting reads)
0 had depth < 8
...
0 passed all filters
4 had no readcounts for the variant allele

I would really appreciate tips and example scripts to see how raw variant data from VCF should be extracted and used in bam-readcount and fpfilter.

bam-readcount indel fpfilter.pl VarScan • 4.2k views

ADD COMMENT • link updated 7.5 years ago by m.b • 0 • written 7.6 years ago by Jokhe ▴ 140

1

Entering edit mode

chr12 46236017 . CA C

for bam-readcount, you need to supply chr12 46236018 46236018 as the position

ADD REPLY • link 6.5 years ago by Ming Tommy Tang ★ 3.9k

0

Entering edit mode

I have the same problem, and apparently there is no reliable answer out there by now. How did you solve the problem for your data?

ADD REPLY • link 6.6 years ago by ATpoint 82k

score 0 · Answer 1 · 2016-10-11

0

Entering edit mode

7.5 years ago

m.b • 0

I have been facing a similar problem. I see that the main issue for your filtering process is: "no readcounts for the variant allele". Can you please tell me if your 4 variants are all deletions?

I have a larger dataset with insertions and deletions; and all my deletions have not passed the filtering because there is no read count for them. This is a bam-readcount problem.

I went back to the bam-readcount github page and it seems that this is a recognised bug. It does not return readcounts at positions of known deletions. Please see link below: https://github.com/genome/bam-readcount/issues/27

Unfortunately, I am not sure how to get around this....

ADD COMMENT • link 7.5 years ago by m.b • 0

0

Entering edit mode

Thank you for your answer. The example case I am presenting in this post contains only deletions so it is very likely that this is caused by the bug you indicated. However, I have similar problem with all of my samples and some of them do contain also insertions. Insertions are also filtered out due to "no readcounts".

ADD REPLY • link 7.5 years ago by Jokhe ▴ 140