Question

How reliable are samtools / bam-readcount / igvtools / pysamstats for finding deletions/insertions?

6.7 years ago

IrK ▴ 70

Dear Biostars users,

This is my first time using variant-calling software/packages, so I would like to ask for opinions on the tools mentioned below. How reliable are the results they produce?

My goal is to count the occurrences of A, C, T, G, DEL, and INS at each base across chr1 (not the whole genome). After researching the topic "count frequency of A,C,T,G, Del, and Ins at each base", I ended up with the following options:

(1) samtools mpileup / bcftools call
(2) bam-readcount
(3) igvtools
(4) pysamstats

IGVtools and pysamstats give the same results, and I liked the output (it is clear and exactly what I need). I understand that bam-readcount probably uses different default options, which is why its results differ from the previous two, but I found its output a bit messy for my task. I also managed to get results in VCF format, but couldn't get the desired frequencies of A, T, C, G, DEL, and INS from it. I saw the advice to use pileup2baseindel, but it seems to be dated (Update 08/04/2012). Could anyone advise how to get the frequencies of A, C, T, G, DEL, and INS from VCF format? And what would you use if your task is to find only the frequency of deletions and nucleotides at each base?

Thank you!

samtools bam-readcounts igvtools pysamstats • 3.5k views

ADD COMMENT • link updated 6.6 years ago by Biostar 20 • written 6.7 years ago by IrK ▴ 70

Are you purely interested in the occurrences in the data or do you want to find actual variants? Option 1 finds actual variants, the others don't.

ADD REPLY • link 6.7 years ago by Devon Ryan 104k

Out of interest, I typically use bam-readcount after mpileup/VarScan2 for somatic mutations, as recommended by the authors. Does anyone have personal experience with whether the VarScan2 false-positive filter, which needs the bam-readcount output, is actually advisable?

ADD REPLY • link 6.7 years ago by ATpoint 82k

Yeah, it's very advisable. It includes lots of heuristic filters that remove junk calls.

ADD REPLY • link 6.6 years ago by Chris Miller 22k

Thanks for the comment! Additionally, my filtering for matched WGS now includes removal of variants that intersect with low-complexity regions, those with a DP beyond the 95th percentile of the average variant depth (=> the LC and MD filters, as suggested by Heng Li's paper from 2014), and those that have an AF > 0 in the 1KG. That should deal with the vast majority of false positives and common somatic variants.
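
A minimal sketch of how these three filters could be combined, in plain Python. The record fields (`dp`, `low_complexity`, `af_1kg`) are hypothetical names chosen for illustration, and taking the cutoff as the nearest-rank 95th percentile of DP over all calls is an assumption about how the depth threshold is computed, not a description of any particular tool.

```python
def percentile(values, pct):
    """Nearest-rank percentile, using integer arithmetic for the rank."""
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # ceil(pct * n / 100)
    return ordered[rank - 1]


def filter_calls(calls, pct=95):
    """Drop calls in low-complexity regions, with excessive depth, or seen in 1KG."""
    cutoff = percentile([c["dp"] for c in calls], pct)
    kept = []
    for c in calls:
        if c["low_complexity"]:   # LC filter: overlaps a low-complexity region
            continue
        if c["dp"] > cutoff:      # MD filter: depth beyond the chosen percentile
            continue
        if c["af_1kg"] > 0:       # population filter: any allele frequency in 1KG
            continue
        kept.append(c)
    return kept, cutoff


# Toy data (made up): 18 ordinary calls plus one at DP 40 and one at DP 500.
calls = [
    {"id": f"v{i}", "dp": 30, "low_complexity": False, "af_1kg": 0.0}
    for i in range(18)
]
calls.append({"id": "ok_40", "dp": 40, "low_complexity": False, "af_1kg": 0.0})
calls.append({"id": "high_depth", "dp": 500, "low_complexity": False, "af_1kg": 0.0})
calls[0]["low_complexity"] = True
calls[1]["af_1kg"] = 0.01

kept, cutoff = filter_calls(calls)
kept_ids = {c["id"] for c in kept}
print(cutoff, len(kept))
```

In practice the low-complexity and 1KG annotations would come from interval and population-frequency resources; here they are simply pre-filled flags.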

ADD REPLY • link 6.6 years ago by ATpoint 82k

Another filter I've found useful is to remove variants from regions where at least 10% of reads have MQ 0.
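
As a rough sketch, assuming the mapping qualities of the reads overlapping a region are already available as a list of integers (the function names and the toy inputs are illustrative only), the check could look like this:

```python
def mq0_fraction(mapqs):
    """Fraction of reads with mapping quality 0 (0.0 for an empty region)."""
    if not mapqs:
        return 0.0
    return sum(1 for q in mapqs if q == 0) / len(mapqs)


def passes_mq0_filter(mapqs, max_fraction=0.10):
    """Keep a region only if fewer than max_fraction of its reads have MQ 0."""
    return mq0_fraction(mapqs) < max_fraction


clean_region = [60] * 19 + [0]      # 5% MQ0 reads
ambiguous_region = [60] * 9 + [0]   # exactly 10% MQ0 reads

print(passes_mq0_filter(clean_region))
print(passes_mq0_filter(ambiguous_region))
```

The comparison is strict, so a region with exactly 10% MQ 0 reads is rejected, matching "at least 10%".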

ADD REPLY • link 6.6 years ago by Chris Miller 22k

I would just like to summarize the base calls of the aligned reads at each position. So we get a kind of base-call summary from options (2), (3), and (4); can these statistics be trusted?

ADD REPLY • link 6.7 years ago by IrK ▴ 70

I've never used bam-readcount, but in general this is a hard thing to get wrong, so the results should be correct.
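
To illustrate what such a per-position summary involves, here is a minimal pure-Python sketch that walks the CIGAR string of each read and tallies bases, deletions, and insertions. The input format (0-based start, CIGAR, read sequence) and the convention of attributing an insertion to the preceding reference base are assumptions made for this example; real tools also apply base-quality and mapping-quality filters and may follow different conventions, which is one plausible reason their counts differ.

```python
import re
from collections import defaultdict

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def tally_reads(reads):
    """Count bases, DEL and INS per reference position.

    reads: iterable of (start, cigar, seq) with a 0-based reference start.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for start, cigar, seq in reads:
        ref_pos, read_pos = start, 0
        for length, op in CIGAR_RE.findall(cigar):
            length = int(length)
            if op in "M=X":    # aligned bases: one count per reference position
                for i in range(length):
                    counts[ref_pos + i][seq[read_pos + i]] += 1
                ref_pos += length
                read_pos += length
            elif op == "D":    # deletion: each deleted reference position gets a DEL
                for i in range(length):
                    counts[ref_pos + i]["DEL"] += 1
                ref_pos += length
            elif op == "I":    # insertion: counted once, at the preceding reference base
                counts[ref_pos - 1]["INS"] += 1
                read_pos += length
            elif op == "S":    # soft clip consumes read bases only
                read_pos += length
            elif op == "N":    # skipped region consumes reference only
                ref_pos += length
            # H and P consume neither read nor reference
    return {pos: dict(c) for pos, c in counts.items()}


# Three toy reads (made up) covering reference positions 0-4.
reads = [
    (0, "5M", "ACGTA"),
    (0, "2M1D2M", "ACTA"),
    (1, "2M2I2M", "CGTTTA"),
]
table = tally_reads(reads)
for pos in sorted(table):
    print(pos, table[pos])
```

Position 2 in this toy example ends up with two G calls, one deletion, and one insertion, which is the kind of row pysamstats or IGVtools reports per base.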

ADD REPLY • link 6.7 years ago by Devon Ryan 104k