Question: How reliable are samtools / bam-readcount / igvtools / pysamstats for finding deletions/insertions?
2.5 years ago by
IrK50 wrote:

Dear Biostars users,

This is my first time using variant-calling software, so I would like to ask for opinions on the tools listed below. How reliable are the results they produce?

My goal is to count the occurrences of A, C, T, G, DEL, and INS at each position across chr1 (not the whole genome). After researching the topic "count frequency of A,C,T,G, Del, and Ins at each base", I ended up with the following options:

(1) samtools mpileup / bcftools call
(2) bam-readcount
(3) igvtools
(4) pysamstats

IGVtools and pysamstats produce the same results, and I like their output (it is clear and exactly what I need). I understand that bam-readcount probably uses different default options, so its results differ from those of the other two, but I found its output a bit messy for my task. I also managed to get results in VCF format, but couldn't get the desired frequencies of A, T, C, G, DEL, and INS from it. I saw advice to use pileup2baseindel, but it seems to be dated (last update 08/04/2012). Could anyone advise how to get the frequencies of A, C, T, G, Del, and Ins from VCF format? And what would you use if your task were only to find the frequency of deletions and nucleotides at each base?

Thank you!

modified 2.4 years ago by Biostar ♦♦ 20 • written 2.5 years ago by IrK50
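On the VCF part of the question, here is a minimal sketch rather than a definitive recipe. It assumes the VCF records carry a per-sample `AD` (allelic depth) entry in the FORMAT column, which not every caller writes by default, so check how your file was produced. The function name `allele_fractions` and the example record are made up for illustration. Note that this only reports the alleles listed in REF and ALT for that record (indels appear as multi-base alleles), not a full A/C/G/T/DEL/INS table at every position.

```python
def allele_fractions(vcf_line, sample_index=0):
    """Return {allele: fraction of reads} for one sample of one VCF data line.

    Assumes tab-separated VCF columns and a FORMAT/AD entry holding one
    read depth per allele, in the order REF, ALT1, ALT2, ...
    """
    fields = vcf_line.rstrip("\n").split("\t")
    ref, alt, fmt = fields[3], fields[4], fields[8]
    sample = fields[9 + sample_index]

    # REF first, then every ALT allele ("." means no ALT allele).
    alleles = [ref] + ([] if alt == "." else alt.split(","))

    keys = fmt.split(":")
    if "AD" not in keys:
        raise ValueError("record has no FORMAT/AD field")
    ad_raw = sample.split(":")[keys.index("AD")]

    # Missing depths are written as "."; treat them as zero reads.
    depths = [0 if d == "." else int(d) for d in ad_raw.split(",")]
    total = sum(depths)
    if total == 0:
        return {allele: 0.0 for allele in alleles}
    return {allele: depth / total for allele, depth in zip(alleles, depths)}


# Hypothetical record: a 1-bp deletion (AT>A) and a 1-bp insertion (AT>ATT).
record = "chr1\t100\t.\tAT\tA,ATT\t50\tPASS\t.\tGT:AD\t0/1:6,3,1"
fractions = allele_fractions(record)
```

Here the deletion allele `A` and the insertion allele `ATT` get their own fractions next to the reference allele `AT`.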

Are you purely interested in the occurrences in the data or do you want to find actual variants? Option 1 finds actual variants, the others don't.

written 2.5 years ago by Devon Ryan94k

Out of interest, I typically use bam-readcount after mpileup/VarScan2 for somatic mutations, as recommended by the authors. Does anyone have personal experience with whether the VarScan2 false-positive filter, which needs the bam-readcount output, is actually advisable?

written 2.5 years ago by ATpoint30k

Yeah, it's very advisable. It includes lots of heuristic filters that remove junk calls.

written 2.5 years ago by Chris Miller21k

Thanks for the comment! Additionally, my filtering for matched WGS now removes variants that intersect low-complexity regions, variants with a DP beyond the 95th percentile of the average variant depth (the LC and MD filters suggested in Heng Li's 2014 paper), and variants that have an AF > 0 in the 1KG. That should deal with the vast majority of false positives and common somatic variants.

written 2.5 years ago by ATpoint30k
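To make the depth cutoff concrete, here is a rough sketch of one way to read it: compute the 95th percentile of DP over all candidate variants and drop anything above it. The nearest-rank percentile, the dict layout with a `"DP"` key, and the function names are all assumptions for illustration, not the exact procedure used in the comment above or in the paper it cites.

```python
import math


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def drop_high_depth(variants, pct=95):
    """Keep variants whose DP does not exceed the pct-th percentile of all DPs.

    `variants` is a list of dicts with an integer "DP" entry (hypothetical layout).
    """
    if not variants:
        return []
    cutoff = percentile_nearest_rank([v["DP"] for v in variants], pct)
    return [v for v in variants if v["DP"] <= cutoff]


# Toy input: 20 variants with depths 1..20.
toy = [{"pos": i, "DP": i} for i in range(1, 21)]
kept = drop_high_depth(toy)
```

With real data you would parse DP from the VCF first; the filtering step itself stays this small.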

Another filter I've found useful is to remove variants in regions where at least 10% of reads have MQ 0.

written 2.5 years ago by Chris Miller21k
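As a small illustration of the MQ 0 rule, the check below flags a region when at least 10% of its reads have mapping quality 0. It assumes you have already collected the mapping qualities of the reads overlapping the region into a list; the function name `has_excess_mq0` and the choice to treat an empty region as "not flagged" are illustrative assumptions.

```python
def has_excess_mq0(mapqs, min_fraction=0.10):
    """Return True if at least `min_fraction` of the reads have mapping quality 0."""
    if not mapqs:
        # No reads, so no evidence either way; this sketch does not flag the region.
        return False
    zero_count = sum(1 for q in mapqs if q == 0)
    return zero_count / len(mapqs) >= min_fraction


flagged = has_excess_mq0([0] + [60] * 9)    # 1 of 10 reads has MQ 0
clean = has_excess_mq0([0] + [60] * 10)     # 1 of 11 reads has MQ 0
```

Variants falling in a flagged region would then be removed from the call set.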

I would just like to summarize the base calls of aligned reads at each position. Options (2), (3), and (4) give that kind of base-call summary. Can these statistics be trusted?

written 2.5 years ago by IrK50

I've never used bam-readcount, but in general this is a hard thing to get wrong, so the results should be correct.

written 2.5 years ago by Devon Ryan94k
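To show why a per-position base-call summary is simple to compute, here is a minimal sketch that tallies A, C, G, T, N, DEL, and INS from the read-bases column of `samtools mpileup` text output. It assumes the standard pileup encoding: `.` and `,` for reference matches, letters for mismatches, `*` (or `#`) for a base deleted in the read, `+<n><seq>` for an insertion after the position, `-<n><seq>` announcing a deletion that follows, `^` plus one quality character at a read start, and `$` at a read end. The function name and the example string are made up, and this is not how any of the tools above are implemented. Different tools also apply different default base-quality and mapping-quality filters, which is one likely reason their counts disagree.

```python
import re
from collections import Counter


def count_pileup_bases(ref_base, bases):
    """Tally base calls, deletions, and insertions for one pileup position."""
    counts = Counter({key: 0 for key in ("A", "C", "G", "T", "N", "DEL", "INS")})
    i = 0
    while i < len(bases):
        ch = bases[i]
        if ch == "^":
            # Read start marker; the next character encodes mapping quality.
            i += 2
        elif ch in "+-":
            # Indel announcement: sign, length digits, then that many bases.
            digits = re.match(r"\d+", bases[i + 1:]).group()
            if ch == "+":
                counts["INS"] += 1
            # A "-" only announces a deletion; the deleted positions
            # themselves are counted later through their "*" placeholders.
            i += 1 + len(digits) + int(digits)
        elif ch in ".,":
            counts[ref_base.upper()] += 1
            i += 1
        elif ch in "*#":
            counts["DEL"] += 1
            i += 1
        elif ch.upper() in "ACGTN":
            counts[ch.upper()] += 1
            i += 1
        else:
            # "$" (read end), "<" / ">" (reference skips), anything else.
            i += 1
    return counts


# Hypothetical read-bases string for a position whose reference base is A.
example = count_pileup_bases("A", "..,^F.+2AG,-1c*Tt$")
```

Dividing each count by the number of reads covering the position gives the per-base frequencies asked about in the question.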