Question: How reliable are samtools / bam-readcount / igvtools / pysamstats for finding deletions/insertions?
IrK (10), Australia, wrote 11 weeks ago:

Dear Biostars users,

This is my first time using variant-calling software/packages, so I would like to ask for opinions on the tools mentioned below. How reliable are the results they produce?

My goal is to count the occurrences of A, C, T, G, DEL and INS at each base across chr1 (not the whole genome). After researching the topic "count frequency of A,C,T,G, Del, and Ins at each base", I ended up with the following options:

(1) samtools mpileup / bcftools call
(2) bam-readcount
(3) igvtools
(4) pysamstats

IGVtools and pysamstats give the same results, and I liked the output: it is clear and exactly what I need. I understand that bam-readcount probably uses different default options, which is why its results differ from the other two, but I found its output a bit messy for my task. I also managed to get results in VCF format, but could not get the desired frequencies of A, T, C, G, DEL and INS from it. I saw the advice to use pileup2baseindel, but it seems to be dated (Update 08/04/2012). Could anyone advise how to get the frequencies of A, C, T, G, Del and Ins from VCF format? And what would you use if your task were only to find the frequency of deletions and nucleotides at each base?

Thank you!

modified 5 weeks ago by Biostar ♦♦ (20) • written 11 weeks ago by IrK (10)

Are you purely interested in the occurrences in the data or do you want to find actual variants? Option 1 finds actual variants, the others don't.

Reply written 11 weeks ago by Devon Ryan (73k)
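On the question of getting allele frequencies out of a VCF: this is only possible if the VCF carries per-allele read depths. The sketch below assumes the file has a FORMAT/AD field (allelic depths, which bcftools mpileup can be asked to emit through its annotation option) and a single sample column. The function name `allele_frequencies` and the example line are made up for illustration. Note that a VCF record reports only the REF and ALT alleles listed at that site, so it will not give a full A/C/G/T/DEL/INS table the way a per-position counter does.

```python
def allele_frequencies(vcf_line, sample_index=0):
    """Return {allele: fraction of reads} for one VCF data line.

    Assumes the FORMAT column contains an AD key (allelic depths,
    ordered as REF followed by each ALT allele).
    """
    fields = vcf_line.rstrip("\n").split("\t")
    ref, alt = fields[3], fields[4]
    format_keys = fields[8].split(":")
    sample_values = fields[9 + sample_index].split(":")
    sample = dict(zip(format_keys, sample_values))

    alleles = [ref] + alt.split(",")
    # A missing depth is written as "." in VCF; treat it as zero reads.
    depths = [0 if d == "." else int(d) for d in sample["AD"].split(",")]
    total = sum(depths)
    return {a: (d / total if total else 0.0) for a, d in zip(alleles, depths)}


# Hypothetical record: 6 reads support REF "A", 4 reads support ALT "G".
example = "chr1\t100\t.\tA\tG\t50\t.\tDP=10\tGT:AD\t0/1:6,4"
print(allele_frequencies(example))
```

Indels show up here as REF/ALT strings of different lengths (for example REF "AT", ALT "A" for a one-base deletion), so their frequency comes from the same AD values.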

Out of interest, I typically use bam-readcount after mpileup/VarScan2 for somatic mutations, as recommended by the authors. Does anyone have personal experience of whether the VarScan2 false-positive filter, which needs the bam-readcount output, is actually advisable?

Reply written 11 weeks ago by ATPoint (2.5k)

Yeah, it's very advisable. It includes lots of heuristic filters that remove junk calls.

Reply written 10 weeks ago by Chris Miller (19k)

Thanks for the comment! Additionally, my filtering for matched WGS now includes removal of variants that intersect with low-complexity regions, those with a DP beyond the 95th percentile of the average variant depth (the LC and MD filters suggested in Heng Li's 2014 paper), and those that have an AF > 0 in 1KG. That should deal with the vast majority of false positives and common somatic variants.

Reply written 10 weeks ago by ATPoint (2.5k)

Another filter I've found useful is to remove variants from regions where at least 10% of reads have MQ 0.

Reply written 10 weeks ago by Chris Miller (19k)
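The depth and MQ 0 filters mentioned in the last two replies could be sketched roughly as follows. This is not either poster's actual pipeline: it assumes each variant is already available as a dict with hypothetical keys `dp` (read depth) and `mq0` (number of reads with mapping quality 0), it applies the MQ 0 rule per site instead of per region, and it uses a simple nearest-rank percentile over the variant depths as the cutoff.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def filter_variants(variants, depth_pct=95, max_mq0_fraction=0.10):
    """Drop variants with unusually high depth or too many MQ 0 reads.

    `variants` is a list of dicts with hypothetical keys "dp" and "mq0".
    """
    if not variants:
        return []
    depth_cutoff = percentile([v["dp"] for v in variants], depth_pct)
    kept = []
    for v in variants:
        too_deep = v["dp"] > depth_cutoff
        mq0_fraction = v["mq0"] / v["dp"] if v["dp"] else 0.0
        if too_deep or mq0_fraction >= max_mq0_fraction:
            continue  # likely artefact: collapsed repeat or ambiguous mapping
        kept.append(v)
    return kept


# Made-up data: 20 ordinary sites, one very deep site, one with many MQ 0 reads.
sites = [{"dp": 30, "mq0": 0} for _ in range(20)]
sites.append({"dp": 500, "mq0": 0})
sites.append({"dp": 30, "mq0": 6})
print(len(filter_variants(sites)))
```

Both thresholds are parameters so they can be adjusted to the coverage of the data set at hand.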

I would just like to summarize the base calls of the aligned reads at each position. So options (2), (3) and (4) give a kind of base-call summary; can these statistics be trusted?

Reply written 11 weeks ago by IrK (10)

I've never used bam-readcount, but in general this is a hard thing to get wrong, so the results should be correct.

Reply written 11 weeks ago by Devon Ryan (73k)
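To illustrate why a per-position base-call summary is hard to get wrong, here is a minimal sketch that tallies A, C, G, T, N, DEL and INS from the read-bases column of one `samtools mpileup` text line. It is not how any of the four tools is implemented; the function name `count_pileup_bases` is made up, and the sketch applies no base-quality or mapping-quality filtering, which is one reason different tools can report different counts.

```python
import re


def count_pileup_bases(ref_base, bases):
    """Count base calls in the read-bases column of one mpileup line.

    DEL counts reads in which this position is deleted ("*" or "#").
    INS counts reads that carry an insertion right after this position.
    """
    counts = {k: 0 for k in ("A", "C", "G", "T", "N", "DEL", "INS")}
    ref_base = ref_base.upper()
    i = 0
    while i < len(bases):
        ch = bases[i]
        if ch == "^":            # read start: "^" plus one mapping-quality char
            i += 2
            continue
        if ch in "+-":           # "+2AG" / "-1c": indel following this position
            m = re.match(r"\d+", bases[i + 1:])
            if m is None:        # malformed token; skip the sign only
                i += 1
                continue
            if ch == "+":
                counts["INS"] += 1
            # Deletions announced with "-" appear as "*" at later positions.
            i += 1 + len(m.group()) + int(m.group())
            continue
        if ch in ".,":           # match to reference (forward / reverse strand)
            if ref_base in counts:
                counts[ref_base] += 1
        elif ch in "*#":         # this position is deleted in the read
            counts["DEL"] += 1
        elif ch.upper() in "ACGTN":
            counts[ch.upper()] += 1
        # "$" (read end) and ">" / "<" (reference skip) are ignored
        i += 1
    return counts


print(count_pileup_bases("A", "..,^I.$+2AG,*t-1c."))
```

Dividing each count by the number of reads covering the position gives the frequencies asked about in the question.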