Question: Bowtie Vs Bwa For Indels
Varun Gupta (United States) wrote, 4.5 years ago:

Hello Everyone

I am working on some yeast strains and I am interested in INDELs. My BAM files were aligned with bowtie. I ran the GATK pipeline to call SNPs and INDELs using UnifiedGenotyper, but the resulting VCF file contains no INDELs. While researching why, I came across this post:

What methods do you use for short read mapping?

One of the comments says

"I think it's important to note that BWA is one of the few fast mapping algorithms that allows for indels. Tools like Maq and Bowtie will not map reads if there is an insertion or deletion. I have used BWA to map 75bp Illumina reads at 20x coverage to a 30Mb fungal genome with good results."

So I went back to my BAM files and searched the CIGAR strings for I or D, but didn't find any. Does this mean bowtie won't report any reads that contain an insertion or deletion?

I would also like to know whether software that calls INDELs uses the CIGAR string to tell whether INDELs are present, or whether it uses a completely different approach. If the callers rely on the CIGAR string from the BAM file, then it is easy to look for INDELs directly in the BAM file, although a VCF file is more appropriate for obvious reasons, such as its more detailed annotation.

So if bowtie does not report INDELs, should I use bwa to align my reads and then look for INDELs?

Hope to hear from you soon.



Tags: bowtie, bwa
modified 4.5 years ago by lh3 • written 4.5 years ago by Varun Gupta

How long are the INDELs you're looking for?

written 4.5 years ago by David Langenberger

lh3 (United States) wrote, 4.5 years ago:

Bowtie1 does not report any indels. If an indel is in the middle of a read, the read will be unmapped. If the indel is close to the end of a read, it will be mapped with multiple mismatches towards the end. Bowtie2 and the bwa series are able to map reads with indels.
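
To check whether an aligner reported any indels at all, one can tally the CIGAR operations in the SAM records. The following is a minimal sketch, assuming plain SAM text lines as input (for example the output of `samtools view`); the function name `count_indel_reads` and the example records are made up for illustration.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def count_indel_reads(sam_lines):
    """Count SAM records by whether their CIGAR contains I or D."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        cigar = fields[5]  # column 6 of a SAM record is the CIGAR
        if cigar == "*":  # no alignment available (e.g. unmapped read)
            counts["no_cigar"] += 1
            continue
        ops = {op for _, op in CIGAR_OP.findall(cigar)}
        if "I" in ops:
            counts["with_insertion"] += 1
        if "D" in ops:
            counts["with_deletion"] += 1
        if not ops & {"I", "D"}:
            counts["no_indel"] += 1
    return counts


# Hypothetical records (SEQ and QUAL left as "*" to keep them short).
example_sam = [
    "@HD\tVN:1.6",
    "r1\t0\tchrI\t100\t42\t36M\t*\t0\t0\t*\t*",
    "r2\t0\tchrI\t200\t42\t20M2I14M\t*\t0\t0\t*\t*",
    "r3\t16\tchrI\t300\t42\t10M3D26M\t*\t0\t0\t*\t*",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]

if __name__ == "__main__":
    for key, value in sorted(count_indel_reads(example_sam).items()):
        print(key, value)
```

On a bowtie1 alignment one would expect both indel counters to stay at zero, while bowtie2 or bwa output for the same reads should show some records with I or D.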

Most variant callers can only call indels when the mapper reports them. GATK HaplotypeCaller (HC) is an exception. It does local assembly and is able to call an indel even if no reads are mapped with the indel. HC indel calling might work with bowtie1 alignment at least in theory, but even if it works, the sensitivity is probably low. Anyway, for variant calling, don't use bowtie1. Use bowtie2/bwa/bwa-mem or other modern mappers.

BTW, whether to allow indels in seeds is not critical. Both blasr and bwa-mem use exact seeds and they work well with PacBio reads with ~15% indel error rate (or >20% for read-to-read mapping). As to chimeric alignments, most Sanger read mappers report them. In the local multi-hit mode, bowtie2 and a few other NGS mappers can find chimeric alignments as well. Bwa-sw has been reporting chimeric alignments by default since 2010. Bwa-mem does that, too.

written 4.5 years ago by lh3

David Langenberger wrote, 4.5 years ago:

I would recommend Segemehl: A Fast One-Stop-Shop Mapping Tool. It allows insertions, deletions and mismatches not only in the complete alignment, but also in the seed.

Note: If you have long insertions, it will split the reads instead of writing this information to the CIGAR string. It behaves that way because the split-read option (-S) can find not only insertions, but also larger genome rearrangements. The mapping of split reads is thus not restricted to one chromosome or one direction, and a change of chromosome, for example, cannot be stored in the CIGAR string.

written 4.5 years ago by David Langenberger

Here are some comparisons of different tools:

Hoffmann S, Otto C, Kurtz S, Sharma CM, Khaitovich P, Vogel J, Stadler PF, Hackermueller J: "Fast mapping of short sequences with mismatches, insertions and deletions using index structures", PLoS Comput Biol (2009) vol. 5 (9) pp. e1000502

Christian Otto, Peter F. Stadler, and Steve Hoffmann: 'Lacking alignments? The next generation sequencing mapper segemehl revisited', Bioinformatics 2014 : btu146v1-btu146 (2014)

modified 4.5 years ago • written 4.5 years ago by David Langenberger

Hmm... When I read benchmarks, I leave out the mapper developed by the authors. Segemehl has rarely (if ever) been evaluated in other papers. Even the fairly complete Bioinformatics review (Fonseca et al 2012) has not included it. I intended to try it in the bwa-mem manuscript, but did not have a machine with enough RAM. It is always hard for me to put Segemehl at the right position.

written 4.5 years ago by lh3

I think 60 GB of RAM (what one needs for the human genome) is not a problem any more. Running NGS experiments that cost several thousand dollars, but not having the money for some GB of memory, seems odd to me. But we had that discussion before and I know that you don't have a machine with enough memory! :) Nevertheless, you don't have to trust the benchmark, nor do you have to use segemehl. To me, the benchmark looks quite impressive, and if it is correct, I don't care about any reviews or whether the tool is well known. If it performs better and I get better results, I take it.

written 4.5 years ago by David Langenberger

Istvan Albert (University Park, USA) wrote, 4.5 years ago:

Does this mean bowtie won't report any reads where insertion or deletion is taking place

Just a clarification: it is not that it won't report the insertion or deletion. What happens is that the algorithm is unable to align that read at all.

This means that a read with an INDEL will be reported as unmapped with no alignment information.
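
Unmapped reads can be recognized by bit 0x4 of the SAM FLAG field. A small sketch, assuming plain SAM text lines as input; `split_by_mapping` and the example records are hypothetical names and data used only for illustration.

```python
UNMAPPED = 0x4  # SAM FLAG bit meaning "segment unmapped"


def split_by_mapping(sam_lines):
    """Return (mapped_names, unmapped_names) for an iterable of SAM lines."""
    mapped, unmapped = [], []
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if flag & UNMAPPED:
            unmapped.append(name)
        else:
            mapped.append(name)
    return mapped, unmapped


# Hypothetical records: r2 is unmapped (FLAG 4, no position, CIGAR "*").
example_reads = [
    "r1\t0\tchrI\t100\t42\t36M\t*\t0\t0\t*\t*",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "r3\t16\tchrI\t300\t42\t36M\t*\t0\t0\t*\t*",
]

if __name__ == "__main__":
    mapped, unmapped = split_by_mapping(example_reads)
    print("mapped:", mapped)
    print("unmapped:", unmapped)
```

Realigning only the unmapped reads with an indel-aware mapper is one way to see how many of them were lost because of an insertion or deletion.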

written 4.5 years ago by Istvan Albert

@Istvan: Hi, have you ever seen a case where reads were mapped with bowtie and the CIGAR string shows I or D? Also, if I or D is present in the CIGAR string (from reads mapped with another aligner), does SNP calling software make use of the CIGAR string to produce the INDEL records in the VCF file?

written 4.5 years ago by Varun Gupta

Well, since the CIGAR string does not even list a mismatch (M means match or mismatch), the SNP calling software can't use the CIGAR string alone to determine even simple polymorphisms.
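
To illustrate the point about M, here is a rough sketch that walks a CIGAR string alongside the read and the reference. Mismatches only show up when the bases are compared, while insertions and deletions are visible from the CIGAR operations themselves. The function `describe_alignment`, the 0-based `pos` argument, and the toy sequences are assumptions made for this example; this is not how any particular variant caller is implemented.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def describe_alignment(ref, pos, cigar, read):
    """List differences between a read and the reference.

    ref  : reference sequence as a string
    pos  : 0-based start of the alignment on ref (SAM POS minus 1)
    cigar: CIGAR string of the alignment
    read : read sequence as stored in the SAM SEQ field
    """
    events = []
    r = pos  # current index on the reference
    q = 0    # current index on the read
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "M=X":
            # M covers matches and mismatches, so bases must be compared.
            for k in range(n):
                if ref[r + k] != read[q + k]:
                    events.append(("mismatch", r + k, ref[r + k], read[q + k]))
            r += n
            q += n
        elif op == "I":  # bases present in the read only
            events.append(("insertion", r, "", read[q:q + n]))
            q += n
        elif op == "D":  # bases present in the reference only
            events.append(("deletion", r, ref[r:r + n], ""))
            r += n
        elif op == "S":  # soft clip consumes read bases only
            q += n
        elif op == "N":  # skipped region consumes reference only
            r += n
        # H and P consume neither read nor reference bases
    return events


REF = "ACGTACGTACGT"  # toy reference, made up for illustration

if __name__ == "__main__":
    # The CIGAR is all M, yet the read carries a mismatch.
    print(describe_alignment(REF, 2, "8M", "GAACGTAC"))
    # A mismatch plus a one-base insertion.
    print(describe_alignment(REF, 2, "4M1I3M", "GAACTGTA"))
```

The first call shows that an `8M` alignment can still hide a substitution, which is why the CIGAR string by itself is not enough for SNP calling.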

written 4.5 years ago by Istvan Albert