Question: INDEL left alignment in bam and vcf
gravatar for mark.rose
2.2 years ago by
United States
mark.rose30 wrote:

Hello I am performing alignments of amplicons against a reference. Many of these amplicons have sizeable deletions. When aligning reads with such deletions, some aligners (bowtie2, bwa) soft clip the reads or fail to align them. Others, like blasr, produce an alignment which successfully spans the deletion. However blasr does not left align the deletion and so winds up giving a series of smaller deletions rather than one larger deletion. Manipultion of blasr gap penalties has thus far failed to correct this. Now I could use a tool like "vt normalize" to left shift the alignment post-blasr in the vcf, however that would produce a situation where the vcf and a visualization of the bam file disagree. This is a problem because I am doing this as part of developing a custom tool for other scientists (who want both the visualization and vcf deriveed info) and I am trying to avoid confusing them. Can anyone suggest a tool or procedure that would help here, presumably an aligner that produces bam output and is capable of left aligning larger deletions or a process that can be applied to the bam file to create a new left aligned bam file? Thanks Mark

blasr bwa bowtie aligner indel • 996 views
ADD COMMENTlink modified 2.2 years ago by Brian Bushnell16k • written 2.2 years ago by mark.rose30
gravatar for Brian Bushnell
2.2 years ago by
Walnut Creek, USA
Brian Bushnell16k wrote:

BBMap will span very long deletions, and always left-aligns them: ref=ref.fa in=reads.fq out=mapped.sam maxindel=100000

By default, it does not soft-clip anything except when reads go off the ends of chromosomes, as required by the sam spec, so it does not incur ref-bias like local alignment algorithms. Also, unlike Blasr, BBMap uses multi-affine-transformed gap-extension scoring so that long indels are kept contiguous rather than breaking them up.

You can subsequently call the variations using BBMap's variant-caller: in=mapped.sam ref=ref.fa out=vars.vcf ploidy=2

(set ploidy as appropriate for the organism being sequenced)

ADD COMMENTlink modified 2.2 years ago • written 2.2 years ago by Brian Bushnell16k

Hey Brian

Thanks for the suggestion. Worked like a charm. Will look more into the bbmap package.


ADD REPLYlink written 2.2 years ago by mark.rose30
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 823 users visited in the last hour