Hello I am performing alignments of amplicons against a reference. Many of these amplicons have sizeable deletions. When aligning reads with such deletions, some aligners (bowtie2, bwa) soft clip the reads or fail to align them. Others, like blasr, produce an alignment which successfully spans the deletion. However blasr does not left align the deletion and so winds up giving a series of smaller deletions rather than one larger deletion. Manipultion of blasr gap penalties has thus far failed to correct this. Now I could use a tool like "vt normalize" to left shift the alignment post-blasr in the vcf, however that would produce a situation where the vcf and a visualization of the bam file disagree. This is a problem because I am doing this as part of developing a custom tool for other scientists (who want both the visualization and vcf deriveed info) and I am trying to avoid confusing them. Can anyone suggest a tool or procedure that would help here, presumably an aligner that produces bam output and is capable of left aligning larger deletions or a process that can be applied to the bam file to create a new left aligned bam file? Thanks Mark
BBMap will span very long deletions, and always left-aligns them:
bbmap.sh ref=ref.fa in=reads.fq out=mapped.sam maxindel=100000
By default, it does not soft-clip anything except when reads go off the ends of chromosomes, as required by the sam spec, so it does not incur ref-bias like local alignment algorithms. Also, unlike Blasr, BBMap uses multi-affine-transformed gap-extension scoring so that long indels are kept contiguous rather than breaking them up.
You can subsequently call the variations using BBMap's variant-caller:
callvariants.sh in=mapped.sam ref=ref.fa out=vars.vcf ploidy=2
(set ploidy as appropriate for the organism being sequenced)