Question: bwa mem misaligns my contigs to the reference by a few bases
1
gravatar for cristian
22 months ago by
cristian220
cristian220 wrote:

Dear all,

I have indexed the C. elegans reference genome with:

bwa index output/genome/ref/seq/celegans.fa

and then aligned my de novo assembly to the reference with:

bwa mem -t 8 -x intractg output/genome/ref/seq/celegans.fa input/assembly/celegans/hgap/bristol/assembly.fa > output/alignment/pacbio/bwa/ref/bristolAssembly.sam

I then

samtools view -bS output/alignment/pacbio/bwa/ref/bristolAssembly.sam > output/alignment/pacbio/bwa/ref/bristolAssembly.bam

samtools sort output/alignment/pacbio/bwa/ref/bristolAssembly.bam -o output/alignment/pacbio/bwa/ref/bristolAssemblySorted.bam

samtools index output/alignment/pacbio/bwa/ref/bristolAssemblySorted.bam

to visualise the alignment in IGV.

This is what I get: http://cristian-riccio.ch/wp-content/uploads/2017/07/igv_snapshot.png

You can see that my assembly is the same as the reference, just shifted by 12 bases or so. Has anyone any suggestions about how to solve this problem?

Thanks.

Best,

C.

bwt bwa bwa mem alignment contig • 1.2k views
ADD COMMENTlink modified 10 months ago by Biostar ♦♦ 20 • written 22 months ago by cristian220
1

my assembly is the same as the reference

Have you tried to use the actual reference sequence you used for your own alignments (instead of going on the label of genome build) in IGV?

ADD REPLYlink modified 22 months ago • written 22 months ago by genomax65k

Thanks, good idea. I've tried but there is no change. I went on the Genomes tab in IGV, loaded the reference I used for alignment and then loaded my BAM file. Still the same.

ADD REPLYlink written 22 months ago by cristian220
1

Hmm. You are actually aligning the assembly not the original reads. Assuming the assembly is long, have you tried to use a tool meant for such alignments as blat, LAST, or LASTZ? Since this is C. elegans, even mauve may be able to handle the alignments with the added benefit of showing you any rearrangements.

ADD REPLYlink modified 22 months ago • written 22 months ago by genomax65k

genomax, do you know if Mauve can output a BAM file?

ADD REPLYlink written 22 months ago by cristian220
1

is it really shifted, or just IGV is showing you this 12 bp shift?

ADD REPLYlink written 22 months ago by stolarek.ir580

It is really shifted I think. On this image, I am only showing the mismatched bases and you can see that there are many mismatches despite the two sequences being the same. http://cristian-riccio.ch/wp-content/uploads/2017/07/igv_snapshotreallyshifted.png

ADD REPLYlink written 22 months ago by cristian220
1

I'm asking about the BAM file, not the IGV visualization. If it's a true missalignment, then you can calculate from the BAM fields (like NM) the number of differences, and from the CIGAR string you can track down where these mismatches could be if truly they exist

ADD REPLYlink written 22 months ago by stolarek.ir580
2
gravatar for cristian
22 months ago by
cristian220
cristian220 wrote:

Solved finally!

http://cristian-riccio.ch/wp-content/uploads/2017/07/igv_snapshotSolution.png

I used this command instead, penalising gap extensions less.

bwa mem -t 8 -E0.5 -x intractg output/genome/ref/seq/celegans.fa input/assembly/celegans/hgap/bristol/assembly.fa > output/alignment/pacbio/bwa/ref/bristolAssembly.sam

The assembly comes from pacbio, and this technology is 12% indel error in the raw reads, that might be why the contigs also need a bit more lee-ways on indels for alignment.

I figured it out with a colleague because we looked at the start of the chromosomes and it started off aligned and got misaligned when there was a larger indel in my assembly.

Thanks to all for the suggestions.

Best,

C.

ADD COMMENTlink written 22 months ago by cristian220
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1085 users visited in the last hour