Question: Aligning very short and paired Illumina reads
3.2 years ago by
France / Toulouse / GeT-Plage
Rox (1.1k) wrote:

Hi everyone,

I ran into an issue using bwa with some very short reads (49 bp). I have numerous paired Illumina reads that I want to align against an assembly I made.

I saw this previous post about aligning paired reads from several files, "BWA mem on multiple samples", which proposes running bwa mem in parallel.

The bwa README says that reads shorter than 70 bp should be processed this way:

bwa aln ref.fa read1.fq > read1.sai; bwa aln ref.fa read2.fq > read2.sai
bwa sampe ref.fa read1.sai read2.sai read1.fq read2.fq > aln-pe.sam

When I tried this, the bwa aln step worked fine, but the bwa sampe step never finished. Each file contains approximately 8-10 GB of data, so I have no idea why it takes so long (I stopped the process after 2 days).

What do you think about this? Should I use another aligner?

Thanks for your advice,


alignement illumina genome • 1.2k views
written 3.2 years ago by Rox (1.1k)

I recommend the BBMap suite. It is fast, easy to use, multi-threaded, and pure Java, so it will run pretty much anywhere. As long as you have samtools in your path, you can create BAM files directly during alignment.

modified 3.2 years ago • written 3.2 years ago by genomax (80k)

Okay, thanks for your fast reply. Do you think it is suited to very short reads like mine?

written 3.2 years ago by Rox (1.1k)

Yes. (Had to add this to reach min char limit).

written 3.2 years ago by genomax (80k)

I had a situation where bwa sampe was abnormally slow. The problem turned out to be that the reverse reads (read2) had a sequencing defect: they were mostly homopolymer stretches that aligned wrongly at multiple places on the genome, and bwa sampe was spending a lot of time evaluating a large number of equally bad possibilities.

written 3.2 years ago by Charles Plessy (2.7k)

Oh, this is interesting. How did you detect this problem? Is FastQC enough to detect such an issue? How did you solve it?

written 3.2 years ago by Rox (1.1k)

I inspected the file contents directly with the zless command, but FastQC would certainly have shown the problem. After double-checking that we had not made any obvious error, we contacted Illumina's technical support. This time the problem was indeed on the sequencer's side, so they sent us a free kit, which worked perfectly.

written 3.2 years ago by Charles Plessy (2.7k)
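
For anyone wanting to check for the homopolymer problem described above without paging through the file by hand, here is a minimal sketch of a shell function that counts reads dominated by a single base. The function name homopolymer_fraction and the 80% cutoff are assumptions made for illustration; they do not come from bwa, BBMap, or FastQC. It also assumes a standard 4-line-per-record FASTQ file (no wrapped sequence lines).

```shell
# Hypothetical helper (not part of bwa, BBMap, or FastQC).
# Prints: <flagged reads> <total reads> <flagged fraction>
# Usage: homopolymer_fraction reads.fq[.gz] [threshold]
homopolymer_fraction() {
    file=$1
    threshold=${2:-0.8}   # assumed cutoff: one base makes up >= 80% of the read

    # Decompress .gz files, pass plain FASTQ through unchanged.
    case "$file" in
        *.gz) gzip -cd "$file" ;;
        *)    cat "$file" ;;
    esac | awk -v t="$threshold" '
        BEGIN { split("A C G T", bases, " ") }
        NR % 4 == 2 {                 # sequence line of each 4-line FASTQ record
            total++
            seq = toupper($0)
            len = length(seq)
            if (len == 0) next
            max = 0
            for (i = 1; i <= 4; i++) {
                s = seq
                n = gsub(bases[i], "", s)   # gsub returns the number of matches
                if (n > max) max = n
            }
            if (max / len >= t) flagged++
        }
        END {
            if (total > 0) printf "%d %d %.3f\n", flagged, total, flagged / total
            else print "0 0 0.000"
        }
    '
}
```

Running it on both files, for example homopolymer_fraction read1.fq and homopolymer_fraction read2.fq, and comparing the fractions should make a defect limited to the reverse reads stand out. It is only a rough screen; FastQC gives a fuller picture.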