Question: Aligning very short and paired Illumina reads
6 months ago, Roxane Boyer wrote:

Hi everyone,

I ran into an issue using bwa on some very short reads (49 bp). I have numerous paired-end Illumina reads that I want to align against an assembly I made.

I saw this previous post about aligning paired reads from several files, "BWA mem on multiple samples", which proposes running bwa mem in parallel.

The bwa README says that, for reads shorter than 70 bp, we should proceed this way:

bwa aln ref.fa read1.fq > read1.sai; bwa aln ref.fa read2.fq > read2.sai
bwa sampe ref.fa read1.sai read2.sai read1.fq read2.fq > aln-pe.sam

When I tried this, the bwa aln step worked fine, but the bwa sampe step never finished. Each file contains approximately 8-10 GB of data, but I still have no idea why it takes so long (I stopped the process after 2 days).

What do you think about this? Should I use another aligner?

Thanks for your advice,


Tags: alignment, illumina, genome

I recommend the BBMap suite. It is fast, easy to use, multi-threaded, and pure Java, so it will run pretty much anywhere. As long as samtools is in your path, you can create BAM files directly during alignment.

written 6 months ago by genomax

Okay, thanks for your fast reply. Do you think it is suited to very short reads like mine?

written 6 months ago by Roxane Boyer

Yes. (Had to add this to reach min char limit).

written 6 months ago by genomax

I had a situation where bwa sampe was abnormally slow. The problem turned out to be that the reverse reads (read2) had a sequencing defect: they were mostly homopolymer stretches that aligned wrongly at multiple places on the genome, and bwa sampe was spending a lot of time evaluating a large number of equally bad possibilities...

written 6 months ago by Charles Plessy

Oh, this is interesting. How did you detect this problem? Is FastQC enough to detect such an issue? How did you solve it?

written 6 months ago by Roxane Boyer

I inspected the file contents directly with the zless command, but FastQC would have shown the problem for sure. After double-checking that we had not made any obvious error, we contacted Illumina's technical support, and indeed this time the problem was on the sequencer's side, so they sent us a free kit, which worked perfectly.

written 6 months ago by Charles Plessy
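
For anyone who wants a quick scripted version of the check described above (looking at the reads themselves for homopolymer stretches), here is a minimal Python sketch. It is not what FastQC does internally and not necessarily how the problem was diagnosed in the reply above. The function names, the 0.8 threshold, and the 100,000-read sample size are all assumptions chosen for illustration, and "one base covers most of the read" is only a crude proxy for a homopolymer-dominated read.

```python
import gzip
import sys
from collections import Counter


def dominant_base_fraction(seq):
    """Return the share of the read taken up by its most common base."""
    if not seq:
        return 0.0
    counts = Counter(seq.upper())
    return max(counts.values()) / len(seq)


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def count_low_complexity_reads(path, threshold=0.8, max_reads=100_000):
    """Count reads in which a single base covers at least `threshold` of the read.

    Assumes standard 4-line FASTQ records. Only the first `max_reads`
    records are inspected. Returns a (flagged, total) tuple.
    The threshold is an arbitrary illustrative value, not a standard.
    """
    flagged = 0
    total = 0
    with open_fastq(path) as handle:
        for line_number, line in enumerate(handle):
            # The sequence is the second line of each 4-line record.
            if line_number % 4 != 1:
                continue
            total += 1
            if dominant_base_fraction(line.strip()) >= threshold:
                flagged += 1
            if total >= max_reads:
                break
    return flagged, total


if __name__ == "__main__":
    # Usage (file names are placeholders): python check_reads.py read1.fq read2.fq
    for fastq_path in sys.argv[1:]:
        flagged, total = count_low_complexity_reads(fastq_path)
        share = flagged / total if total else 0.0
        print(f"{fastq_path}: {flagged}/{total} reads flagged ({share:.1%})")
```

Running it on both read1 and read2 and comparing the flagged share gives a rough idea of whether one mate file is much worse than the other. If it is, that is a reason to look at the FastQC report for that file before blaming the aligner.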