Question

Aligning very short and paired Illumina reads

0

8.4 years ago

Rox ★ 1.5k

Hi everyone,

I ran into an issue using bwa on some very short reads (49 bp). I have numerous paired-end Illumina reads that I want to align against an assembly I made.

I saw this previous post about aligning paired reads from several files, "BWA mem on multiple samples", which proposes running bwa mem in parallel.

The bwa README says that reads shorter than 70 bp should be processed this way:

bwa aln ref.fa read1.fq > read1.sai; bwa aln ref.fa read2.fq > read2.sai
bwa sampe ref.fa read1.sai read2.sai read1.fq read2.fq > aln-pe.sam

When I tried this, the bwa aln part worked fine, but the bwa sampe step never finished. Each file contains approximately 8-10 GB of data, and I have no idea why it takes so long (I stopped the process after 2 days).

What do you think about this? Should I use another aligner?

Thanks for your advice,

Roxane

alignement Illumina genome • 2.4k views


1

I recommend bbmap.sh from the BBMap suite. It is fast, easy to use, multi-threaded, and pure Java, so it will run pretty much anywhere. As long as you have samtools in your path, you can create BAM files directly during alignment.

ADD REPLY • link 8.4 years ago by GenoMax 152k

0

Okay, thanks for your fast reply. Do you think it will be suited to very short reads like mine?

ADD REPLY • link 8.4 years ago by Rox ★ 1.5k

0

Yes. (Had to add this to reach min char limit).

ADD REPLY • link 8.4 years ago by GenoMax 152k

1

I had a situation where bwa sampe was abnormally slow. The actual problem was that the reverse reads (read2) had a sequencing defect: they were mostly homopolymer stretches that aligned wrongly at multiple places on the genome, and bwa sampe was spending a lot of time evaluating a large number of equally bad possibilities...

ADD REPLY • link 8.4 years ago by Charles Plessy ★ 2.9k

0

Oh, this is interesting. How did you detect this problem? Is FastQC enough to detect such an issue? How did you solve it?

ADD REPLY • link 8.4 years ago by Rox ★ 1.5k

0

I inspected the file contents directly with the zless command, but FastQC would have shown the problem for sure. After double-checking that we had not made any obvious error, we contacted Illumina's technical support, and indeed this time the problem was on the sequencer's side, so they sent us a free kit, which worked perfectly.
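
For anyone who wants a number rather than an eyeball check, here is a minimal sketch of how the same inspection could be scripted. It is not what I ran at the time. It assumes standard 4-line FASTQ records (no wrapped sequence lines), and the function names, the 0.5 threshold and the 100,000-read sample size are arbitrary choices made for illustration.

```python
import gzip
import re
from itertools import islice


def open_fastq(path):
    """Open a FASTQ file as text, handling gzip-compressed (.gz) files."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def longest_run_fraction(seq):
    """Return the longest single-base run divided by the read length."""
    if not seq:
        return 0.0
    # Each regex match is one maximal run of an identical character.
    longest = max(len(m.group(0)) for m in re.finditer(r"(.)\1*", seq))
    return longest / len(seq)


def homopolymer_read_fraction(path, min_run_fraction=0.5, max_reads=100_000):
    """Fraction of sampled reads dominated by a single homopolymer run.

    Assumes 4-line FASTQ records; only the first `max_reads` reads are
    examined. Both parameters are illustrative defaults, not recommendations.
    """
    total = 0
    flagged = 0
    with open_fastq(path) as handle:
        # The sequence is the second line of every 4-line record.
        for line in islice(handle, 1, max_reads * 4, 4):
            seq = line.strip().upper()
            total += 1
            if longest_run_fraction(seq) >= min_run_fraction:
                flagged += 1
    return flagged / total if total else 0.0


if __name__ == "__main__":
    # File names taken from the commands in the question.
    for fastq in ("read1.fq", "read2.fq"):
        print(fastq, homopolymer_read_fraction(fastq))
```

A much higher fraction in read2 than in read1 would point to the kind of defect described above, although FastQC remains the more complete check.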

ADD REPLY • link 8.4 years ago by Charles Plessy ★ 2.9k