Aligning very short and paired Illumina reads
7.2 years ago
Rox ★ 1.4k

Hi everyone,

I ran into an issue using bwa with some very short reads (49 bp). I have numerous paired Illumina reads that I want to align against an assembly I made.

I saw a previous post about aligning paired reads from several files, "BWA mem on multiple samples", which proposes running bwa mem in parallel.

The bwa README says that reads shorter than 70 bp should be aligned this way:

bwa aln ref.fa read1.fq > read1.sai; bwa aln ref.fa read2.fq > read2.sai
bwa sampe ref.fa read1.sai read2.sai read1.fq read2.fq > aln-pe.sam
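The README route above can be wrapped in a small function so the same steps are repeated for every pair of files. This is a sketch under stated assumptions, not a fix confirmed in the thread: the function name `align_short_pe`, the default of 4 threads and the `-o 1000` value are illustrative choices. It also adds the `bwa index` step that the README commands take for granted. `-t` (bwa aln threads), `-P` (load the index into memory) and `-o` (maximum occurrences of a read for pairing, default 100000) are documented bwa options; lowering `-o` is the bwa manual's own suggestion for faster pairing.

```shell
#!/bin/sh
# Sketch of the bwa aln/sampe route for short (<70 bp) paired reads.
# Hypothetical helper: file names, thread count and the -o value are placeholders.
align_short_pe() {
  ref=$1; r1=$2; r2=$3; out=$4; threads=${5:-4}
  sai1="${r1%.*}.sai"; sai2="${r2%.*}.sai"

  # Index the assembly once (skipped if the .bwt file already exists)
  [ -e "$ref.bwt" ] || bwa index "$ref" || return 1

  # bwa aln is multi-threaded with -t; each mate is aligned on its own
  bwa aln -t "$threads" "$ref" "$r1" > "$sai1" || return 1
  bwa aln -t "$threads" "$ref" "$r2" > "$sai2" || return 1

  # bwa sampe runs on a single thread. -P loads the whole index into memory;
  # -o caps the number of hits an end may have before it is treated as
  # single-end (default 100000), so very repetitive ends are not paired exhaustively.
  bwa sampe -P -o 1000 "$ref" "$sai1" "$sai2" "$r1" "$r2" > "$out"
}
```

For example, `align_short_pe assembly.fa read1.fq read2.fq aln-pe.sam 8` would reproduce the two README commands with 8 aln threads.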

When I tried this, the bwa aln step worked fine, but the bwa sampe step never finished. Considering that each file contains approximately 8-10 GB of data, I have no idea why it takes so long (I stopped the process after 2 days).

What do you think about this? Should I use another aligner?

Thanks for your advice,

Roxane

alignement Illumina genome • 2.1k views

I recommend bbmap.sh from the BBMap suite: it is fast, easy to use, multithreaded, and written in pure Java, so it will run pretty much anywhere. As long as samtools is in your PATH, you can write BAM files directly during alignment.
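A minimal sketch of what a paired-end BBMap run could look like, assuming bbmap.sh (and samtools, for BAM output) is on the PATH. The wrapper name `map_with_bbmap` and the default of 4 threads are hypothetical; `ref=`, `in=`, `in2=`, `out=`, `threads=` and `nodisk` are BBMap parameters, but check `bbmap.sh` without arguments for the options your version supports.

```shell
#!/bin/sh
# Sketch: paired-end mapping of short reads with BBMap (hypothetical wrapper).
map_with_bbmap() {
  ref=$1; r1=$2; r2=$3; out=$4; threads=${5:-4}
  # nodisk builds the index in memory instead of writing a ./ref/ directory;
  # an output name ending in .bam makes BBMap go through samtools
  bbmap.sh ref="$ref" in="$r1" in2="$r2" out="$out" threads="$threads" nodisk
}
```

For example, `map_with_bbmap assembly.fa read1.fq read2.fq aln.bam 8`. For a large assembly that is mapped against repeatedly, writing the index to disk once (omitting `nodisk`) avoids rebuilding it on every run.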


Okay, thanks for your fast reply. Do you think it will be suited to very short reads like mine?


Yes. (Had to add this to reach min char limit).


I had a situation where bwa sampe was abnormally slow. The problem turned out to be that the reverse reads (read2) had a sequencing defect: they were mostly homopolymer stretches that aligned wrongly at multiple places on the genome, and bwa sampe was spending a lot of time evaluating a large number of equally bad possibilities...
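One rough way to check a file for this kind of defect is to count reads whose longest single-base run reaches some threshold. The sketch below is illustrative only: the function name `homopolymer_report` and the default threshold of 20 bases are arbitrary assumptions, not values from this thread. It expects an uncompressed 4-line-per-record FASTQ, or `-` to read from standard input.

```shell
#!/bin/sh
# Hypothetical helper: count reads containing a long homopolymer run.
# $1: FASTQ file (plain text) or "-" for stdin; $2: minimum run length (default 20)
homopolymer_report() {
  awk -v min="${2:-20}" '
    NR % 4 == 2 {                      # sequence line of each 4-line record
      total++
      run = 1; best = 1
      for (i = 2; i <= length($0); i++) {
        if (substr($0, i, 1) == substr($0, i - 1, 1)) {
          run++
          if (run > best) best = run
        } else {
          run = 1
        }
      }
      if (best >= min) hits++
    }
    END { printf "%d of %d reads have a homopolymer run >= %d\n", hits, total, min }
  ' "$1"
}
```

For a gzipped file, something like `zcat read2.fq.gz | head -n 400000 | homopolymer_report - 20` checks the first 100,000 reads; running it on read1 and read2 side by side makes a defect in one mate stand out.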


Oh, this is interesting. How did you detect this problem? Is FastQC enough to detect such an issue? How did you solve it?


I inspected the file contents directly with the zless command, but FastQC would certainly have shown the problem. After double-checking that we had not made any obvious error, we contacted Illumina's technical support, and indeed this time the problem was on the sequencer's side, so they sent us a free kit, which worked perfectly.
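Besides paging through a file with zless, a quick command-line check in the spirit of FastQC's overrepresented-sequences module is to tally the most frequent sequences among the first reads of a file. This is a sketch, not FastQC's method: the function name `top_sequences` and the defaults (100,000 reads sampled, 10 sequences shown) are assumptions, and it expects a gzipped 4-line-per-record FASTQ.

```shell
#!/bin/sh
# Hypothetical helper: most frequent read sequences in the first N reads.
# $1: gzipped FASTQ; $2: reads to sample (default 100000); $3: sequences to show (default 10)
top_sequences() {
  n=${2:-100000}; k=${3:-10}
  # keep only the sequence line of each record, then count identical sequences
  gzip -dc -- "$1" | head -n $((4 * n)) | awk 'NR % 4 == 2' | sort | uniq -c | sort -rn | head -n "$k"
}
```

In a healthy library the top counts are usually a small fraction of the sample; a defect like the one described above would show up as long single-base sequences dominating the list.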
