Bowtie2 reports reads as unmapped even though their sequences exactly match the reference?
8.4 years ago
shl198 ▴ 430

Hi all,

I aligned my RNA-seq reads against a reference genome using tophat with its default aligner, bowtie2.

I also used the default parameters:

tophat -p 8 -G $annotation -o out $database L1_1.fq.gz L1_2.fq.gz


After getting the results, I found that some reads in the unmapped.bam file have exactly the same sequence as the reference. The following is one line from the unmapped.sam file:

DGZN8DQ1:360:H9RN8ADXX:1:1101:4791:1895 69      *       0       255     *       *       0       0       TTTTGCTTTCTGACTCTGTGCTTGTGCCTTCAAGACTTTCACAACGATTTTCTGCTCCTCAATAAGGAAAGCCCGAGATCGGAAGAGCACACGTCTGAAC    CCCFFFFFHHHHHJJJJJJJIJJJHIJJJJJIJJJIJJJJIJJJJJIJJJJJJJJJJJJIJIJJJJJIJJJJJJHHFFDEDDDDDDDDDDDDDDDDDCCD
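
For reference, the second field of the record above (69) is the SAM bitwise flag. Below is a minimal Python sketch for decoding it, assuming the bit meanings given in the SAM specification; the helper name `decode_sam_flag` is only illustrative and not part of tophat or bowtie2.

```python
# Bit values and meanings follow the SAM specification's FLAG field.
SAM_FLAGS = {
    0x1: "read paired",
    0x2: "read mapped in proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "not primary alignment",
    0x200: "read fails quality checks",
    0x400: "PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def decode_sam_flag(flag):
    """Return the names of all flag bits that are set (hypothetical helper)."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


print(decode_sam_flag(69))
# ['read paired', 'read unmapped', 'first in pair']
```

So 69 (1 + 4 + 64) marks this read as paired, unmapped, and first in its pair. The "mate unmapped" bit (0x8) is not set in this record.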


Does anyone know why bowtie2 doesn't treat those reads as mapped? Thanks.

bowtie2 quality tophat RNA-Seq • 2.9k views
8.4 years ago

Dirty little secret: bowtie2 doesn't always find exact matches. If you change the order of reads in a file you'll sometimes get different alignment results for them. I've never bothered to find the reason, since this ends up affecting very few reads.


Hi Devon, thank you very much. I just tried mapping with bowtie2 directly instead of tophat, and the mapping rate increased a little. I also BLASTed the unmapped reads, and most of them matched mouse ribosomal RNA.

I didn't change the annotation file, and I made sure the gff file contains rRNA references. In that case the reads should map to the reference, but they didn't.
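
A rough sketch of how the unmapped reads could be written out as FASTA for a BLAST check like the one described above. It assumes the BAM has already been converted to SAM text (for example with `samtools view unmapped.bam`); the function name `sam_to_fasta` and the shortened example record are hypothetical.

```python
def sam_to_fasta(sam_lines):
    """Convert unmapped SAM records to FASTA text (illustrative helper)."""
    records = []
    for line in sam_lines:
        # Skip blank lines and SAM header lines, which start with '@'.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        name, flag, seq = fields[0], int(fields[1]), fields[9]
        # Keep only reads flagged as unmapped (bit 0x4) that carry a sequence.
        if flag & 0x4 and seq != "*":
            # Tag the mate number so both reads of a pair stay distinguishable.
            mate = "/1" if flag & 0x40 else "/2" if flag & 0x80 else ""
            records.append(f">{name}{mate}\n{seq}")
    return "\n".join(records)


# Shortened, made-up record with the same flag (69) as the read in the question.
example = ["read1\t69\t*\t0\t255\t*\t*\t0\t0\tACGT\tIIII\n"]
print(sam_to_fasta(example))
```

The resulting FASTA text can then be saved to a file and used as a BLAST query.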


Perhaps, but it's more likely that the reads map so many times that they're discarded. There are enough copies of rRNA in the genome that this could be the case. I should add that I don't use tophat anymore; it's just too painfully slow. Give STAR a try if you have enough RAM.


Thank you very much. I will try STAR.