5.4 years ago

Hello,

I downloaded several RNA-seq datasets from GenBank and used TopHat (with Bowtie2) to map the reads back to the genome (to create a GTF file). The data are paired-end.

Out of the 5 samples, three gave a strange alignment result: about 90% of the right reads aligned, but only 0.3% of the left reads did. I downloaded the data three times to make sure the error was not on my end.

So something seems to be wrong with the left reads. Using Bowtie directly, I can align each file in single-end mode at 90%, and the paired-end data at 60%. But TopHat in paired-end mode only aligns the right reads.

I was wondering if you had any idea what could be causing the problem. I would like to contact the authors, but with a good explanation of what the issue could be.

RNA-Seq tophat alignment • 1.6k views
5.4 years ago
michael.ante ★ 3.6k

First, I'd check the raw data with FastQC to look for low-quality tails and adapter contamination.

Second, I'd check the inner distance with a subset of reads (aligning with Bowtie2 and running RSeQC's inner_distance.py). You should then adjust TopHat's --mate-inner-dist and --mate-std-dev parameters. TopHat is trying to learn the inner distance distribution, but providing these values increases accuracy.
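
As a rough illustration of the second point, the sketch below estimates a mean inner distance and standard deviation from SAM records of a Bowtie2 paired-end alignment. It is not what RSeQC's inner_distance.py does (that tool uses a gene model to account for introns); it assumes that both mates have the same length and that the reads were aligned to the genome without splicing. The function names are hypothetical.

```python
import statistics


def inner_distances(sam_lines):
    """Yield approximate inner distances from SAM alignment lines.

    Assumption: both mates have the same length, so the inner distance
    is TLEN minus twice the length of the read sequence.
    """
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        flag = int(fields[1])
        tlen = int(fields[8])
        seq = fields[9]
        # Keep properly paired reads (flag bit 0x2) and count each pair
        # once by using only the mate that reports a positive TLEN.
        if flag & 0x2 and tlen > 0:
            yield tlen - 2 * len(seq)


def summarize(sam_lines):
    """Return (mean, standard deviation) of inner distances, rounded."""
    distances = list(inner_distances(sam_lines))
    if len(distances) < 2:
        raise ValueError("need at least two properly paired fragments")
    return round(statistics.mean(distances)), round(statistics.stdev(distances))
```

The two rounded numbers could then be passed to --mate-inner-dist and --mate-std-dev. To run it on a file, something like `summarize(open("subset.sam"))` should work, where `subset.sam` is a placeholder name.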

Thanks a lot. I ran FastQC at the start: the reads are of high quality and show no trace of adapters.

5.4 years ago

Take a look at the library type used in the analysis. I am referring to the --library-type setting.
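
One way to test this is to run TopHat once per library type and compare the alignment rates. The sketch below only builds the command lines; it does not run them. The index name, FASTQ file names, and output directories are placeholders, and the helper `tophat_command` is hypothetical.

```python
import shlex

# Library types accepted by TopHat's --library-type option.
LIBRARY_TYPES = ("fr-unstranded", "fr-firststrand", "fr-secondstrand")


def tophat_command(index, left, right, library_type, out_dir,
                   inner_dist=None, std_dev=None):
    """Build a TopHat paired-end command as a list of arguments."""
    if library_type not in LIBRARY_TYPES:
        raise ValueError(f"unknown library type: {library_type}")
    cmd = ["tophat", "--library-type", library_type, "-o", out_dir]
    # Only pass the inner-distance options when values were supplied.
    if inner_dist is not None:
        cmd += ["--mate-inner-dist", str(inner_dist)]
    if std_dev is not None:
        cmd += ["--mate-std-dev", str(std_dev)]
    # Positional arguments: Bowtie index, then left and right read files.
    cmd += [index, left, right]
    return cmd


if __name__ == "__main__":
    # Placeholder names; print one command per library type.
    for library_type in LIBRARY_TYPES:
        cmd = tophat_command("genome_index", "sample_1.fastq",
                             "sample_2.fastq", library_type,
                             f"tophat_{library_type}")
        print(shlex.join(cmd))
```

Each returned list can be passed to `subprocess.run` once the placeholder names are replaced with real paths.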

Thanks, I am rerunning TopHat with the library type argument. As TopHat works for some samples but not for these, I am afraid the issue might be more complex!

Have you trimmed your sequences?

Are you using two separate files, one for the left reads and one for the right reads?

If so, I recall that some aligners expect both files to contain the same number of reads, in the same order.

If you remove a read from one file but not its mate from the other, you are likely to leave an orphan read that throws off the pairing of every read after it. In other words, your files probably need to be synchronized.

I don't know for sure whether Bowtie is one of these aligners.
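
A quick way to check synchronization is to walk both FASTQ files in parallel and compare read names. The sketch below assumes plain four-line FASTQ records and that mates share the same name apart from an optional trailing /1 or /2; other naming schemes would need a different normalization. The function names are hypothetical.

```python
import gzip
from itertools import zip_longest


def read_ids(path):
    """Yield the read name of each four-line FASTQ record in a file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 != 0:  # only the first line of a record is a header
                continue
            parts = line[1:].split()  # drop "@" and any description
            if not parts:  # e.g. a blank line at the end of the file
                continue
            name = parts[0]
            # Assumption: mates differ only by a trailing /1 or /2.
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name


def first_mismatch(left_path, right_path):
    """Return (record number, left name, right name) for the first
    unsynchronized pair, or None if the files agree throughout."""
    pairs = zip_longest(read_ids(left_path), read_ids(right_path))
    for number, (left, right) in enumerate(pairs, start=1):
        if left != right:  # also catches one file ending early (None)
            return number, left, right
    return None
```

If `first_mismatch("sample_1.fastq", "sample_2.fastq")` (placeholder file names) returns anything other than `None`, the files are out of sync from that record onward.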

That's a good point, I am going to check that out! Thanks a lot.