How to optimize TopHat for targeted RNA data analysis
5.5 years ago
genya35 ▴ 40

Hello,

Could someone please suggest optimal TopHat parameters for analyzing Ion Torrent targeted RNA data? I need to identify the breakpoints and also view unaligned reads in IGV. I plan to run TopHat through Galaxy to test it out before installing it on the server.

Any suggestions would be greatly appreciated.

Thanks

RNA-Seq • 1.3k views

Please stop using TopHat. Even more so with Ion Torrent data.

Quote from TopHat web site:

Please note that TopHat has entered a low maintenance, low support stage as it is now largely superseded by HISAT2 which provides the same core functionality (i.e. spliced alignment of RNA-Seq reads), in a more accurate and much more efficient way.

There are much better solutions out there (HISAT2, STAR and many other splice-aware aligners).


Could you please suggest parameters to use to optimize alignment?


Hi, why don't you take a look at these:

If you must use TopHat out of curiosity, then just provide it with good data, i.e., reads >50 bp with base qualities >30 at the read ends. Start with strict quality thresholds and then relax them if needed.

As genomax mentioned, TopHat is effectively retired and has been replaced by HISAT2.
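To make the "good data" advice concrete, here is a minimal Python sketch of such a pre-filter. Everything in it is an assumption made for illustration: the function names are hypothetical, qualities are taken to be Phred+33 encoded, "at the read ends" is interpreted as the mean quality of the first and last 10 bases, and the FASTQ is assumed to use exactly four lines per record. A dedicated trimming tool would normally do this job; the sketch only shows the logic.

```python
def phred_scores(quality_line, offset=33):
    """Convert a FASTQ quality string to Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_line]


def passes_filter(seq, qual, min_len=51, min_end_q=30, end_window=10):
    """Keep reads longer than 50 bp whose two ends both average above Q30.

    The window size and the use of a mean are illustrative choices,
    not something stated in the thread.
    """
    if len(seq) < min_len:
        return False
    scores = phred_scores(qual)
    head = scores[:end_window]
    tail = scores[-end_window:]
    return (sum(head) / len(head) > min_end_q
            and sum(tail) / len(tail) > min_end_q)


def filter_fastq(lines, **kwargs):
    """Return (header, sequence, quality) tuples for reads that pass."""
    lines = [line.rstrip("\n") for line in lines]
    kept = []
    # Step through the file four lines at a time: header, sequence, '+', quality.
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = lines[i:i + 4]
        if passes_filter(seq, qual, **kwargs):
            kept.append((header, seq, qual))
    return kept


if __name__ == "__main__":
    example = [
        "@good", "A" * 60, "+", "I" * 60,   # 60 bp, Q40 throughout
        "@lowq", "A" * 60, "+", "#" * 60,   # 60 bp, Q2 throughout
        "@short", "A" * 20, "+", "I" * 20,  # too short
    ]
    for header, seq, qual in filter_fastq(example):
        print(header, len(seq))
```

Starting strict and relaxing, as suggested above, would then just mean lowering `min_end_q` or `min_len` and checking how many reads survive.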


Hi Kevin,

Sorry for the basic question, but I just ran my FASTQ file through HISAT using default parameters, and here are the stats:

    345063 reads; of these:
      345063 (100.00%) were unpaired; of these:
        35597 (10.32%) aligned 0 times
        260962 (75.63%) aligned exactly 1 time
        48504 (14.06%) aligned >1 times
    89.68% overall alignment rate
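For anyone wondering how the overall rate relates to the individual counts: for unpaired reads it is the uniquely aligned plus multi-mapped reads divided by the total. Below is a small Python sketch that pulls the counts out of a summary like this one and recomputes the rate. The function names are hypothetical, and the regular expressions only assume the unpaired layout shown in this post.

```python
import re


def parse_unpaired_summary(text):
    """Extract read counts from an unpaired alignment summary like the one above."""
    patterns = {
        "total": r"(\d+) reads; of these:",
        "unaligned": r"(\d+) \([\d.]+%\) aligned 0 times",
        "unique": r"(\d+) \([\d.]+%\) aligned exactly 1 time",
        "multi": r"(\d+) \([\d.]+%\) aligned >1 times",
    }
    counts = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, text)
        if match is None:
            raise ValueError(f"could not find the '{key}' line in the summary")
        counts[key] = int(match.group(1))
    return counts


def overall_rate(counts):
    """Percentage of reads aligned at least once (unique + multi-mapped)."""
    aligned = counts["unique"] + counts["multi"]
    return 100.0 * aligned / counts["total"]


if __name__ == "__main__":
    summary = """345063 reads; of these:
  345063 (100.00%) were unpaired; of these:
    35597 (10.32%) aligned 0 times
    260962 (75.63%) aligned exactly 1 time
    48504 (14.06%) aligned >1 times
89.68% overall alignment rate"""
    counts = parse_unpaired_summary(summary)
    print(f"{overall_rate(counts):.2f}%")  # 89.68%
```

The three categories add up to the total (35597 + 260962 + 48504 = 345063), so the 10.32% that aligned 0 times is exactly the complement of the overall rate.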


It produced a .bam (19,219 K) and a .bai index. I've imported the file into IGV using the link, but I don't see anything. Where are the reads? Thanks for your help.


You have to zoom in significantly before you start seeing the reads. You have a very small number of reads (for an RNA-seq dataset), so you can either move around the genome in IGV until you find the reads, or pick a gene you know should be represented and go to that region directly.
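If you don't know which gene to pick, one option is to let the alignments tell you where to look. The sketch below is only an illustration under stated assumptions: it expects SAM-format text lines (for example, the output of `samtools view` on your BAM), the function name and the 1 Mb bin size are hypothetical choices, and it simply counts mapped reads per bin and prints the busiest ones as `chr:start-end` strings that can be pasted into the IGV locus box.

```python
from collections import Counter


def busiest_bins(sam_lines, bin_size=1_000_000, top=3):
    """Count mapped reads per fixed-size bin and return the busiest loci.

    Relies only on the standard SAM columns: FLAG (column 2),
    RNAME (column 3) and 1-based POS (column 4).
    """
    bins = Counter()
    for line in sam_lines:
        if line.startswith("@"):          # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete alignment record
            continue
        flag, rname, pos = int(fields[1]), fields[2], int(fields[3])
        if flag & 0x4 or rname == "*":    # unmapped read, no position
            continue
        bins[(rname, (pos - 1) // bin_size)] += 1

    loci = []
    for (rname, index), n_reads in bins.most_common(top):
        start = index * bin_size + 1
        end = (index + 1) * bin_size
        loci.append((f"{rname}:{start}-{end}", n_reads))
    return loci


if __name__ == "__main__":
    # Tiny made-up records, just to show the input shape.
    records = [
        "@HD\tVN:1.6",
        "r1\t0\tchr2\t1500000\t60\t50M\t*\t0\t0\t" + "A" * 50 + "\t" + "I" * 50,
        "r2\t16\tchr2\t1500100\t60\t50M\t*\t0\t0\t" + "A" * 50 + "\t" + "I" * 50,
        "r3\t0\tchr7\t200\t60\t50M\t*\t0\t0\t" + "A" * 50 + "\t" + "I" * 50,
        "r4\t4\t*\t0\t0\t*\t*\t0\t0\t" + "A" * 50 + "\t" + "I" * 50,
    ]
    for locus, n_reads in busiest_bins(records):
        print(locus, n_reads)
```

With targeted data the reads should pile up in a handful of bins, so even a coarse bin size is usually enough to get you to the right neighbourhood before zooming in.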


Thank you so much, I see them now! Is there a way to optimize the alignment? Thanks


What does optimize mean? How could it be improved?


Lastly, I used RNA STAR to align the FASTQ, and the output is looking good. I was able to create an aligned BAM file, and I can see soft-clipped bases in IGV. However, how do I see the full-length fusion reads that were not mapped? I can see evidence of a fusion but would really like to see the unmapped reads. Could you please suggest how to accomplish this? Thank you.
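As a starting point for this kind of inspection, the SAM format itself distinguishes the two cases: unmapped reads carry FLAG bit 0x4, and soft-clipped bases appear as `S` operations at the ends of the CIGAR string. The Python sketch below separates the two from SAM text lines. It is an illustration under assumptions: the function names and the 20-base clip threshold are hypothetical, and it only finds unmapped reads if the aligner actually wrote them into the file (in STAR that is controlled by `--outSAMunmapped Within`; whether the Galaxy wrapper exposes that option is not something this thread establishes). Unmapped reads have no coordinates, so they cannot be shown at a locus in IGV; they have to be pulled out and examined separately.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clip_lengths(cigar):
    """Return (left, right) soft-clip lengths from a CIGAR string."""
    # Hard clips (H) can only sit outside soft clips, so drop them first.
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    if not ops:
        return 0, 0
    left = ops[0][0] if ops[0][1] == "S" else 0
    right = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return left, right


def classify_reads(sam_lines, min_clip=20):
    """Split SAM records into unmapped read names and heavily clipped reads."""
    unmapped, clipped = [], []
    for line in sam_lines:
        if line.startswith("@"):          # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue
        name, flag, cigar = fields[0], int(fields[1]), fields[5]
        if flag & 0x4:                    # FLAG bit 0x4: read is unmapped
            unmapped.append(name)
            continue
        left, right = soft_clip_lengths(cigar)
        if max(left, right) >= min_clip:  # candidate breakpoint-spanning read
            clipped.append((name, fields[2], int(fields[3]), left, right))
    return unmapped, clipped


if __name__ == "__main__":
    # Made-up records for illustration only.
    seq, qual = "A" * 80, "I" * 80
    records = [
        f"r1\t0\tchr2\t1000\t60\t50M30S\t*\t0\t0\t{seq}\t{qual}",
        f"r2\t0\tchr2\t1010\t60\t80M\t*\t0\t0\t{seq}\t{qual}",
        f"r3\t4\t*\t0\t0\t*\t*\t0\t0\t{seq}\t{qual}",
    ]
    unmapped, clipped = classify_reads(records)
    print("unmapped:", unmapped)
    print("clipped:", clipped)
```

The clipped reads and their positions can then be looked up in IGV, while the unmapped read names can be used to fetch the corresponding sequences from the original FASTQ.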


That's a completely different question than the one you started with. A separate thread would be appropriate. Don't forget to be as informative as possible and include all necessary information in your post.