Question: How to optimize TopHat for targeted RNA data analysis
5 weeks ago, yelekley7 wrote:

Hello,

Could someone please suggest optimal TopHat parameters for analyzing Ion Torrent targeted RNA data? I need to identify the breakpoints and also view the unaligned reads in IGV. I plan to run TopHat through Galaxy to test it out before installing it on the server.

Any suggestions would be greatly appreciated.

Thanks

rna-seq • 218 views
modified 5 weeks ago by Kevin Blighe (6.7k) • written 5 weeks ago by yelekley7 (0)

Please stop using TopHat. Even more so with Ion Torrent data.

Quote from TopHat web site:

Please note that TopHat has entered a low maintenance, low support stage as it is now largely superseded by HISAT2 which provides the same core functionality (i.e. spliced alignment of RNA-Seq reads), in a more accurate and much more efficient way.

There are much better solutions out there (HISAT2, STAR and many other splice-aware aligners).

written 5 weeks ago by genomax (37k)

Could you please suggest parameters to use to optimize alignment?

written 5 weeks ago by yelekley7 (0)

Hi, why don't you take a look at these:

If you must use TopHat out of curiosity, then just provide it with good data, i.e., reads longer than 50 bp that have base qualities above 30 at the read ends. Start with the quality thresholds set high and then relax them if needed.

As genomax mentioned, TopHat is effectively retired, and it has been replaced by HISAT2.
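The read-quality advice above could be sketched roughly as follows. This is only an illustration in plain Python, not a replacement for a dedicated trimming tool. The constants `MIN_LENGTH` and `MIN_END_QUAL` come from the thresholds mentioned above; `END_WINDOW` (how many bases count as "the end" of a read) and the Phred+33 quality encoding are assumptions, as are all function names.

```python
from typing import Iterable, Iterator, List, Tuple

MIN_LENGTH = 50     # keep reads longer than this many bases
MIN_END_QUAL = 30   # every base in the end windows must score above this
END_WINDOW = 5      # assumption: bases at each end treated as "the read end"
PHRED_OFFSET = 33   # assumption: Phred+33 encoded qualities

Record = Tuple[str, str, str]  # (header, sequence, quality string)


def parse_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield records from simple 4-line FASTQ text, skipping blank lines."""
    rows = [line.rstrip("\n") for line in lines if line.strip()]
    for i in range(0, len(rows) - 3, 4):
        yield rows[i], rows[i + 1], rows[i + 3]


def passes_filter(seq: str, qual: str) -> bool:
    """True if the read is long enough and both ends are high quality."""
    if len(seq) <= MIN_LENGTH or len(seq) != len(qual):
        return False
    scores = [ord(ch) - PHRED_OFFSET for ch in qual]
    ends = scores[:END_WINDOW] + scores[-END_WINDOW:]
    return min(ends) > MIN_END_QUAL


def filter_fastq(lines: Iterable[str]) -> List[Record]:
    """Return only the records that pass the length and end-quality checks."""
    return [rec for rec in parse_fastq(lines) if passes_filter(rec[1], rec[2])]


# Made-up demo records: 'I' encodes Q40 and '5' encodes Q20 under Phred+33.
demo = [
    "@good", "A" * 60, "+", "I" * 60,
    "@short", "A" * 40, "+", "I" * 40,
    "@low_ends", "A" * 60, "+", "5" * 5 + "I" * 50 + "5" * 5,
]
kept = filter_fastq(demo)
print([header for header, _, _ in kept])  # prints ['@good']
```

To relax the filter as suggested, lower `MIN_END_QUAL` or `MIN_LENGTH` step by step and compare how many reads survive each time.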

written 5 weeks ago by Kevin Blighe (6.7k)

Hi Kevin,

Sorry for the basic question, but I just ran my FASTQ file through HISAT using default parameters, and here are the stats:

345063 reads; of these:
  345063 (100.00%) were unpaired; of these:
    35597 (10.32%) aligned 0 times
    260962 (75.63%) aligned exactly 1 time
    48504 (14.06%) aligned >1 times
89.68% overall alignment rate
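As a quick sanity check on the summary above, the overall alignment rate is simply the uniquely aligned plus multi-mapped reads divided by the total. A minimal sketch using the counts reported above (the variable names are made up for illustration):

```python
# Counts copied from the alignment summary above.
total_reads = 345063
aligned_zero = 35597     # aligned 0 times
aligned_once = 260962    # aligned exactly 1 time
aligned_multi = 48504    # aligned >1 times

# The three categories should account for every read.
assert aligned_zero + aligned_once + aligned_multi == total_reads

overall_rate = 100 * (aligned_once + aligned_multi) / total_reads
print(f"{overall_rate:.2f}% overall alignment rate")  # prints 89.68% overall alignment rate
```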

It produced a .bam (19,219 K) and an index .bai. I've imported the file into IGV using the link but I don't see anything. Where are the reads? Thanks for your help.

modified 5 weeks ago by genomax (37k) • written 5 weeks ago by yelekley7 (0)

You have to zoom in significantly before you start seeing the reads. You have very few reads (for an RNA-seq dataset), so you can either move around the genome in IGV until you find them, or pick a gene you know should be represented and go to that region directly.
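One way to find a region worth jumping to, rather than scrolling, is to count aligned reads per genomic window and paste the busiest window into the IGV locus box. The sketch below is an illustration only: it assumes the alignments are available as SAM-format text lines (for example from `samtools view`), and the function names and the `BIN_SIZE` window are hypothetical choices.

```python
from collections import Counter
from typing import Iterable, List

BIN_SIZE = 10_000  # hypothetical window size in bp


def count_reads_per_bin(sam_lines: Iterable[str], bin_size: int = BIN_SIZE) -> Counter:
    """Count mapped reads per (reference name, window index) from SAM text."""
    counts: Counter = Counter()
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue
        flag, rname, pos = int(fields[1]), fields[2], int(fields[3])
        if flag & 0x4 or rname == "*":    # 0x4 marks an unmapped read
            continue
        counts[(rname, (pos - 1) // bin_size)] += 1
    return counts


def top_loci(counts: Counter, n: int = 3, bin_size: int = BIN_SIZE) -> List[str]:
    """Format the n busiest windows as 'chr:start-end' locus strings."""
    loci = []
    for (rname, index), _ in counts.most_common(n):
        start = index * bin_size + 1
        loci.append(f"{rname}:{start}-{start + bin_size - 1}")
    return loci


# Made-up SAM records, truncated to the first four columns for brevity.
demo_sam = [
    "@HD\tVN:1.6",
    "read1\t0\tchr1\t15000",
    "read2\t16\tchr1\t15500",
    "read3\t0\tchr2\t500",
    "read4\t4\t*\t0",
]
print(top_loci(count_reads_per_bin(demo_sam)))
# prints ['chr1:10001-20000', 'chr2:1-10000']
```

A smaller `BIN_SIZE` gives tighter regions, which helps with targeted panels where reads pile up on a handful of amplicons.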

modified 5 weeks ago • written 5 weeks ago by genomax (37k)

Thank you so much, I see them now! Is there a way to optimize the alignment? Thanks

written 5 weeks ago by yelekley7 (0)

What does optimize mean? How could it be improved?

written 4 weeks ago by WouterDeCoster (23k)

Lastly, I used RNA STAR to align the FASTQ, and the output is looking good. I was able to create an aligned BAM file, and I can see soft-clipped bases in IGV. However, how do I see the full-length fusion reads that were not mapped? I can see evidence of a fusion but would really like to see the unmapped reads. Could you please suggest how to accomplish this? Thank you.

written 4 weeks ago by yelekley7 (0)

That's a completely different question from the one you started with. A separate thread would be appropriate. Don't forget to be as informative as possible and include all necessary information in your post.

written 4 weeks ago by WouterDeCoster (23k)