Question: How can I predict the time of calculations for RNAseq?
2.0 years ago by
agata88 (790) wrote:

Hi all!

I am going to have paired-end (PE) reads for human RNA-seq (around 70 million reads). How can I predict whether my computer has enough disk space and memory to map the reads to the reference genome with TopHat or any other RNA-seq mapping algorithm?

I would like to decide whether I need to use the cloud for this calculation or whether I can run it on my local computer.

I have 1 TB of disk space, 64 GB of RAM, and 10 cores.

Thank you in advance,


rna-seq • 601 views
modified 22 months ago by arup (1.9k) • written 2.0 years ago by agata88 (790)

You should know that the old 'Tuxedo' pipeline of TopHat(2) and Cufflinks is no longer the advisable tool for RNA-seq analysis. The software is deprecated or in low maintenance and should be replaced by HISAT2, StringTie and Ballgown. See this paper: Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. (If you can't get access to that publication, let me know and I'll -cough- help you.) There are also other alternatives, including alignment with STAR and BBMap, or pseudo-alignment using Salmon.

written 2.0 years ago by WouterDeCoster (42k)

Wow, I am surprised. Last year I performed RNA-seq analysis with TopHat and Cufflinks and the results were fine. I was going to repeat that pipeline this year. Thanks for letting me know. I will go with other solutions.

written 2.0 years ago by agata88 (790)

If you are flexible on time then it should work with the specs posted above. How many of these 70M read samples do you expect to do?

I suggest that you use BBMap. It requires about 30 GB of RAM for the human genome. STAR would need about the same. You can find out how long a million reads take by adding the reads=1000000 parameter to the bbmap command line, and then extrapolate from there.
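The extrapolation suggested above can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions: mapping time is taken to scale linearly with read count, the function name and the optional startup_seconds parameter (for one-off costs such as index loading) are hypothetical, and the 120-second pilot timing is a made-up placeholder, not a measured BBMap figure. The 70 million reads and 12 samples come from this thread.

```python
def extrapolate_hours(pilot_seconds, pilot_reads, reads_per_sample,
                      n_samples, startup_seconds=0.0):
    """Estimate total mapping time in hours from a timed pilot run.

    Assumes run time grows linearly with the number of reads, plus an
    optional fixed startup cost per run (hypothetical parameter).
    """
    if pilot_reads <= 0:
        raise ValueError("pilot_reads must be positive")
    # Time spent per read once the fixed startup cost is removed.
    seconds_per_read = (pilot_seconds - startup_seconds) / pilot_reads
    # Each full sample pays the startup cost once, then the per-read cost.
    seconds_per_sample = startup_seconds + seconds_per_read * reads_per_sample
    return n_samples * seconds_per_sample / 3600.0


# Placeholder pilot: 1,000,000 reads mapped in 120 s (not a real measurement).
estimate = extrapolate_hours(120, 1_000_000, 70_000_000, 12)
print(f"{estimate:.1f} hours for 12 samples run one after another")
```

Replace the 120 with the wall-clock time of your own pilot run. If the pilot is dominated by index loading, passing a nonzero startup_seconds keeps that fixed cost from being multiplied by 70.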

modified 2.0 years ago • written 2.0 years ago by genomax (75k)

I have 12 samples. Thanks for the tips. Although I am flexible on time, I would like to do it wisely.

written 2.0 years ago by agata88 (790)
22 months ago by
arup (1.9k) wrote:

I guess the processor you are using is an Intel Xeon, so with hyper-threading the number of threads will be 10*2=20. If you use TopHat2 with ~8 threads per process, it will take >=4 hrs per sample. The complete analysis will take around a day using Tuxedo2. But HISAT2 is a huge improvement: it took ~30 mins for mouse RNA-seq PE data with 30 million reads.
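One way to read the "around a day" figure is as a simple batch calculation: 20 threads at ~8 threads per process leaves room for 2 processes at a time, so the 12 samples from the question run in 6 batches of about 4 hours each. The answer does not spell out this derivation, so the sketch below is an assumption about how the numbers fit together. The function name is hypothetical, and memory limits per process are ignored.

```python
import math


def wall_clock_hours(n_samples, hours_per_sample, total_threads,
                     threads_per_process):
    """Rough wall-clock estimate when samples are aligned in parallel batches.

    Assumes every sample takes the same time and that the only limit on
    concurrency is the thread count (RAM and disk I/O are not modelled).
    """
    # How many alignment processes fit on the machine at once.
    concurrent = max(1, total_threads // threads_per_process)
    # Samples are processed in batches of that size.
    batches = math.ceil(n_samples / concurrent)
    return batches * hours_per_sample


# Figures from the thread: 12 samples, >=4 h each, 20 threads, ~8 per process.
print(wall_clock_hours(12, 4, 20, 8))  # prints 24
```

Since 4 hours is given as a lower bound per sample, the result is best treated as a lower bound too.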

Write the alignment output as BAM files to save a lot of space. As you mentioned you have 12 samples, the maximum space required for the analysis will be within 200 GB.

written 22 months ago by arup (1.9k)

Thanks a lot! Best, Agata

written 22 months ago by agata88 (790)