How can I predict the time of calculations for RNAseq?
6.4 years ago
agata88 ▴ 870

Hi all!

I am going to have PE reads for human RNA-seq (around 70 million reads). How can I predict whether my computer has enough disk space and memory to map the reads to the reference genome with TopHat or any other RNA-seq mapping algorithm?

I would like to decide whether I need to use the cloud for this calculation or whether I can run it on my local computer.

I have 1 TB of disk space, 64 GB of RAM and 10 cores.

Thank you in advance,

Agata

RNA-Seq • 1.3k views

You should know that the old 'Tuxedo' pipeline of TopHat(2) and Cufflinks is no longer the recommended tool for RNA-seq analysis. The software is deprecated or in low maintenance and should be replaced by HISAT2, StringTie and Ballgown. See this paper: Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. (If you can't get access to that publication, let me know and I'll -cough- help you.) There are also other alternatives, including alignment with STAR or BBMap, or pseudo-alignment using Salmon.


Wow, I am surprised. Last year I performed an RNA-seq analysis with TopHat and Cufflinks and the results were fine. I was going to repeat that pipeline this year. Thanks for letting me know. I will go with other solutions.


If you are flexible on time then it should work with the specs posted above. How many of these 70M read samples do you expect to do?

I suggest that you use BBMap. It requires about 30 GB of RAM for the human genome. STAR would need about the same. You can find out how long a million reads take by adding the reads=1000000 parameter to the bbmap command line, and then extrapolate from there.
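The extrapolation itself is simple proportional arithmetic. A minimal Python sketch, assuming runtime scales roughly linearly with read count; the function name and the 90-second timing are made up for illustration and are not measurements:

```python
def extrapolate_runtime_hours(test_seconds, test_reads, reads_per_sample, n_samples=1):
    """Linearly scale a timed test run up to the full data set.

    Assumes runtime is proportional to the number of reads. Fixed costs
    such as loading the genome index are counted inside the test timing,
    so the result tends to overestimate rather than underestimate.
    """
    if test_reads <= 0 or test_seconds < 0:
        raise ValueError("test_reads must be > 0 and test_seconds >= 0")
    seconds_per_read = test_seconds / test_reads
    return seconds_per_read * reads_per_sample * n_samples / 3600.0


if __name__ == "__main__":
    # Hypothetical timing: the 1M-read test run took 90 s (made-up number).
    hours = extrapolate_runtime_hours(
        test_seconds=90.0,
        test_reads=1_000_000,
        reads_per_sample=70_000_000,
        n_samples=12,
    )
    print(f"Estimated total alignment time: {hours:.1f} h")
```

Replace the 90 s with whatever your own test run reports; the 70M reads and 12 samples are the numbers from this thread.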


I have 12 samples. Thanks for the tips. Although I am flexible on time, I would like to do it wisely.

6.3 years ago

I guess you are using an Intel Xeon processor, so with hyper-threading the number of threads will be 10*2=20. If you use TopHat2 with ~8 threads per process it will take >=4 hrs per sample. The complete analysis will take around a day using Tuxedo2. But HISAT2 is a huge improvement: it took ~30 min for mouse PE RNA-seq data with 30 million reads.
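For what it's worth, the "around a day" figure can be reproduced with a rough back-of-the-envelope calculation. This is only a sketch: the function name is made up, and it assumes samples run in rounds of equal length and that memory never limits how many jobs run side by side, which may not hold on a 64 GB machine.

```python
import math


def batch_wall_clock_hours(n_samples, hours_per_sample, total_threads, threads_per_job):
    """Rough wall-clock time for a batch of alignment jobs.

    Assumes jobs run in rounds, each round taking hours_per_sample, and
    that only the thread count limits concurrency (RAM is ignored).
    """
    if threads_per_job <= 0 or total_threads <= 0:
        raise ValueError("thread counts must be positive")
    concurrent_jobs = max(1, total_threads // threads_per_job)
    rounds = math.ceil(n_samples / concurrent_jobs)
    return rounds * hours_per_sample


if __name__ == "__main__":
    # Numbers from this thread: 12 samples, ~4 h each with TopHat2,
    # 20 threads available, ~8 threads per process.
    print(batch_wall_clock_hours(12, 4.0, 20, 8))  # 2 jobs at a time, 6 rounds -> 24.0
```

With 20 threads and 8 threads per process, two samples fit at a time, so 12 samples take 6 rounds of 4 h.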

Write the alignment output as BAM rather than SAM to save a lot of space. Since you mentioned you have 12 samples, the maximum space required for the analysis should be within 200 GB.
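If you want to check the disk side before starting, something like the following would do. It is a minimal sketch using only the Python standard library; the function names and the 1.5x safety margin are assumptions for illustration, not recommendations from any of the tools.

```python
import shutil


def free_disk_gb(path="."):
    """Free space, in GB (10**9 bytes), on the filesystem containing path."""
    return shutil.disk_usage(path).free / 1e9


def fits_on_disk(required_gb, path=".", safety_factor=1.5):
    """True if free space covers the requirement plus a margin.

    safety_factor is an arbitrary cushion for temporary files; adjust it
    to your own pipeline.
    """
    return free_disk_gb(path) >= required_gb * safety_factor


if __name__ == "__main__":
    # 200 GB is the upper estimate for 12 samples given in this thread.
    print(f"Free: {free_disk_gb():.0f} GB, enough for 200 GB: {fits_on_disk(200)}")
```

Point `path` at the directory where the alignments will be written, since that may be a different disk from your home directory.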


Thanks a lot! Best, Agata
