Question: How can I predict the time of calculations for RNAseq?
agata88 (Poland) wrote, 15 months ago:

Hi all!

I will soon have paired-end (PE) reads from human RNA-seq (around 70 million reads). How can I predict whether my computer has enough disk space and memory to map the reads to a reference genome with TopHat or any other RNA-seq mapping algorithm?

I would like to decide whether I need to use the cloud for this calculation or whether I can run it on my local computer.

I have 1 TB of disk space, 64 GB of RAM, and 10 cores.

Thank you in advance,

Agata

rna-seq • 486 views
modified 13 months ago by arup • written 15 months ago by agata88

You should know that the old 'Tuxedo' pipeline of TopHat(2) and Cufflinks is no longer the "advisable" tool for RNA-seq analysis. The software is deprecated or in low maintenance and should be replaced by HISAT2, StringTie and Ballgown. See this paper: Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. (If you can't get access to that publication, let me know and I'll -cough- help you.) There are also other alternatives, including alignment with STAR or BBMap, or pseudo-alignment using Salmon.

written 15 months ago by WouterDeCoster

Wow, I am surprised. Last year I ran an RNA-seq analysis with TopHat and Cufflinks and the results were fine. I was going to repeat that pipeline this year. Thanks for letting me know. I will go with other solutions.

written 15 months ago by agata88

If you are flexible on time, then it should work with the specs posted above. How many of these 70M-read samples do you expect to process?

I suggest that you use BBMap. It requires about 30 GB of RAM for the human genome. STAR would need about the same. You can find out how long a million reads take by adding the reads=1000000 parameter to the bbmap command line, and then extrapolate from there.
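The extrapolation itself is simple arithmetic. Here is a minimal Python sketch of it, assuming run time scales linearly with the number of reads. The function name and the 60-second timing are made up for illustration, not measured values. Because the test run also includes start-up costs such as loading the index, a plain linear scaling will tend to overestimate the full run.

```python
def extrapolate_runtime(subset_reads, subset_seconds, total_reads, n_samples=1):
    """Linearly scale the wall-clock time of a timed subset run.

    Assumes time is proportional to the number of reads (an approximation).
    Returns the estimated time in seconds for n_samples full-size samples.
    """
    if subset_reads <= 0:
        raise ValueError("subset_reads must be positive")
    seconds_per_read = subset_seconds / subset_reads
    return seconds_per_read * total_reads * n_samples


# Hypothetical timing: suppose the reads=1000000 test run took 60 seconds.
per_sample = extrapolate_runtime(1_000_000, 60.0, 70_000_000)
all_samples = extrapolate_runtime(1_000_000, 60.0, 70_000_000, n_samples=12)
print(f"Estimated time per 70M-read sample: {per_sample / 3600:.2f} h")
print(f"Estimated time for 12 samples run one after another: {all_samples / 3600:.2f} h")
```

Replace the 60.0 with the time you actually measure on your own machine.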

modified 15 months ago • written 15 months ago by genomax

I have 12 samples. Thanks for the tips. Although I am flexible on time, I would like to do it wisely.

written 15 months ago by agata88

arup (India) wrote, 13 months ago:

I guess you are using an Intel Xeon processor, so with hyper-threading the number of threads will be 10*2=20. If you use Tophat2 with ~8 threads per process, it will take >=4 hrs per sample. The complete analysis will take around a day using Tuxedo2. But HISAT2 is a huge improvement: it took ~30 mins for mouse RNA-seq PE data with 30 million reads.
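To make that estimate explicit, here is a rough Python sketch of the arithmetic, using the figures from this thread (12 samples, 20 threads, ~8 threads per process, ~4 hrs per sample). The function name is made up for illustration. It assumes that processes running side by side do not slow each other down and that there is enough RAM for all of them, which may not hold in practice.

```python
import math


def batch_wall_time(n_samples, hours_per_sample, total_threads, threads_per_process):
    """Estimate wall-clock hours when samples are aligned in parallel batches.

    Returns (number of concurrent processes, total hours).
    """
    # How many aligner processes fit on the machine at once (at least one).
    concurrent = max(1, total_threads // threads_per_process)
    # Samples are processed in rounds of `concurrent` at a time.
    batches = math.ceil(n_samples / concurrent)
    return concurrent, batches * hours_per_sample


concurrent, hours = batch_wall_time(
    n_samples=12, hours_per_sample=4.0, total_threads=20, threads_per_process=8
)
print(f"{concurrent} processes at a time, about {hours:.0f} h in total")
```

With these numbers, two processes fit at once, so the 12 samples run in six rounds of about 4 hrs each, which is where "around a day" comes from.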

Write the alignment output as BAM rather than SAM to save a lot of space. Since you have 12 samples, the analysis should need at most about 200 GB.
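If you want to check this against the disk you actually have, here is a small Python sketch using the standard library's shutil.disk_usage. The 200 GB figure is the estimate above. The function names and the safety factor of 1.5 are my own assumptions, added to leave headroom for temporary files.

```python
import shutil


def free_space_gb(path="."):
    """Return the free space, in GB (10**9 bytes), on the disk that holds `path`."""
    return shutil.disk_usage(path).free / 1e9


def has_room(required_gb, path=".", safety_factor=1.5):
    """Check whether free space covers the estimate plus headroom.

    safety_factor is an assumed margin, not a measured value.
    """
    return free_space_gb(path) >= required_gb * safety_factor


# 200 GB is the rough upper bound for 12 samples mentioned above.
print(f"Free space here: {free_space_gb('.'):.0f} GB")
print("Enough room for the analysis:", has_room(200, "."))
```

Point the path at the directory where the BAM files will be written, since that may be a different disk from your home directory.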

written 13 months ago by arup

Thanks a lot! Best, Agata

written 13 months ago by agata88