RNA-seq analysis cloud server
2.3 years ago
kimmitzka • 0

Hi all,

I have some RNA-seq data from mice (around 200 GB) and I want to perform an RNA-seq analysis (including QC, mapping, quantification, and differential expression analysis), but I don't know how to choose a server. Could anyone tell me how many CPU cores and threads, what GPU (if any), and how much memory and disk space would be appropriate for processing a dataset like this, taking both time and cost into account?

I appreciate your help.

Honestly, I routinely do RNA-seq analysis (with all those things you mentioned) on my laptop (a 2018 MacBook Pro). Why do you need a server?

How long does it take you?

How many samples? But yeah, as said in the other comment, just get any computer running and process your samples. RNA-seq quantification is trivial for end users these days. I prefer salmon, but kallisto is also fine; both are super fast and memory-efficient.

Because my lab wants me to use a server, but I don't know which one to choose.

It really doesn't matter. If I can process dozens of samples on a MacBook Pro, any server (or laptop) should suit your purpose. Modern tools (e.g. kallisto) can process your samples very quickly with very little memory.

I processed 200,000,000 reads in less than an hour on Google Colab once.

In the time that you waste trying to figure out what server to buy, you could have already finished your RNAseq analysis on your laptop.

2.3 years ago
GenoMax 141k

To give you an actual idea of the minimal configuration you will need to do this in the cloud:

A. If you choose a mapping-based method like salmon (or kallisto), which uses the transcriptome sequence

For salmon (probably similar for kallisto) there are two ways to do the mapping. The first is against the transcriptome alone; for that you will need ~4 GB of RAM for each sample. It is generally recommended that you include genome decoys, which bumps the memory requirement up to ~20 GB of RAM for human/mouse genomes. This is for one sample. If you want to run multiple samples in parallel, you will need to multiply this requirement by the number of samples you run at the same time.
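The multiplication above can be written out as a tiny helper. This is a minimal sketch that only encodes the per-sample figures quoted in this answer (~4 GB transcriptome-only, ~20 GB with decoys, for mouse/human); the function name and dictionary keys are illustrative, not anything salmon itself provides.

```python
# Rough peak-RAM estimate for salmon, using the per-sample figures quoted
# in this answer (assumptions from the post, not official salmon numbers).
PER_SAMPLE_GB = {
    "transcriptome_only": 4,  # index built from the transcriptome alone
    "with_decoys": 20,        # index that also includes genome decoys
}


def salmon_ram_gb(index_type: str, parallel_samples: int) -> int:
    """Estimated RAM (GB) to run `parallel_samples` salmon jobs at once."""
    if parallel_samples < 1:
        raise ValueError("parallel_samples must be at least 1")
    return PER_SAMPLE_GB[index_type] * parallel_samples


if __name__ == "__main__":
    for mode in PER_SAMPLE_GB:
        for n in (1, 4):
            print(f"{mode}, {n} sample(s) in parallel: ~{salmon_ram_gb(mode, n)} GB")
```

For example, four decoy-aware runs at once come to about 80 GB under these figures.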

B. If you use an aligner like STAR or bbmap with the genome sequence

You will need about 40 GB of RAM to create genome indexes and do alignments. This requirement is for one sample. (Note: the Subread aligner can work in ~8 GB of RAM, but it may be the only splice-aware aligner that can.)
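Turned around, the same figure tells you how many alignments a given machine can run at once. This is a sketch assuming the ~40 GB-per-sample number above and a hypothetical 4 GB reserve for the operating system; both are parameters you should adjust.

```python
# How many alignment jobs fit in a machine's RAM at the same time.
# Defaults: ~40 GB per STAR job (from this answer) and a 4 GB OS reserve
# (an illustrative assumption).
def max_parallel_jobs(machine_ram_gb: float, per_job_gb: float = 40.0,
                      reserve_gb: float = 4.0) -> int:
    """Return the number of jobs that can run concurrently (0 if none fit)."""
    usable = machine_ram_gb - reserve_gb
    if usable < per_job_gb:
        return 0
    return int(usable // per_job_gb)


if __name__ == "__main__":
    for ram in (32, 64, 128):
        print(f"{ram} GB machine -> {max_parallel_jobs(ram)} STAR job(s) at a time")
    # Subread, quoted above at ~8 GB, fits many more jobs on the same machine
    print(f"64 GB machine -> {max_parallel_jobs(64, per_job_gb=8)} Subread job(s)")
```

Under these assumptions a 32 GB machine cannot hold even one STAR job, while a 128 GB machine holds three.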

For either method you will need cloud disk storage. 200 GB will be taken up by your data, plus some space for the programs you need, and you will also need space for temporary files, genome indexes, and output results. Figure on having at least 1 TB available. There can be charges to move data into and (especially) out of the cloud, so keep that in mind. You will want to have at least 8 cores available.
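Before copying the data over, you can check that the target disk meets the 1 TB rule of thumb. This sketch uses only Python's standard `shutil.disk_usage`; treating 1 TB as decimal bytes (1000^4) is an assumption.

```python
import shutil

# Rule of thumb from this answer: plan for at least 1 TB free when the raw
# data alone is ~200 GB. Decimal TB is assumed; use 1024**4 for TiB.
MIN_FREE_BYTES = 1 * 1000**4


def has_enough_disk(path: str = ".", min_free_bytes: int = MIN_FREE_BYTES) -> bool:
    """True if the filesystem holding `path` has at least `min_free_bytes` free."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.free >= min_free_bytes


if __name__ == "__main__":
    free_gb = shutil.disk_usage(".").free / 1000**3
    status = "OK" if has_enough_disk() else "less than 1 TB free"
    print(f"{free_gb:.0f} GB free in current directory: {status}")
```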

If you have never used the cloud before, it will take some time to familiarize yourself with everything, which will add time and cost. There are calculators on AWS/Google that let you estimate costs, but use them only as a rough guide.

Note: If you have a reasonably new laptop with 8 GB (preferably 16 GB) of RAM, you may be able to do the analysis locally (at your own pace). That would save you the cloud expense, as others have noted.
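To see where a local machine falls against the 8 GB / 16 GB figures above, a quick check is possible with `os.sysconf`. This is a sketch: `SC_PHYS_PAGES` is available on Linux and usually macOS but not Windows, and the verdict labels are illustrative.

```python
import os


def laptop_verdict(ram_gb: float) -> str:
    """Classify RAM against the 8 GB minimum / 16 GB preferred figures."""
    if ram_gb >= 16:
        return "preferred"
    if ram_gb >= 8:
        return "workable"
    return "too little"


def total_ram_gb() -> float:
    """Physical RAM in GiB via POSIX sysconf (Linux, usually macOS; not Windows)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3


if __name__ == "__main__":
    # The OS often reports slightly less than the nominal size, so round first.
    ram = round(total_ram_gb())
    print(f"~{ram} GB RAM: {laptop_verdict(ram)}")
```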

I wouldn't recommend adding the genome in unless you're doing single-cell. Bulk RNA-seq doesn't really have intronic reads.

It shouldn't take 4 GB per sample; it should take 4 GB total (the memory usage comes from loading the index, and that's it). Kallisto and salmon already have a multithreaded mode, so I'd just run the samples one after another, using the multithreaded mode for each sample.
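Running the samples one after another, each with several threads, can be scripted as below. This is a sketch assuming paired-end data and a prebuilt salmon index; the sample names, file names, and `quants` output folder are hypothetical. It uses standard `salmon quant` options (`-i`, `-l A`, `-1`/`-2`, `-p`, `-o`) and prints the commands by default instead of running them.

```python
import subprocess


def salmon_commands(samples, index_dir, threads=8, outdir="quants"):
    """Build one `salmon quant` command per paired-end sample."""
    cmds = []
    for name, (r1, r2) in samples.items():
        cmds.append([
            "salmon", "quant",
            "-i", index_dir,         # prebuilt salmon index
            "-l", "A",               # let salmon infer the library type
            "-1", r1, "-2", r2,      # paired-end FASTQ files
            "-p", str(threads),      # threads used within this one sample
            "-o", f"{outdir}/{name}",
        ])
    return cmds


def run_sequentially(cmds, dry_run=True):
    """Run commands one at a time so only one index is in memory at once."""
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # waits for each sample to finish


if __name__ == "__main__":
    # Hypothetical sample sheet: name -> (read 1, read 2)
    samples = {
        "ctrl_1": ("ctrl_1_R1.fastq.gz", "ctrl_1_R2.fastq.gz"),
        "treat_1": ("treat_1_R1.fastq.gz", "treat_1_R2.fastq.gz"),
    }
    run_sequentially(salmon_commands(samples, "mouse_index", threads=8))
```

Set `dry_run=False` once the printed commands look right.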
