Question

A question of NGS requirement

0

16 hours ago

zizigolu ★ 4.4k

Hi

Please, could somebody give me a hand with some infrastructure information?

storage RNAseq • 258 views

ADD COMMENT • link updated 12 hours ago by GenoMax 154k • written 16 hours ago by zizigolu ★ 4.4k

0

16 hours ago

colindaven 8.1k

It depends on how much you do with them, how disciplined you are with storage, what size of files are initially generated, whether you want to keep all the aligned BAMs, how many reference genomes you use, whether you also do de novo assembly, etc.

Constructive points

- pigz (parallel gzip) is your friend
- write your pipelines in Nextflow and delete tmp files in the work dir all the time
- use a tool like ncdu or dust to monitor disk usage very frequently
- be disciplined
- check NCBI etc. for the range of file sizes per sample you might generate, then multiply by 250
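
As a rough illustration of the disk-usage monitoring point above, here is a minimal Python sketch of the kind of summary that ncdu or dust give you. It is not a replacement for either tool; the function names `dir_sizes` and `report` are made up for this example.

```python
import os


def dir_sizes(root):
    """Return {name: total bytes} for each immediate child of root."""
    sizes = {}
    for entry in os.scandir(root):
        if entry.is_file(follow_symlinks=False):
            sizes[entry.name] = entry.stat(follow_symlinks=False).st_size
        elif entry.is_dir(follow_symlinks=False):
            total = 0
            # Sum every regular file below this subdirectory.
            for dirpath, _dirnames, filenames in os.walk(entry.path):
                for name in filenames:
                    path = os.path.join(dirpath, name)
                    if os.path.islink(path):
                        continue  # do not count symlink targets
                    try:
                        total += os.path.getsize(path)
                    except OSError:
                        pass  # file vanished or is unreadable; skip it
            sizes[entry.name] = total
    return sizes


def report(root, top=10):
    """Print the largest children of root, biggest first."""
    ranked = sorted(dir_sizes(root).items(), key=lambda kv: kv[1], reverse=True)
    for name, size in ranked[:top]:
        print(f"{size / 1e9:10.2f} GB  {name}")
```

Calling `report("work")` on a Nextflow work dir (the path is just an example) would list which subdirectories are eating the most space.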

Also consider offline storage as backup, e.g. Amazon Glacier; even multiple copies on local external hard disks can be very cheap.

ADD COMMENT • link 16 hours ago by colindaven 8.1k

0

16 hours ago

Arup Ghosh 3.5k

It depends on the sequencing depth, transcriptome size, and how many intermediate files are stored/created during analysis.

Example:

Mycobacterium tuberculosis; RNA-Seq (SRR7444071)

Genome size: 4.5Mb

Number of reads: 2,393,077

File size: 129.1MB

A conservative estimate would be ~700 GB to 1 TB at most, assuming none of the files are stored in an uncompressed format.

ADD COMMENT • link 16 hours ago by Arup Ghosh 3.5k

score 2 · Accepted Answer · 2025-11-07

2

16 hours ago

GenoMax 154k

Surprisingly, it may not be that large for prokaryotic samples.

Here is a table of storage size estimates from GPT for different types of prokaryotic transcriptomes for one sample (typically 50–150 bp paired-end, e.g., 2 × 75 bp or 2 × 100 bp).

| Data type                             | Approx. reads        | Approx. file size (FASTQ, gzip-compressed) | Notes                                                                      |
| ------------------------------------- | -------------------- | ------------------------------------------ | -------------------------------------------------------------------------- |
| **Basic differential expression**     | 5–10 million reads   | ~0.5–2 GB                                  | Sufficient for most bacteria with small genomes (~3–6 Mb).                 |
| **High-depth transcriptome**          | 20–50 million reads  | ~2–10 GB                                   | Used for lowly expressed genes, isoform detection, or complex communities. |
| **Metatranscriptome / mixed culture** | 50–150 million reads | ~10–30 GB                                  | Needed to capture transcripts from many species.                           |

Multiply that by the number of samples, then double the total to leave comfortable room for all downstream analysis files.
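
The rule of thumb above (per-sample size × number of samples × 2) can be sketched in a few lines of Python. The per-sample ranges are copied from the table; the sample count of 250 is an assumption taken from the "multiply by 250" remark elsewhere in this thread, and `project_storage_gb` is a name invented for this example.

```python
def project_storage_gb(per_sample_gb, n_samples, overhead_factor=2.0):
    """Per-sample FASTQ size x sample count x factor for downstream files."""
    return per_sample_gb * n_samples * overhead_factor


# (low, high) gzip-compressed FASTQ size per sample in GB, from the table.
PER_SAMPLE_GB = {
    "Basic differential expression": (0.5, 2),
    "High-depth transcriptome": (2, 10),
    "Metatranscriptome / mixed culture": (10, 30),
}

N_SAMPLES = 250  # assumption: figure mentioned elsewhere in the thread

for data_type, (low, high) in PER_SAMPLE_GB.items():
    low_total = project_storage_gb(low, N_SAMPLES)
    high_total = project_storage_gb(high, N_SAMPLES)
    print(f"{data_type}: {low_total:,.0f} to {high_total:,.0f} GB")
```

Change `N_SAMPLES` and `overhead_factor` to match your own project; the doubling is only the comfort margin suggested above, not a measured value.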

ADD COMMENT • link 16 hours ago by GenoMax 154k

0

Thanks all

ADD REPLY • link 13 hours ago by zizigolu ★ 4.4k

1

What kind of sequencing are you thinking of doing? Direct RNA or normal cDNA?

Assuming cDNA, you may only need 10 flow cells (with barcoding and very deep sequencing), so roughly ~2 TB of sequence data in total. Less with lower coverage.

| Sequencing depth per sample | 50 Gb flowcell (moderate) | 100 Gb flowcell (good) | 200 Gb flowcell (excellent) |
| --------------------------- | -------------- | --------------- | --------------- |
| 0.75 Gb (basic RNA-seq)     | ~65 samples    | ~130 samples    | ~260 samples    |
| 3 Gb (deep coverage)        | ~16 samples    | ~33 samples     | ~66 samples     |
| 7.5 Gb (very deep)          | ~6 samples     | ~13 samples     | ~26 samples     |
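
The table is essentially flow cell yield divided by per-sample depth. A minimal sketch of that arithmetic, assuming the yields and depths are in the same unit (Gb, as in the table) and ignoring barcoding losses; `samples_per_flowcell` is a name made up for this example:

```python
import math


def samples_per_flowcell(yield_gb, depth_gb):
    """Whole samples that fit on one flow cell at the given depth per sample."""
    return math.floor(yield_gb / depth_gb)


FLOWCELL_YIELDS_GB = [50, 100, 200]   # column headers of the table
DEPTHS_GB = [0.75, 3, 7.5]            # row labels of the table

for depth in DEPTHS_GB:
    counts = [samples_per_flowcell(y, depth) for y in FLOWCELL_YIELDS_GB]
    print(f"{depth} Gb per sample: {counts}")
```

Plain division gives values slightly different from some table entries (for example 66 rather than ~65 for 0.75 Gb on a 50 Gb flow cell), since the table figures are rounded estimates.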

Direct RNA yields would be significantly smaller, and barcoding support is lacking, so apparently you can only do 4-5 samples per flow cell.

With polycistronic transcripts, analysis may be trickier if you choose full-length direct RNA sequencing.

ADD REPLY • link 15 hours ago by GenoMax 154k

0

So if we also keep the raw signal files (FAST5/POD5) for possible re-basecalling later, does 60–70 TB over the full project make sense, or is that excessive?

ADD REPLY • link 14 hours ago by zizigolu ★ 4.4k

1

One can never have enough storage, but that said, the PromethION cells were more in the ~1.5 TB range in my experience. So the amount you mention should be significantly more than strictly needed.
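
A back-of-the-envelope version of that comparison, as a hedged sketch: it assumes the ~1.5 TB figure applies per flow cell and reuses the 10 flow cells mentioned earlier in this thread. Both numbers are rough, and `raw_signal_tb` is a name invented for this example.

```python
def raw_signal_tb(n_flowcells, tb_per_flowcell):
    """Total raw signal (FAST5/POD5) storage in TB across all flow cells."""
    return n_flowcells * tb_per_flowcell


N_FLOWCELLS = 10        # assumption: count mentioned earlier in the thread
TB_PER_FLOWCELL = 1.5   # assumption: ~1.5 TB read as a per-flow-cell figure
PROPOSED_TB = 60        # lower end of the 60-70 TB asked about

estimate = raw_signal_tb(N_FLOWCELLS, TB_PER_FLOWCELL)
print(f"Estimated raw signal: {estimate:.1f} TB")
print(f"Proposed budget is {PROPOSED_TB / estimate:.1f}x the estimate")
```

Under these assumptions the raw signal comes to about 15 TB, which is in line with 60–70 TB being well above what is strictly needed.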

ADD REPLY • link 12 hours ago by GenoMax 154k