A question about NGS storage requirements
3
0
16 hours ago
zizigolu ★ 4.4k

Hi

Please, could somebody give me a hand with some infrastructure information? Specifically, how much storage should I plan for an RNA-seq project?

storage RNAseq • 258 views
ADD COMMENT
2
16 hours ago
GenoMax 154k

Surprisingly, it may not be that large in terms of storage for prokaryotic samples.

Here is a table of storage-size estimates (generated with GPT) for different types of prokaryotic transcriptomes, per sample, assuming typical 50–150 bp paired-end reads (e.g., 2 × 75 bp or 2 × 100 bp):

| Data type                             | Approx. reads        | Approx. file size (FASTQ, gzip-compressed) | Notes                                                                      |
| ------------------------------------- | -------------------- | ------------------------------------------ | -------------------------------------------------------------------------- |
| **Basic differential expression**     | 5–10 million reads   | ~0.5–2 GB                                  | Sufficient for most bacteria with small genomes (~3–6 Mb).                 |
| **High-depth transcriptome**          | 20–50 million reads  | ~2–10 GB                                   | Used for lowly expressed genes, isoform detection, or complex communities. |
| **Metatranscriptome / mixed culture** | 50–150 million reads | ~10–30 GB                                  | Needed to capture transcripts from many species.                            |

Multiply that by the number of samples, then double the total to account for all downstream analysis files so you can manage everything comfortably.
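As a rough sketch of that arithmetic (the per-sample size, the 250-sample count mentioned elsewhere in the thread, and the 2× downstream factor are all assumptions to replace with your own numbers):

```python
# Back-of-the-envelope storage estimate for a prokaryotic RNA-seq project.
# All inputs are assumptions; substitute your own design values.

per_sample_gb = 2.0        # gzip-compressed FASTQ per sample (see table above)
n_samples = 250            # sample count (250 is mentioned later in the thread)
downstream_factor = 2.0    # double to leave room for BAMs, counts, QC, etc.

raw_gb = per_sample_gb * n_samples
total_gb = raw_gb * downstream_factor

print(f"Raw FASTQ: {raw_gb:,.0f} GB (~{raw_gb / 1024:.1f} TB)")
print(f"With downstream files: {total_gb:,.0f} GB (~{total_gb / 1024:.1f} TB)")
```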

ADD COMMENT
0

Thanks all

ADD REPLY
1

What kind of sequencing are you thinking of doing? Direct RNA or normal cDNA?

Assuming cDNA, you may only need ~10 flow cells (with barcoding and very deep sequencing), so roughly ~2 TB of sequence data in total. Less with lower coverage.

| Sequencing depth per sample | 50 Gb flowcell (moderate) | 100 Gb flowcell (good) | 200 Gb flowcell (excellent) |
| --------------------------- | -------------- | --------------- | --------------- |
| 0.75 Gb (basic RNA-seq)     | ~65 samples    | ~130 samples    | ~260 samples    |
| 3 Gb (deep coverage)        | ~16 samples    | ~33 samples     | ~66 samples     |
| 7.5 Gb (very deep)          | ~6 samples     | ~13 samples     | ~26 samples     |

Direct RNA yields would be significantly smaller, and barcoding support is lacking, so you can apparently only do 4–5 samples per flow cell.

Analysis may also be trickier with polycistronic transcripts if you choose full-length direct RNA sequencing.
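To make the table above concrete, here is a minimal sketch of the same division; the flow cell yield, per-sample depth, and sample count are illustrative assumptions:

```python
import math

# Assumed numbers; adjust to your platform and experimental design.
flowcell_yield_gb = 100.0    # a "good" flow cell from the table above
depth_per_sample_gb = 3.0    # deep-coverage target per sample
n_samples = 250              # total samples in the project

samples_per_flowcell = math.floor(flowcell_yield_gb / depth_per_sample_gb)
flowcells_needed = math.ceil(n_samples / samples_per_flowcell)
total_sequence_tb = flowcells_needed * flowcell_yield_gb / 1024

print(f"~{samples_per_flowcell} samples per flow cell")
print(f"{flowcells_needed} flow cells for {n_samples} samples "
      f"(~{total_sequence_tb:.1f} TB of sequence)")
```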

ADD REPLY
0

So if we also keep the raw signal files (FAST5/POD5) for possible re-basecalling later, does 60–70 TB over the full project make sense, or is that an overestimate?

ADD REPLY
1

One can never have enough storage, but that said, PromethION flow cells were more in the ~1.5 TB range in my experience. So the amount you mention should be significantly more than strictly needed.
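A quick sanity check of the 60–70 TB figure, using the numbers that appear in this thread (both inputs are assumptions; substitute your actual flow cell count and observed per-flow-cell output):

```python
# Rough check of the 60-70 TB figure using values mentioned in this thread.
tb_per_flowcell = 1.5    # ~1.5 TB per PromethION flow cell, per the reply above
n_flowcells = 10         # ~10 flow cells suggested earlier for the cDNA design

raw_tb = tb_per_flowcell * n_flowcells
print(f"Keeping everything from all runs: ~{raw_tb:.0f} TB")
# ~15 TB, well under 60-70 TB, even before any cleanup of intermediate files.
```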

ADD REPLY
0
16 hours ago

It depends on how much you do with the data: how disciplined you are with storage, how large the initially generated files are, whether you want to keep all the aligned BAMs, how many reference genomes you use, whether you also do de novo assembly, and so on.

Constructive points:

  • pigz (parallel gzip) is your friend
  • write your pipelines in Nextflow and delete temporary files in the work directory all the time
  • use a tool like ncdu or dust to monitor disk usage very frequently (a roll-your-own sketch is at the end of this answer)
  • be disciplined
  • check NCBI etc. for the range of file sizes per sample you might generate. Multiply by 250.

Also consider offline storage as a backup, e.g. Amazon Glacier; even multiple copies on local external hard disks can be very cheap.
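If you want something scriptable alongside ncdu/dust, here is a minimal sketch that totals disk usage per top-level sample directory; the project path is a hypothetical placeholder:

```python
import os
from pathlib import Path

# Hypothetical layout: one subdirectory per sample under PROJECT_DIR.
PROJECT_DIR = Path("/data/rnaseq_project")  # placeholder path

def dir_size_bytes(path: Path) -> int:
    """Sum the sizes of all regular files below `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            if not os.path.islink(fp):
                total += os.path.getsize(fp)
    return total

# Report per-sample usage, largest first, so cleanup targets are obvious.
sizes = {d.name: dir_size_bytes(d) for d in PROJECT_DIR.iterdir() if d.is_dir()}
for name, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{size / 1e9:10.1f} GB  {name}")
print(f"{sum(sizes.values()) / 1e12:10.2f} TB  TOTAL")
```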

ADD COMMENT
0
16 hours ago

It depends on the sequencing depth, transcriptome size, and how many intermediate files are stored/created during analysis.

Example: Mycobacterium tuberculosis RNA-Seq (SRR7444071)

  • Genome size: 4.5 Mb
  • Number of reads: 2,393,077
  • File size: 129.1 MB

A conservative estimate would be ~700 GB to 1 TB at most, assuming none of the files are stored in an uncompressed format.
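As a hedged illustration of scaling that single run up to a full project (the reads-per-sample target, the 250-sample count, and the intermediate-file multiplier are assumptions, not values from the run itself):

```python
# Scale the observed SRR7444071 numbers up to a whole project.
observed_reads = 2_393_077
observed_fastq_mb = 129.1            # compressed FASTQ size for this run

mb_per_million_reads = observed_fastq_mb / (observed_reads / 1e6)   # ~54 MB

# Project assumptions (replace with your own design):
reads_per_sample = 10e6              # 10 M reads/sample (basic DE, per the table above)
n_samples = 250
intermediate_factor = 6.0            # generous allowance for BAMs, counts, QC, indexes

raw_gb = mb_per_million_reads * (reads_per_sample / 1e6) * n_samples / 1024
total_gb = raw_gb * intermediate_factor
print(f"Raw FASTQ: ~{raw_gb:.0f} GB; with intermediates: ~{total_gb:.0f} GB")
# With these assumptions the total lands around the ~700 GB to 1 TB ballpark above.
```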

ADD COMMENT
