Calculate sequencing file size?
15 months ago
Lluís R. ★ 1.2k

We are writing a data management plan, and I need to estimate the per-sample size of the data we'll get (as fastq.gz) in order to budget storage costs.

I found a very rough estimate in this comment, which matches what I know from previous experience, but I wonder whether more accurate calculators are available.

Some variables that I think are important:

  • Read length
  • Paired
  • Targeted number of reads
  • Adaptors
  • UMI
  • Technology (RRBS, WGS, RNAseq, 16S, metagenomics, metaproteomics)
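
For reference, a back-of-the-envelope calculation from the first three variables could look like the sketch below. It only relies on the fact that a FASTQ record is four lines (header, sequence, `+`, qualities). The function name and the `header_bytes` and `gzip_ratio` defaults are assumptions for illustration, not measured values; they would need calibrating against real files.

```python
def estimate_fastq_gz_bytes(n_reads, read_length, paired=True,
                            header_bytes=60, gzip_ratio=0.25):
    """Rough per-sample size estimate for fastq.gz output.

    header_bytes and gzip_ratio are ASSUMED placeholder values, not
    constants of any platform; calibrate them on your own data.
    n_reads counts fragments (i.e. read pairs when paired=True).
    """
    # One record: header line, sequence, '+' line, qualities, 4 newlines.
    record_bytes = header_bytes + read_length + 1 + read_length + 4
    mates = 2 if paired else 1
    uncompressed = n_reads * mates * record_bytes
    return uncompressed, uncompressed * gzip_ratio


raw, compressed = estimate_fastq_gz_bytes(n_reads=30_000_000, read_length=150)
print(f"uncompressed: {raw / 1e9:.1f} GB, compressed: {compressed / 1e9:.1f} GB")
```

Adaptors and UMIs would enter through `read_length` or `header_bytes` depending on where they end up in the file, and the technology mostly through `gzip_ratio`, which is exactly the part that is hard to guess without real data.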

This is not something you can be precise about unless you have a large amount of existing data to refer back to. The sizes are going to vary depending on library quality, sample type, compressibility, etc.

If you must come up with a number, look at some representative files and add 10% to cover outliers.
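
One way to do that, as a minimal sketch: measure the compressed bytes per read in a few representative fastq.gz files, scale to the targeted number of reads, and add the 10% margin. The function names and the choice of averaging across files are assumptions made here for illustration.

```python
import gzip
import os


def bytes_per_read(path):
    """Compressed bytes per read for one fastq.gz file (4 lines per record)."""
    with gzip.open(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    return os.path.getsize(path) / (n_lines // 4)


def projected_size(paths, target_reads, margin=0.10):
    """Scale the mean observed bytes-per-read to target_reads, plus a margin."""
    rates = [bytes_per_read(p) for p in paths]
    mean_rate = sum(rates) / len(rates)
    return mean_rate * target_reads * (1 + margin)


# Hypothetical usage (file names are placeholders):
# projected_size(["sampleA_R1.fastq.gz", "sampleB_R1.fastq.gz"], 30_000_000)
```

For paired-end data, run it on both mate files or double the result. Using the largest observed rate instead of the mean would be the more conservative choice.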

Remind the authorities that the more successful you are, the more storage you are going to need.


Oh, I absolutely agree. I don't need a hard estimate (a range such as 1 to 3 GB is fine), but it would be nice to have some numbers to judge whether it is better to pay for cloud storage or just keep local copies.


If you don't have data of your own to use, pick some random examples from SRA and go from there. You could also use read simulators to create dummy datasets with characteristics that you control.
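
As a crude stand-in for a proper read simulator, the sketch below writes a dummy FASTQ with uniformly random bases and qualities into an in-memory gzip stream and reports its compressed size. Everything here (function name, read naming, the Phred 2 to 40 quality range) is an assumption for illustration. Uniformly random data is close to a worst case for gzip, so real libraries will typically compress better than this.

```python
import gzip
import io
import random


def simulated_fastq_gz_size(n_reads, read_length, seed=0):
    """Compressed size in bytes of a dummy FASTQ with random bases/qualities.

    Treat the result as a pessimistic reference point, not a prediction.
    """
    rng = random.Random(seed)
    # Phred+33 encoded qualities; the 2..40 range is an assumed example.
    quals = "".join(chr(33 + q) for q in range(2, 41))
    buffer = io.BytesIO()
    with gzip.GzipFile(fileobj=buffer, mode="wb") as gz:
        for i in range(n_reads):
            seq = "".join(rng.choices("ACGT", k=read_length))
            qual = "".join(rng.choices(quals, k=read_length))
            gz.write(f"@sim.{i}\n{seq}\n+\n{qual}\n".encode("ascii"))
    return len(buffer.getvalue())


size = simulated_fastq_gz_size(n_reads=10_000, read_length=150)
print(f"{size / 10_000:.1f} compressed bytes per simulated read")
```

Varying `read_length` and `n_reads` shows how the size scales. For realistic compressibility, a dedicated read simulator or real SRA runs would be the better reference.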


Good ideas! I'll do that. Thanks!
