Question: Size of typical genomic data
1
gravatar for Nicolas Rosewick
11 months ago by
Belgium, Brussels
Nicolas Rosewick7.5k wrote:

Hi,

I'm preparing some slides and would like to have some uptodate information related to typical sizes of NGS applications (VCF and BAM) e.g. exome ; WGS ; RNA-Seq ; gene panels , etc...

Looking in the litterature that's what I found (for 30x coverage and 2x100bp read length)

Type of NGS                   VCF        BAM
Gene panels (50 genes)       1 MB    ~100 MB
Gene panels (500 genes)     10 MB      ~1 GB
Whole Exome                   1Gb      ~5 GB
Whole Genome               125 GB    ~100 GB

Any input ?

Thanks

genomic size • 1.5k views
ADD COMMENTlink modified 11 months ago by toralmanvar750 • written 11 months ago by Nicolas Rosewick7.5k

VCF only contains variants, as bam files contains all reads alignments, VCF are therefore much smaller than bams. Even for whole genomes you don't expect to have a 125Gb VCF for a 100Gb bam.

More generally, the size of the NGS files are correlated with the size of the species genome, and with the depth of sequencing.

ADD REPLYlink modified 11 months ago • written 11 months ago by guillaume.rbt540

I took this information from A: What Is The Expected Size Of A Whole Genome Vcf And Bcf?

ADD REPLYlink written 11 months ago by Nicolas Rosewick7.5k

I guess the calcul assumes that there is a variant at every position of the human genomes, which never happens.

ADD REPLYlink written 11 months ago by guillaume.rbt540

It is tough to generalize the file size for any of the mentioned analysis. And yes, guillaume.rbt is right, size of vcf for human genome can never reach upto 125 Gb for human.

ADD REPLYlink written 11 months ago by toralmanvar750

It is quite difficult to put numbers on this. To give an idea, though, when I was working in a clinical genetics testing laboratory in the UK National Health Service, we had throughput of around 16 samples per week and panels looking at ~25 genes. In 18 months of NGS testing, we accumulated 6.2TB of data, 1.2TB of which was just the results files (BAM, VCF, text-based reports).

My laboratory manager and I published a report in 'StorageNewsletter' but they now appear to require as subscription even to read it. I have it as PDF if you want it.

ADD REPLYlink modified 28 days ago • written 11 months ago by Kevin Blighe41k

I would like to have a look at this report, if you could send it please? Thank you

ADD REPLYlink written 28 days ago by anamaria30

Sorry, the URL ('link') in my message was erroneous (I have fixed it). However, and also, the magazine now requires a subscription. I will upload the report to another location when I get home. Please reply again here to ping me.

ADD REPLYlink written 28 days ago by Kevin Blighe41k

Hey, I have added the files here: https://github.com/kevinblighe/BiostarsMisc

  • Blighe2016(NGSStorageReport).pdf
  • NGSDataStorageReviewv3.doc
ADD REPLYlink written 28 days ago by Kevin Blighe41k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1155 users visited in the last hour