Question: What Is the Expected Size of a Whole-Genome VCF and BCF?
Jeremy Leipzig (Philadelphia, PA) wrote, 6.8 years ago:

Say we take a 40x whole human genome BAM file of HiSeq reads (~100GB), call variants but do not annotate further, and create a VCF with every position called (even if that position matches the reference genome), then compress. How big will the VCF and BCF files be?

Tags: vcf

Do I understand this correctly? Every base in the reference genome needs to be a line in the VCF file?

(Comment by Rok, 6.8 years ago)

Yep.

(Reply by Jeremy Leipzig, 6.8 years ago)

A VCF file can contain more than one patient. Are you talking about a single patient? If there are more, that will affect the size.

(Comment by user56290, 6.5 years ago)

Yes, that is true. To give some background, this question was asked because we are facing many "ref or no coverage" mysteries in our trios when samples are called individually. I was wondering whether simply calling all positions is a viable solution.

(Reply by Jeremy Leipzig, 6.5 years ago)
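To make the point about multiple patients concrete: in a multi-sample VCF the fixed columns are written once per line, and each additional sample adds one more genotype column. The sketch below assumes a GT-only FORMAT field and every sample hom-ref at the site; real callers usually emit more FORMAT fields (for example AD, DP, GQ, PL), so the per-sample cost is typically higher. `line_bytes` is a hypothetical helper, not part of any tool.

```python
# Hypothetical per-line size model for a multi-sample all-sites VCF.
# Assumptions: GT-only FORMAT column, every sample hom-ref ("0/0").

# Fixed columns shared by all samples: CHROM POS ID REF ALT QUAL FILTER INFO FORMAT
SHARED = "\t".join(["chr1", "249250621", ".", "A", ".", "22", "PASS", ".", "GT"])
HOM_REF_GT = "0/0"


def line_bytes(n_samples: int) -> int:
    """Bytes of one data line (including the newline) with n_samples genotype columns."""
    genotypes = "".join("\t" + HOM_REF_GT for _ in range(n_samples))
    return len((SHARED + genotypes + "\n").encode("ascii"))


for n in (1, 3):
    print(n, line_bytes(n))  # prints "1 38" then "3 46"
```

Under these assumptions a trio adds 8 bytes per line over a single sample, i.e. each extra GT-only sample costs 4 bytes per position.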
Rok (Trondheim, Norway) answered, 6.8 years ago:

Assuming each line looks similar to this one:

chr1    249250621    .    A    .    22    PASS    .    GT    0/0

This means each line uses at most about 45 bytes. Multiplied by the length of the human genome (roughly 3 × 10^9 positions), that gives a VCF of at most around 125 GB (45 × 3 × 10^9 ≈ 1.35 × 10^11 bytes, about 126 GiB). The header is left out of the calculation since it is insignificant compared with the rest of the file.
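The back-of-envelope arithmetic above can be written out as a tiny Python sketch. The ~3.0 × 10^9 position count is an assumption (it is roughly what makes 45 bytes per line land near 125 when the result is expressed in GiB), and `estimate_body_bytes` is a hypothetical helper.

```python
# Upper bound for an uncompressed single-sample all-sites VCF:
# (bytes per data line) x (number of reference positions); header ignored.
# Assumptions: ~3.0e9 positions and at most 45 bytes per line.

GENOME_POSITIONS = 3_000_000_000  # assumed haploid human reference length
BYTES_PER_LINE = 45               # assumed upper bound per data line


def estimate_body_bytes(bytes_per_line: int, n_positions: int) -> int:
    """Uncompressed size of the data lines only."""
    return bytes_per_line * n_positions


total = estimate_body_bytes(BYTES_PER_LINE, GENOME_POSITIONS)
print(f"{total / 1e9:.1f} GB (decimal)")   # 135.0 GB (decimal)
print(f"{total / 2**30:.1f} GiB (binary)")  # 125.7 GiB (binary)
```

Reporting both units avoids the usual GB/GiB ambiguity: the same byte count is about 135 GB or about 126 GiB.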

I don't know much about the BCF format or the effects of compression. A wild guess would be that compression will reduce the file to under 30 GB.
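The 30 GB figure is a guess; one way to replace it with a measurement is to compress a real all-sites VCF (for instance a single chromosome) and scale the observed ratio up. The sketch below is a hypothetical helper (`measure_gzip_ratio` is not from any library) that streams a file through Python's standard `gzip` module and only counts the compressed bytes, so nothing large is held in memory. Plain gzip is an approximation: `bgzip` writes BGZF, which compresses in independent blocks of up to 64 KiB and typically comes out a little larger, and BCF is a binary encoding stored inside BGZF, so its ratio will differ again.

```python
import gzip
import io
import os
import shutil


class _ByteCounter(io.RawIOBase):
    """Write-only sink that counts bytes instead of storing them."""

    def __init__(self) -> None:
        super().__init__()
        self.n = 0

    def writable(self) -> bool:
        return True

    def write(self, b) -> int:
        self.n += len(b)
        return len(b)


def measure_gzip_ratio(path: str, level: int = 6) -> tuple[int, int]:
    """Return (uncompressed_bytes, gzip_bytes) for the file at `path`.

    Plain gzip stands in for bgzip's BGZF here, so treat the result
    as an approximation of a .vcf.gz size.
    """
    counter = _ByteCounter()
    with open(path, "rb") as src:
        # GzipFile does not close a caller-supplied fileobj, so the
        # counter stays usable after the gzip trailer is written.
        with gzip.GzipFile(fileobj=counter, mode="wb", compresslevel=level) as gz:
            shutil.copyfileobj(src, gz, length=1 << 20)
    return os.path.getsize(path), counter.n
```

For example, `raw, gz = measure_gzip_ratio("chr20.allsites.vcf")` (a hypothetical file name) gives a ratio `gz / raw` that could then be multiplied by the uncompressed whole-genome estimate.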
