Question: What Is The Expected Size Of A Whole Genome Vcf And Bcf?
8.1 years ago by
Philadelphia, PA
Jeremy Leipzig (19k) wrote:

Say we take a 40x whole human genome BAM file of HiSeq reads (~100GB), call variants but do not annotate further, and create a VCF with every position called (even if that position matches the reference genome), then compress. How big will the VCF and BCF files be?

vcf • 8.4k views
modified 8.1 years ago by Rok (190) • written 8.1 years ago by Jeremy Leipzig (19k)

Do I understand this correctly? Does every base in the reference genome need to have its own line in the VCF file?

written 8.0 years ago by Rok (190)


written 8.0 years ago by Jeremy Leipzig (19k)

A VCF file can contain more than one patient. Are you talking about a single patient? If there are more, that will affect the size.

written 7.8 years ago by user56290

Yes, that is true. To give some background, this question was asked because we are facing many 'ref or no coverage' mysteries in our trios when samples are called individually. I was wondering whether a viable solution is simply to call all positions.

written 7.8 years ago by Jeremy Leipzig (19k)
8.0 years ago by
Trondheim, Norway
Rok (190) wrote:

Under the assumption that each line will be similar to this one:

chr1    249250621    .    A    A    22    PASS        0/0

This means each line uses at most 45 bytes. Multiplied by the roughly 3 billion positions in the human genome, this gives an uncompressed VCF of at most about 135 GB (roughly 125 GiB). The size of the header is left out of the calculation since it is insignificant compared to the rest of the file.
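As a rough check of the arithmetic above, here is a minimal Python sketch of the same back-of-the-envelope estimate. The function names and the default position count of 3 billion are assumptions for illustration, and the example line is assumed to be tab-delimited with an empty INFO column, which the thread does not state. It only estimates uncompressed size and says nothing about compression or BCF.

```python
# Back-of-the-envelope estimate of an uncompressed, single-sample,
# all-positions VCF. Everything here is illustrative, not a measurement.

# The example record from the answer, ASSUMED to be tab-delimited with an
# empty INFO column (the original spacing is ambiguous).
EXAMPLE_LINE = "chr1\t249250621\t.\tA\tA\t22\tPASS\t\t0/0\n"

# ASSUMPTION: roughly 3 billion positions in the human genome.
GENOME_POSITIONS = 3_000_000_000


def estimate_vcf_bytes(bytes_per_line, n_positions=GENOME_POSITIONS):
    """Return the estimated body size in bytes (header ignored)."""
    return bytes_per_line * n_positions


def to_gb(n_bytes):
    """Convert bytes to decimal gigabytes (10**9 bytes)."""
    return n_bytes / 1e9


def to_gib(n_bytes):
    """Convert bytes to binary gibibytes (2**30 bytes)."""
    return n_bytes / 2**30


if __name__ == "__main__":
    measured = len(EXAMPLE_LINE.encode("ascii"))
    for label, per_line in (("measured example line", measured),
                            ("45-byte upper bound", 45)):
        total = estimate_vcf_bytes(per_line)
        print(f"{label}: {per_line} bytes/line -> "
              f"{to_gb(total):.1f} GB ({to_gib(total):.1f} GiB)")
```

Real per-line sizes depend on what the caller writes into the INFO and FORMAT columns, so the 45-byte figure should be treated as a placeholder to adjust, not a property of VCF itself.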

I don't know much about the BCF format or the effects of compression. A wild guess would be that compression will reduce the file size to under 30GB.

written 8.0 years ago by Rok (190)