How to extract gnomad exomes bgz
4.4 years ago
stanedav ▴ 50

Hello guys, I need help with processing gnomAD variant data.

I need to decompress this .bgz file into a regular VCF, because I need to process it locally, variant by variant. Is there any way to do that?

I tried to decompress the file with bgzip, but got this error:

bgzip --decompress gnomad.exomes.r2.0.2.sites.vcf.bgz
[bgzip] gnomad.exomes.r2.0.2.sites.vcf.bgz.tbi: unknown suffix -- ignored


Is there any way to get a plain VCF from a .bgz file?


Hello,

What kind of analysis would you like to perform?

Normally there is no need to decompress. Most existing software for analysing VCFs can read compressed files directly. If you write your own tool, there is a good chance that a module already exists that can handle them.

Another option is zcat. It prints the content of a compressed file directly to stdout, so your program can read it on the fly.
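Building on the "module" and zcat suggestions, a program can read the compressed file itself instead of a decompressed copy. A minimal Python sketch, assuming a standard tab-delimited sites VCF with at least the 8 fixed columns: BGZF is a series of concatenated gzip members, and Python's `gzip` module reads multi-member files, so each variant can be streamed without writing a plain VCF to disk. `iter_variants` and `parse_info` are hypothetical helper names, and the INFO parsing is deliberately naive (values stay strings; the header's type definitions are ignored).

```python
import gzip


def parse_info(info):
    """Split a VCF INFO column into a dict; flags (no '=') map to True."""
    if info == ".":
        return {}
    out = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out


def iter_variants(path):
    """Yield one record per data line of a gzip/BGZF-compressed VCF."""
    # BGZF is a series of concatenated gzip members; gzip.open reads them all.
    with gzip.open(path, "rt") as fh:
        for line in fh:
            if line.startswith("#"):
                continue  # skip ## meta lines and the #CHROM header line
            fields = line.rstrip("\n").split("\t")
            chrom, pos, vid, ref, alt, qual, filt, info = fields[:8]
            yield {
                "chrom": chrom,
                "pos": int(pos),
                "id": vid,
                "ref": ref,
                "alt": alt.split(","),  # multi-allelic sites list several ALTs
                "qual": qual,
                "filter": filt,
                "info": parse_info(info),
            }
```

Usage would look like `for v in iter_variants("gnomad.exomes.r2.0.2.sites.vcf.bgz"): ...`. Because records are yielded one at a time, memory use stays roughly constant regardless of file size.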

fin swimmer

4.4 years ago
mv gnomad.exomes.r2.0.2.sites.vcf.bgz gnomad.exomes.r2.0.2.sites.vcf.gz


This does work, but I'm wondering why the files are named .vcf.bgz in the first place. We shouldn't have to rename a file in order to decompress it. 'bgz' isn't a recognized suffix, so gzip/gunzip refuse to decompress it in place. If it is a gzipped VCF, it should really be called .vcf.gz; tar.gz is a special case that can be shortened to 'tgz'.
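The suffix check belongs to the gunzip command line, not to the file format: the bytes are ordinary gzip, so any gzip reader can decompress the file under its original name. With GNU gzip, writing to stdout (`gzip -dc` or `zcat`, redirected to a file) is expected to skip the suffix check as well. A minimal Python sketch that writes a plain VCF without renaming, assuming there is enough disk space for the uncompressed output; `decompress_bgz` is a hypothetical helper name.

```python
import gzip
import shutil
from pathlib import Path


def decompress_bgz(src, dst=None):
    """Write a plain copy of a gzip/BGZF file; the '.bgz' suffix is never checked."""
    src = Path(src)
    # Default output drops only the last suffix, e.g. "x.vcf.bgz" -> "x.vcf".
    dst = src.with_suffix("") if dst is None else Path(dst)
    if dst == src:
        raise ValueError("output path would overwrite the input")
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        # Copy in 1 MiB chunks so the whole file is never held in memory.
        shutil.copyfileobj(fin, fout, 1 << 20)
    return dst
```

For example, `decompress_bgz("gnomad.exomes.r2.0.2.sites.vcf.bgz")` would write `gnomad.exomes.r2.0.2.sites.vcf` next to the input and leave the original `.bgz` (and its `.tbi` index) untouched.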

3.7 years ago
Shicheng Guo ★ 9.2k

These are bgzip-compressed (BGZF) VCF files rather than BCF, and bcftools reads them directly. Try the following:

bcftools view gnomad.exomes.r2.0.2.sites.vcf.bgz -Ov -o gnomad.exomes.r2.0.2.sites.vcf
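To check whether a .bgz file holds BGZF-compressed VCF text or BCF, one can inspect its first bytes. A BGZF block is a gzip member with the FEXTRA flag set and a 6-byte extra field containing a 'BC' subfield (layout as described in the SAM/BAM format specification); a tabix `.tbi` index, like the one named in the error above, can only be built on BGZF files. After decompression, VCF text starts with `##fileformat=VCF`, while BCF starts with the magic bytes `BCF`. A minimal sketch; `sniff_bgz` is a hypothetical helper name.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"


def sniff_bgz(path):
    """Report whether a file is gzip, specifically BGZF, and what it decompresses to."""
    with open(path, "rb") as fh:
        head = fh.read(18)
    is_gzip = head[:2] == GZIP_MAGIC
    # BGZF: deflate (CM=8), FEXTRA flag (FLG=4), XLEN=6 and a 'BC' subfield at byte 12.
    is_bgzf = (
        is_gzip
        and head[2:4] == b"\x08\x04"
        and head[10:12] == b"\x06\x00"
        and head[12:14] == b"BC"
    )
    payload = b""
    if is_gzip:
        with gzip.open(path, "rb") as fh:
            payload = fh.read(16)  # enough for "##fileformat=VCF"
    if payload.startswith(b"##fileformat=VCF"):
        content = "vcf"
    elif payload.startswith(b"BCF"):
        content = "bcf"  # BCF2 magic: "BCF" followed by version bytes
    else:
        content = "unknown"
    return {"gzip": is_gzip, "bgzf": is_bgzf, "content": content}
```

For the gnomAD sites files this should report `bgzf` as true and `content` as `"vcf"`. Note that BCF is itself BGZF-compressed, which is why the content check looks at the decompressed bytes rather than the container.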

2.9 years ago
janapet • 0


This is virtually the same solution recommended by Pierre 18 months ago. Is there a reason this should be a separate answer?