Question: Should Making Bgzip Files Take A Long Time?
Asked 7.2 years ago by Click downvote690:

I'm bgzipping some huge VCF files on a server and wondering whether I'm doing something wrong, as it is taking a long time (the output file grows by only ~100 kB every few seconds).

The command I'm running is

tabix -h vcfs/ALL.chrX.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz X | vcf-subset -c population_files/CEU_panel | bgzip > population_files/CEU_X.vcf.gz
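One way to see how fast bytes are actually moving through the pipe is to splice a pass-through byte meter between the stages. A minimal sketch using GNU `dd`'s `status=progress` as the meter (paths are the ones from the command above; `pv`, if installed, is a friendlier alternative):

```shell
# dd copies stdin to stdout unchanged; status=progress prints the
# cumulative byte count and throughput to stderr about once a second
# (GNU dd only), so you can see the uncompressed rate before bgzip.
tabix -h vcfs/ALL.chrX.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz X \
  | vcf-subset -c population_files/CEU_panel \
  | dd status=progress \
  | bgzip > population_files/CEU_X.vcf.gz
```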

head /proc/meminfo gives:

MemTotal:       140567216 kB
MemFree:         2109700 kB
Buffers:          295408 kB
Cached:         106374448 kB
SwapCached:       991620 kB
Active:         62698940 kB
Inactive:       71434832 kB
Active(anon):   55538428 kB
Inactive(anon): 35691912 kB
Active(file):    7160512 kB

(Dunno if this is relevant)


modified 7.1 years ago by Matt Shirley • written 7.2 years ago by Click downvote690
Answered 7.1 years ago by Matt Shirley (Cambridge, MA):

You're not just bgzip-ing the file: you're extracting it with tabix, subsetting it with vcf-subset, and then compressing the result back to a file. You could be CPU-limited. If you run top and the vcf-subset command is sitting at 100% CPU, then that's your bottleneck. And if your subset keeps only a small fraction of the data, the output file will not grow quickly even when the pipeline is running at full speed.
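A non-interactive way to do the same check as `top` is to sample per-process CPU with `ps`; the `-C` name matching below assumes the stage names from the pipeline in the question (Linux procps `ps`):

```shell
# Print each pipeline stage's command name, PID, and CPU utilization.
# The stage pinned near 100% is the bottleneck. Process names are
# assumed to match the tabix | vcf-subset | bgzip pipeline; adjust
# the -C list if yours differ.
ps -C tabix,vcf-subset,bgzip -o comm=,pid=,%cpu=
```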



Powered by Biostar version 2.3.0