Question: Should Making Bgzip Files Take A Long Time?
Click downvote (690), Germany, wrote 7.2 years ago:

I'm bgzipping some huge VCF files on a server and wondering whether I'm doing something wrong, as it is taking a long time (the output file grows by about 100 kB every few seconds).

The command I'm running is

tabix -h vcfs/ALL.chrX.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz X | vcf-subset -c population_files/CEU_panel | bgzip > population_files/CEU_X.vcf.gz

head /proc/meminfo gives:

MemTotal:       140567216 kB
MemFree:         2109700 kB
Buffers:          295408 kB
Cached:         106374448 kB
SwapCached:       991620 kB
Active:         62698940 kB
Inactive:       71434832 kB
Active(anon):   55538428 kB
Inactive(anon): 35691912 kB
Active(file):    7160512 kB

(Dunno if this is relevant)

Thanks.

Matt Shirley (9.4k), Cambridge, MA, wrote 7.1 years ago:

You're not just bgzipping the file: you're extracting a region with tabix, subsetting the samples with vcf-subset, and then compressing the result back to a file with bgzip. You could be CPU-limited. If you run top and the vcf-subset process is running at 100% CPU (i.e. one full core), then that's your bottleneck. Depending on how small a subset of the data you keep, your output file might not grow very quickly either.
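Besides watching top, another way to see which stage is slow is to run each stage on its own against a small slice of the data and compare wall-clock times. Below is a minimal sketch, assuming bash. `time_stage` is a hypothetical helper written for this example (it is not part of tabix, VCFtools or htslib), and the region `X:1-5000000` and the `chrX_slice*` file names are arbitrary choices for illustration.

```shell
#!/usr/bin/env bash
# time_stage: hypothetical helper, not part of tabix/VCFtools/htslib.
# Runs one pipeline stage with INPUT_FILE on stdin, discards the stage's own
# stdout and stderr, and prints the elapsed wall-clock time in seconds.
# Usage: time_stage INPUT_FILE CMD [ARGS...]
time_stage() {
  local input=$1
  shift
  # %R = real (wall-clock) time; bash prints 3 decimal places by default.
  local TIMEFORMAT=%R
  # The 'time' report goes to the group's stderr, which is redirected to
  # stdout here; the command's own output is sent to /dev/null.
  { time "$@" < "$input" > /dev/null 2>&1; } 2>&1
}

# Example usage (paths copied from the question; adjust as needed):
# IN=vcfs/ALL.chrX.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz
# tabix -h "$IN" X:1-5000000 > chrX_slice.vcf
# vcf-subset -c population_files/CEU_panel < chrX_slice.vcf > chrX_slice.CEU.vcf
# time_stage /dev/null tabix -h "$IN" X:1-5000000
# time_stage chrX_slice.vcf vcf-subset -c population_files/CEU_panel
# time_stage chrX_slice.CEU.vcf bgzip -c
```

Comparing the three numbers gives a rough idea of where the time goes. A slice of the chromosome may not be perfectly representative of the whole file, so treat the result as indicative rather than exact.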

Powered by Biostar version 2.3.0