I'm trying to run
vcf2bed on a huge (27G) VCF file. I'd like to extract a BED of the deletions in the VCF.
vcf2bed --max-mem=4G --deletions <file.vcf >file.bed
I'm running it on a cluster and tried giving the job 8G and then 16G of RAM, but it maxes out all available RAM within 100-150 seconds and the job quits. I know splitting the VCF into per-chromosome chunks might solve this (sketched below), but is there any other option I could use?
I am open to using other tools to extract the BED as well, provided they get the VCF-to-BED coordinate conversion right.
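For reference, a minimal sketch of the per-chromosome route, assuming bgzip and tabix (htslib) are available on the cluster; file names are placeholders. Converting one chromosome at a time keeps each sort well under the memory cap:

# one-time: compress and index the VCF (bgzip/tabix are htslib tools;
# this assumes they are installed)
bgzip file.vcf && tabix -p vcf file.vcf.gz
# convert each chromosome independently; tabix -l lists the contigs in the index
for chrom in $(tabix -l file.vcf.gz); do
    tabix file.vcf.gz "$chrom" | vcf2bed --deletions --max-mem=4G >"deletions.$chrom.bed"
done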
I combined a couple of ideas from the recommendations below to get to my solution; YMMV, and not all of these modifications may be necessary. I piped the file through GNU parallel's --pipe and used vcf2bed's --do-not-sort flag, which skips the memory-hungry sort step (the part that --max-mem governs) and produces unsorted BED output.
cat huge_file.vcf | parallel --pipe vcf2bed --do-not-sort >huge_file.unsorted.bed
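Since the original goal was deletions only, the --deletions flag from the first command should compose with this pipeline, and the unsorted output can then be sorted with bounded memory using BEDOPS's sort-bed and its --max-mem option. A sketch under those assumptions, with placeholder file names:

# filter to deletions per chunk, leaving output unsorted
cat huge_file.vcf | parallel --pipe vcf2bed --deletions --do-not-sort >huge_file.unsorted.bed
# sort afterwards with a fixed memory budget
sort-bed --max-mem 4G huge_file.unsorted.bed >huge_file.bed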