Question

vcf2bed giving empty bed file as output

6.3 years ago

michael.nagle ▴ 100

I'm trying to use vcf2bed to convert a ~0.5TB .vcf file to .bed, and I can't figure out why the command below isn't working. The job completes in about a second and produces only an empty output file. Please let me know what the problem might be.

vcf2bed < /path/input.vcf > /path/output.bed

Also, it would be helpful if anybody could estimate how large the output .bed file will be for a 0.5TB input.

To confirm that the input format is correct, here is part of the file: [Image: several dozen rows of the .vcf file after the metadata]

Is my .vcf input in the right format? It appears to be the same as what is required in the BEDOPS manual (https://media.readthedocs.org/pdf/bedops/latest/bedops.pdf)

My input command and .vcf appear to be consistent with the instructions in the notebook and several forum posts, so I don't know how to get past this. Why is the .bed output 0 bytes?

Thanks for the help.

GWAS genome genomics • 5.0k views

updated 4.6 years ago by aweaver7204 • written 6.3 years ago by michael.nagle ▴ 100

Wild guess: wrong path to input file. What's your exact command line?

6.3 years ago by jomo018 ▴ 720

Did this solution work for you? I am having the same issue, and it did not fix it for me. Any help would be greatly appreciated.

4.6 years ago by aweaver7204 • 0

I didn't figure out the cause of the problem, so I used PLINK to convert the VCF to bed instead of using vcf2bed. It's very easy with PLINK. https://bioinformatics.stackexchange.com/questions/3667/converting-vcf-file-to-plink-bed-bim-fam-files

4.6 years ago by michael.nagle ▴ 100

score 1 · Answer 1 · 2018-01-07

It is likely that your /tmp folder is filling up with intermediate data during the sorting step. Some /tmp or swap folders are not large enough to hold intermediate results.
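One way to check this hypothesis is to look at how much free space the temporary location has before running the conversion. The sketch below is only an illustration: `free_kb` is a hypothetical helper name, and it assumes the default temporary location is `$TMPDIR` or `/tmp`, which may differ on your system.

```shell
# Hypothetical helper: print the free space (in KiB) on the filesystem
# that holds the given directory, using POSIX-format df output.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Assumption: temporary files go to $TMPDIR if set, otherwise /tmp.
tmp_location=${TMPDIR:-/tmp}
echo "Free space in $tmp_location: $(free_kb "$tmp_location") KiB"
```

If the number reported is far below the size of the input, the sort step is a plausible place for the run to fail.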

Use --sort-tmpdir <dir> with vcf2bed to specify an alternative directory <dir> that can contain more than 500 GB of data (a worst-case scenario, where all variants are on one chromosome).
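As a rough sketch of that suggestion, the function below wraps the original command and adds `--sort-tmpdir`. The function name, its argument order, and the early checks are assumptions made for illustration; only `vcf2bed` and `--sort-tmpdir` come from the answer above.

```shell
# Hypothetical wrapper: convert a VCF to sorted BED, sending sort
# temporary files to a scratch directory chosen by the caller.
vcf_to_sorted_bed() {
    in_vcf=$1      # e.g. /path/input.vcf
    out_bed=$2     # e.g. /path/output.bed
    scratch=$3     # assumed: a directory on a volume with ample free space

    # Fail early with a message instead of silently writing an empty file.
    [ -s "$in_vcf" ] || { echo "input missing or empty: $in_vcf" >&2; return 1; }
    [ -d "$scratch" ] && [ -w "$scratch" ] || { echo "scratch dir not writable: $scratch" >&2; return 1; }

    vcf2bed --sort-tmpdir="$scratch" < "$in_vcf" > "$out_bed"
}
```

It could then be called as `vcf_to_sorted_bed /path/input.vcf /path/output.bed /path/to/scratch`, where the scratch path is a placeholder.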

Alternatively, use --do-not-sort with vcf2bed to keep the result unsorted, and then sort afterwards with sort-bed --tmpdir <dir>, which accomplishes the same result.
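The two-step variant might look like the sketch below. The function name and the intermediate file name are hypothetical. As far as I understand the BEDOPS documentation, `sort-bed --tmpdir` takes effect together with `--max-mem`, so an assumed 2G memory cap is included; adjust or drop it to suit your machine.

```shell
# Hypothetical wrapper: convert without sorting, then sort as a separate step.
vcf_to_bed_two_step() {
    in_vcf=$1
    out_bed=$2
    scratch=$3                          # directory with ample free space
    unsorted="$scratch/unsorted.bed"    # hypothetical intermediate file name

    # Step 1: convert only; no sorting is done by vcf2bed here.
    vcf2bed --do-not-sort < "$in_vcf" > "$unsorted" || return 1

    # Step 2: sort, keeping temporary files under "$scratch".
    # The 2G memory cap is an assumption, not a recommendation.
    sort-bed --max-mem 2G --tmpdir "$scratch" "$unsorted" > "$out_bed"
}
```

Splitting the steps also makes it easier to see which one fails: an empty unsorted file points at the conversion, while an empty final file points at the sort.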

If the BED file is too large, you can use vcf2starch to create a Starch archive directly from the VCF input instead. This will be about twice as efficient as compression with gzip. The BEDOPS documentation describes Starch files and the format in more detail. BEDOPS tools work natively with Starch as well as BED.
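A minimal sketch of the Starch route follows, assuming `vcf2starch` accepts the same `--sort-tmpdir` option as `vcf2bed` (both are conversion wrappers in BEDOPS). The function name and arguments are hypothetical.

```shell
# Hypothetical wrapper: convert a VCF straight to a compressed Starch archive.
vcf_to_starch() {
    in_vcf=$1
    out_starch=$2
    scratch=$3     # assumed: directory with ample free space for sort data

    vcf2starch --sort-tmpdir="$scratch" < "$in_vcf" > "$out_starch"
}
```

To spot-check the result, something like `unstarch /path/output.starch | head` should print the first few BED lines from the archive.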