bedops vcf2bed core dump
1
0
Entering edit mode
6.4 years ago
Ram 43k

Hello again,

I'm extracting a BED file from a 77G VCF using bedops vcf2bed, and it produces a series of core dumps. I'm running the following on a compute node to do the extraction:

PATH=/path/to/bedops/2.4.29/bin/:$PATH
switch-BEDOPS-binary-type --megarow
cat 77G_vcf.vcf | parallel --pipe vcf2bed --do-not-sort --snvs >snvs.bed

I'm guessing the multiple core dumps are caused by the parallel --pipe. However, when I ran this on a login node without parallel --pipe, backgrounded with an &, I could see the process running, but jobs showed a core dump happening there as well. The BED file grows and looks fine, yet the core dump happens nonetheless.

Am I missing something? This was working as recently as two days ago.

bedops vcf2bed • 3.8k views
ADD COMMENT
0
Entering edit mode

I have not used parallel with vcf2bed before, so I'm unsure what it is doing to parallelize the work. What happens if you do not use parallel? Is vcf2bed using version 2.4.29 of convert2bed? You could ensure you are running the correct and desired binary by replacing vcf2bed with /path/to/2.4.29/convert2bed-megarow --input=vcf --do-not-sort --snvs < in.vcf > out.bed.

ADD REPLY
0
Entering edit mode

It's on a compute node, and the bedops binaries are not added to the PATH by default. That's why I'm adding the 2.4.29 precompiled binaries explicitly.

ADD REPLY
0
Entering edit mode

I'm just trying to figure out a way to isolate the problem to as few variables as possible. Is it possible to run /path/to/bedops/2.4.29/convert2bed-megarow --input=vcf --do-not-sort --snvs < in.vcf > out.bed directly on the compute node, without using parallel or specifying PATH?

ADD REPLY
0
Entering edit mode

OK, I'll try that now.

ADD REPLY
0
Entering edit mode

Still a segmentation fault. STDERR reads:

/tmp/1510692579.90521360.shell: line 22:  5909 Segmentation fault      (core dumped) /hpc/users/ram/utils/bedops/2.4.29/bin/convert2bed-megarow --input=vcf --do-not-sort --snvs < data.singletons.vcf > data_singletons.snvs.unsorted.bed
/tmp/1510692579.90521360.shell: line 23:  5916 Segmentation fault      (core dumped) /hpc/users/ram/utils/bedops/2.4.29/bin/convert2bed-megarow --input=vcf --do-not-sort --insertions < data.singletons.vcf > data_singletons.insertions.unsorted.bed
/tmp/1510692579.90521360.shell: line 24:  5921 Segmentation fault      (core dumped) /hpc/users/ram/utils/bedops/2.4.29/bin/convert2bed-megarow --input=vcf --do-not-sort --deletions < data.singletons.vcf > data_singletons.deletions.unsorted.bed
/tmp/1510692579.90521360.shell: line 27:  5927 Segmentation fault      (core dumped) /hpc/users/ram/utils/bedops/2.4.29/bin/convert2bed-megarow --input=vcf --snvs --do-not-sort < data.norm.vcf > data.snvs.unsorted.bed
/tmp/1510692579.90521360.shell: line 28:  5932 Segmentation fault      (core dumped) /hpc/users/ram/utils/bedops/2.4.29/bin/convert2bed-megarow --input=vcf --insertions --do-not-sort < data.norm.vcf > data.insertions.unsorted.bed
/tmp/1510692579.90521360.shell: line 29:  5937 Segmentation fault      (core dumped) /hpc/users/ram/utils/bedops/2.4.29/bin/convert2bed-megarow --input=vcf --deletions --do-not-sort < data.norm.vcf > data.deletions.unsorted.bed

There are a bunch of core.XXXX files, each ~30M in size, and the BED files are empty.

The config I'm requesting is 1 node (1 CPU) with 16G RAM.
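As an aside, whether those core.* files get written at all depends on the shell's core-size limit; a quick generic check (not BEDOPS-specific):

```shell
# Show the current core-file size limit for this shell:
# "0" suppresses core dumps; "unlimited" or a nonzero value
# allows core.* files like the ones above to be written.
ulimit -c
```

If a debugger is available on the node, opening one of the core files (e.g. `gdb /path/to/convert2bed-megarow core.5909` and then `bt`) would show where the crash happens.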

ADD REPLY
0
Entering edit mode

Thanks. How were these binaries installed? Were they downloaded from the GitHub package, or did you compile them yourself?

ADD REPLY
0
Entering edit mode

These are precompiled binaries that I downloaded directly from GitHub. I'm trying another route now: I originally extracted the vcf.gz files using pigz -dc, and now I'm trying gunzip -c instead. It probably won't make a difference, but it eliminates pigz-induced corruption as a factor.

ADD REPLY
0
Entering edit mode

I have compiled binaries as well, I can try them if required.

ADD REPLY
2
Entering edit mode
6.4 years ago

Self-compiled binaries would be a useful test, as it would remove differences in Linux setup as a potential source of problems.

If you can build and run self-compiled binaries, please use make all to build them, so that you get a megarow build of convert2bed. If you instead use make, please be sure to edit the exponent constants so that the binaries support longer VCF records.
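A minimal build sketch, assuming the BEDOPS sources already sit in a ./bedops checkout (hypothetical path; the guard makes it safe to run elsewhere):

```shell
SRC=./bedops   # hypothetical path to a BEDOPS source checkout
if [ -d "$SRC" ]; then
    # 'make all' builds both variants, including the megarow
    # build of convert2bed mentioned above.
    make -C "$SRC" all
    # 'make install' copies the built binaries into the tree's bin/ directory.
    make -C "$SRC" install
else
    echo "no BEDOPS source tree at $SRC; fetch the sources first"
fi
```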

Also, ideally, if you can write the first 1M lines of your VCF to an archive and post it somewhere I can download it, I can try converting it on my end and watch what convert2bed is doing during conversion, to see what might be causing the segfault.
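For reference, a self-contained sketch of carving out a header-plus-first-N-records subset (demo.vcf, subset.vcf, and N=1 are stand-ins; substitute the real file and something like N=1000000):

```shell
# Build a tiny stand-in VCF so the example runs anywhere.
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > demo.vcf
printf 'chr1\t100\trs1\tA\tG\t50\tPASS\t.\nchr1\t200\trs2\tC\tT\t50\tPASS\t.\n' >> demo.vcf

# Keep the full header plus the first N data records, then compress for posting.
N=1
{ grep '^#' demo.vcf; grep -v '^#' demo.vcf | head -n "$N"; } > subset.vcf
gzip -c subset.vcf > subset.vcf.gz
```

Keeping the `##` header lines matters, since they carry the metadata a converter may need to parse the records.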

ADD COMMENT
1
Entering edit mode

That worked! (for the 27G file at least). Let's see if it runs to completion for the 77G file! :-)

ADD REPLY
0
Entering edit mode

If this works for your 77G file, I would be curious to know some details about your Linux setup (what your HPC runs, whatever comes out of ldd --version on a compute node, etc.). Those details could help me make the GitHub packages more compatible with what people are actually running.
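A generic way to gather those details in one pass (paths like /etc/os-release vary by distro, hence the guard):

```shell
# Kernel name, release, and machine architecture.
uname -srm
# Distro identification, when available.
if [ -r /etc/os-release ]; then
    head -n 2 /etc/os-release
fi
# glibc version, as requested above; output may be empty on non-glibc systems.
ldd --version 2>/dev/null | head -n 1
```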

ADD REPLY
1
Entering edit mode

Sure thing, I'll check tomorrow and let you know. We recently had a firmware and software upgrade, so I can also ask our HPC team for input.

Maybe https://hpc.mssm.edu/systems/minerva/hardware (or anything under https://hpc.mssm.edu in general) would help in the meantime?

ADD REPLY
1
Entering edit mode

It ran to completion without any problems. DM me on Twitter with any questions you have!

➜ ldd --version
ldd (GNU libc) 2.12
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.
ADD REPLY
1
Entering edit mode

That's an older version of glibc than the one I compile the precompiled binaries against. Since glibc is not forwards-compatible, that's probably why you ran into problems with the precompiled binaries. This is useful info, thanks!

ADD REPLY
0
Entering edit mode

With parallel or without parallel?

ADD REPLY
0
Entering edit mode

Without. I had to test my build and did not want to add any extra variables. Once it started running well I didn't want to interrupt it, so the current run is not using parallel.

ADD REPLY
0
Entering edit mode

thanks for the info :)

ADD REPLY
0
Entering edit mode

I'm sure that if I use parallel --pipe now, it'll work fine. The cause was the discrepancy between the pre-compiled binary and my cluster, not parallel.

ADD REPLY
0
Entering edit mode

I compiled from source after editing the REST EXPONENT to 17 (as I detailed yesterday). I'll use that and check what happens.

gunzipped input files failed too, BTW :-(

ADD REPLY
