Question: bedops vcf2bed core dump
8 days ago by Ram (New York)

Ram wrote:

Hello again,

I'm extracting BED from a 77G VCF file using BEDOPS vcf2bed, and I'm getting a bunch of core dumps. I'm running the following on a compute node to do the extraction:

PATH=/path/to/bedops/2.4.29/bin/:$PATH
switch-BEDOPS-binary-type --megarow
cat 77G_vcf.vcf | parallel --pipe vcf2bed --do-not-sort --snvs >snvs.bed

I guessed the multiple core dumps were down to the parallel --pipe, but when I ran this on a login node without parallel --pipe, backgrounded with &, I could see the process running and jobs still showed a core dump. The BED file grows and looks fine, but the core dump happens nonetheless.

Am I missing something? This worked not 2 days ago.

Tags: vcf2bed bedops
written 8 days ago by Ram

I have not used parallel with vcf2bed before, so I'm unsure what it is doing to parallelize the work. What happens if you do not use parallel? Is vcf2bed using version 2.4.29 of convert2bed? You could ensure you are running the correct and desired binary by replacing vcf2bed with /path/to/2.4.29/convert2bed-megarow --input=vcf --do-not-sort --snvs < in.vcf > out.bed.
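For example, something like this would confirm which binary is actually being picked up (the --version flag is an assumption on my part; check convert2bed --help on your build if it isn't recognized):

command -v vcf2bed                                          # which vcf2bed wrapper the shell finds on the PATH
/path/to/bedops/2.4.29/bin/convert2bed-megarow --version    # assumed flag; prints the version if your build supports it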

modified 8 days ago • written 8 days ago by Alex Reynolds

It's on a compute node, and the bedops binaries are not added to the PATH by default. That's why I'm adding the 2.4.29 precompiled binaries to the PATH explicitly.

written 8 days ago by Ram

I'm just trying to figure out a way to isolate the problem to as few variables as possible. Is it possible to run /path/to/bedops/2.4.29/convert2bed-megarow --input=vcf --do-not-sort --snvs < in.vcf > out.bed directly on the compute node, without using parallel or specifying PATH?

written 8 days ago by Alex Reynolds

OK, I'll try that now.

written 8 days ago by Ram

Still a segmentation fault. STDERR reads:

/tmp/1510692579.90521360.shell: line 22:  5909 Segmentation fault      (core dumped) /hpc/users/ram/utils/bedops/2.4.29/bin/convert2bed-megarow --input=vcf --do-not-sort --snvs < data.singletons.vcf > data_singletons.snvs.unsorted.bed
/tmp/1510692579.90521360.shell: line 23:  5916 Segmentation fault      (core dumped) /hpc/users/ram/utils/bedops/2.4.29/bin/convert2bed-megarow --input=vcf --do-not-sort --insertions < data.singletons.vcf > data_singletons.insertions.unsorted.bed
/tmp/1510692579.90521360.shell: line 24:  5921 Segmentation fault      (core dumped) /hpc/users/ram/utils/bedops/2.4.29/bin/convert2bed-megarow --input=vcf --do-not-sort --deletions < data.singletons.vcf > data_singletons.deletions.unsorted.bed
/tmp/1510692579.90521360.shell: line 27:  5927 Segmentation fault      (core dumped) /hpc/users/ram/utils/bedops/2.4.29/bin/convert2bed-megarow --input=vcf --snvs --do-not-sort < data.norm.vcf > data.snvs.unsorted.bed
/tmp/1510692579.90521360.shell: line 28:  5932 Segmentation fault      (core dumped) /hpc/users/ram/utils/bedops/2.4.29/bin/convert2bed-megarow --input=vcf --insertions --do-not-sort < data.norm.vcf > data.insertions.unsorted.bed
/tmp/1510692579.90521360.shell: line 29:  5937 Segmentation fault      (core dumped) /hpc/users/ram/utils/bedops/2.4.29/bin/convert2bed-megarow --input=vcf --deletions --do-not-sort < data.norm.vcf > data.deletions.unsorted.bed

There are a bunch of core.XXXX files, each ~30M in size, and the BED files are empty.

The config I'm requesting is 1 node (1 CPU) with 16G RAM.
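In case it helps, I could pull a backtrace out of one of the core files with gdb (a sketch, assuming gdb is available on the node; core.5909 is just a guess at the filename matching the first PID in the log above):

gdb /hpc/users/ram/utils/bedops/2.4.29/bin/convert2bed-megarow core.5909
(gdb) bt    # backtrace at the point of the crash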

modified 8 days ago • written 8 days ago by Ram

Thanks. How were these binaries installed? Were they downloaded from the GitHub package, or did you compile them yourself?

written 8 days ago by Alex Reynolds

These are the precompiled binaries I downloaded off GitHub. I'm trying another route now: I had extracted the vcf.gz files using pigz -dc, so now I'm trying gunzip -c instead. It probably won't make a difference, but it eliminates pigz-induced corruption as a factor.
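To rule out a corrupt archive altogether, I can also test the gzip stream first (a sketch; the filename is just an example based on the log above):

gzip -t data.singletons.vcf.gz && echo "archive OK"    # -t tests the compressed stream without extracting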

written 8 days ago by Ram

I have self-compiled binaries as well; I can try them if required.

written 8 days ago by Ram
8 days ago by Alex Reynolds (Seattle, WA USA)

Alex Reynolds wrote:

Self-compiled binaries would be a useful test, as it would remove differences in Linux setup as a potential source of problems.

If you can build and run self-compiled binaries, please use make all to build them, so that you get a megarow build of convert2bed. If you instead use make, please be sure to edit the exponent constants so that the binaries support longer VCF records.
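Roughly, the build would look like this (a sketch; you could equally use a release tarball instead of cloning):

git clone https://github.com/bedops/bedops.git
cd bedops
make all                               # builds both the typical and megarow variants
make install                           # copies the binaries into ./bin inside the source tree
export PATH=$PWD/bin:$PATH
switch-BEDOPS-binary-type --megarow    # point the wrappers at the megarow builds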

Also, ideally, if you can write the first 1M lines of your VCF to an archive and post it somewhere I can download it, I could try converting it on my end and watch what's going on inside convert2bed during conversion, to see what might be causing the segfault.
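Something along these lines would do it (a sketch; the input filename is just an example taken from your error log):

head -n 1000000 data.norm.vcf | gzip > vcf_head_1M.vcf.gz    # first 1M lines, compressed for upload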

written 8 days ago by Alex Reynolds

That worked! (for the 27G file at least). Let's see if it runs to completion for the 77G file! :-)

written 8 days ago by Ram

If this works for your 77G file, I would be curious to know some details about your Linux setup (what your HPC runs, whatever comes out of ldd --version on a compute node, etc.). Some of those details could help improve how I make Github packages more compatible with what people are running.
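For instance, the output of something like the following, run on a compute node, would cover most of what I'm after (a sketch; the exact release files vary by distribution):

ldd --version | head -n 1        # glibc version
uname -srm                       # kernel release and architecture
cat /etc/*release 2>/dev/null    # distribution details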

written 8 days ago by Alex Reynolds

Sure thing, I'll check tomorrow and let you know. We recently had a firmware + software upgrade, so I can even ask our HPC team if they could help with some input.

Maybe https://hpc.mssm.edu/systems/minerva/hardware (or anything under https://hpc.mssm.edu in general) would help in the meantime?

modified 8 days ago • written 8 days ago by Ram

It ran to completion without any problems. DM me on Twitter with any questions you have!

➜ ldd --version
ldd (GNU libc) 2.12
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.
modified 8 days ago • written 8 days ago by Ram

That's an older version of glibc than the one I compile the precompiled binaries against. As an older glibc is not forward-compatible with binaries built against a newer one, I'd say that's probably why you ran into problems with the precompiled binaries. This is useful info, thanks!
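If you want to double-check that, you can list the glibc symbol versions the precompiled binary requires and compare them against the 2.12 on your cluster (a sketch; the path is the one from your error log):

objdump -T /hpc/users/ram/utils/bedops/2.4.29/bin/convert2bed-megarow | grep -o 'GLIBC_[0-9.]*' | sort -Vu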

written 7 days ago by Alex Reynolds

With parallel or without parallel?

written 8 days ago by cpad0112

Without. I had to test my build and did not want to add any extra variables. Once it started running well I didn't want to interrupt it, so it's still running without parallel.

written 8 days ago by Ram

thanks for the info :)

written 8 days ago by cpad0112

I'm sure that if I use parallel --pipe now, it'll work fine. The cause was the discrepancy between the precompiled binary and my cluster's glibc, not parallel.
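If I do go back to parallel, the self-compiled binary would slot in roughly like this (a sketch; the block size is just illustrative, and chunks after the first won't carry the VCF header, which vcf2bed skips by default anyway):

cat 77G_vcf.vcf | parallel --pipe --block 100M --keep-order vcf2bed --do-not-sort --snvs > snvs.bed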

written 8 days ago by Ram

I compiled from source after editing the REST EXPONENT to 17 (as I detailed yesterday). I'll use that and check what happens.
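For anyone doing the same, the exponent constants can be located with a quick grep over the source tree before editing (a sketch; the directory name assumes a checkout called bedops):

grep -rn "EXPONENT" bedops/    # locate the exponent constants to edit before building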

gunzipped input files failed too, BTW :-(

modified 8 days ago • written 8 days ago by Ram