Question: bedtools window - killed: 9 error
spiral01 wrote:

I am trying to use the bedtools window command to obtain counts of the number of variants in each window of the hg19 human vcf files. Here is the command:

bedtools window -a 50kb.bed -b chr1.vcf.gz -c > coverage.txt

This results in the following error:

Killed: 9

However, the command works fine on some of the smaller chromosomes (e.g. chr19) without the error occurring. What is causing this error and how can I stop it from happening?

modified 24 months ago by Alex Reynolds • written 24 months ago by spiral01

This is an issue related to RAM or, more likely, available disk space. Instead of letting the system crash, the operating system kills the process with signal 9 (SIGKILL), and the shell reports this as "Killed: 9".
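To illustrate what signal 9 looks like from the shell's side, here is a minimal sketch that is not specific to bedtools. It kills a harmless background sleep with SIGKILL and inspects the exit status. The sleep process and the variable names are stand-ins chosen for illustration; shells report a process killed by signal N with exit status 128 + N.

```shell
# Start a harmless background process as a stand-in for the killed bedtools job.
sleep 30 &
pid=$!

# Send signal 9 (SIGKILL), which a process cannot catch or ignore.
kill -9 "$pid"

# wait returns the child's exit status; 128 + 9 indicates death by signal 9.
status=0
wait "$pid" || status=$?
echo "exit status: $status"
```

An exit status of 137 after a run is therefore a hint that the process was killed from outside rather than failing on its own.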

Which OS are you running this on? If Linux/UNIX, is it being run on a shared system?

written 24 months ago by Kevin Blighe

I am running this through the Linux terminal on a Mac. Is this an issue with bedtools then? I have worked on these same large files with other tools (bcftools, vcftools etc) with no issues. Does bedtools unzip the file before working on it?

written 24 months ago by spiral01

Yes, I assume that it unpacks the file into RAM and then performs the operation. As your chr1 is 30 GB unpacked, though, you will require more than 30 GB of RAM. It may actually work if you unpack it to the hard disk first and then re-run the bedtools command.

There are probably other fancy ways of doing this to avoid excessive memory usage.
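One possible way, as a rough sketch rather than a drop-in replacement for bedtools: if the windows are simply consecutive fixed-width bins (an assumption, since the contents of 50kb.bed are not shown in this thread), the VCF can be streamed through awk so that only one counter per window is held in memory. The file demo.vcf.gz and its records are made up for illustration, and the counts may not match bedtools window exactly, since that tool applies its own window logic.

```shell
# Tiny synthetic VCF standing in for chr1.vcf.gz (hypothetical data).
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\nchr1\t100\t.\tA\tG\nchr1\t49999\t.\tC\tT\nchr1\t50001\t.\tG\tA\n' \
  | gzip -c > demo.vcf.gz

# Stream the decompressed records; skip header lines starting with '#'.
# Each variant is assigned to the 0-based start of its 50 kb bin.
counts=$(gunzip -c demo.vcf.gz | awk -v w=50000 '
  !/^#/ { n[$1 "\t" int(($2 - 1) / w) * w]++ }
  END   { for (k in n) print k "\t" n[k] }' | sort -k1,1 -k2,2n)

# Columns: chromosome, window start, number of variants in that window.
echo "$counts"
```

Because nothing is decompressed to disk or loaded whole, memory use depends on the number of windows rather than the size of the VCF.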

written 24 months ago by Kevin Blighe
Alex Reynolds wrote:

Another option:

$ gunzip -c chr1.vcf.gz | vcf2bed --sort-tmpdir="/some/large/dir" > chr1.bed
$ bedmap --echo --count --delim '\t' 50kb.bed chr1.bed > answer.bed
$ rm chr1.bed

Or to avoid creating an intermediate file:

$ bedmap --echo --count --delim '\t' 50kb.bed <( gunzip -c chr1.vcf.gz | vcf2bed --sort-tmpdir="/some/large/dir" ) > answer.bed

The directory /some/large/dir should have enough free space to hold the uncompressed chr1.vcf.
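A quick way to check this before starting, sketched under a few assumptions: the current directory stands in for /some/large/dir, demo.vcf.gz is a made-up stand-in for chr1.vcf.gz, and the variable names are only illustrative. The uncompressed size is measured by streaming, so nothing large is written to disk.

```shell
tmpdir="."   # stand-in for /some/large/dir

# One-record stand-in for chr1.vcf.gz (hypothetical data).
printf 'chr1\t100\t.\tA\tG\n' | gzip -c > demo.vcf.gz

# Uncompressed size in bytes, measured by streaming, then rounded up to KB.
need_bytes=$(gunzip -c demo.vcf.gz | wc -c | tr -d ' ')
need_kb=$(( (need_bytes + 1023) / 1024 ))

# Free space in KB on the filesystem holding the temporary directory.
avail_kb=$(df -Pk "$tmpdir" | awk 'NR == 2 { print $4 }')

if [ "$avail_kb" -gt "$need_kb" ]; then
  echo "enough space: need ${need_kb} KB, have ${avail_kb} KB"
else
  echo "not enough space: need ${need_kb} KB, have ${avail_kb} KB"
fi
```

On a real 1.2 GB compressed file the streaming size check takes a while, but it uses very little memory.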

written 24 months ago by Alex Reynolds

Hi Alex, I did try this, but loading such large unzipped VCF files into memory isn't feasible and crashes the system (the VCF file is 1.2 GB zipped but >30 GB unzipped).

written 24 months ago by spiral01

If you use the second approach, which relies on standard Unix streams, then BEDOPS only uses ~2 GB of RAM, not the 30 GB that other approaches may require. This memory usage is due to sorting, and you can adjust it downwards via --max-mem=<value> in convert2bed or vcf2bed if you have less than 4 GB of RAM. Final disk usage should be minimal: answer.bed should not be much larger than 50kb.bed. Intermediate disk usage for temporary files created for sorting may be about 30 GB. Hope this helps.

written 24 months ago by Alex Reynolds

Thanks, this works! Would you be able to walk me through the command? I have never used brackets the way you have here:

<( gunzip -c chr1.vcf.gz | vcf2bed --sort-tmpdir="/some/large/dir" ) >

Is that saying to unzip the file and pipe it to the tmpdir, before piping it to the main bedmap command?

written 24 months ago by spiral01

This is bash notation called process substitution: http://tldp.org/LDP/abs/html/process-sub.html

The process substitution I show above extracts the VCF and converts it to BED format, which is written to standard output that the larger bedmap command consumes as one of its inputs.

Another way to think about this is that it creates a temporary, transitory file that can be used wherever a filename would normally be specified. This file exists only as long as bedmap needs it to do its work, and it streams only a small amount of data at a time to bedmap, which reduces the memory overhead considerably.
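A small sketch can make this concrete without BEDOPS installed. Here printf stands in for the gunzip and vcf2bed pipeline, and grep stands in for bedmap; both stand-ins and the variable names are assumptions for illustration. Because process substitution is a bash feature, the commands are run explicitly through bash -c.

```shell
# The substitution expands to a path-like name that a command can open
# like an ordinary file (typically something under /dev/fd).
path=$(bash -c 'echo <(true)')
echo "substituted name: $path"

# grep receives that name where it expects a filename and reads the
# producer's output as a stream; no intermediate file is written to disk.
n=$(bash -c 'grep -c "^chr1" <(printf "chr1\t0\t50000\nchr1\t50000\t100000\n")')
echo "matching lines: $n"
```

The consumer only ever sees a filename, which is why this works with tools such as bedmap that expect file arguments rather than standard input.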

modified 24 months ago • written 24 months ago by Alex Reynolds

Thank you, that's an excellent explanation.

written 24 months ago by spiral01