Segmentation fault (core dumped) error with bigWigAverageOverBed
camelest, 6 months ago

I'm wondering if someone could help with an error from bigWigAverageOverBed. My system is Ubuntu 18.04.4 LTS and bigWigAverageOverBed is v357.

I get the error with a simple command:

bigWigAverageOverBed input.bw INPUT.bed output.tab


which returns:

processing chromosomes Segmentation fault (core dumped)


I have three INPUT.bed files, which I generated with awk based on their overlaps with another reference BED file. Only one of the three gives the error above. Since that one is relatively large (36M), I tried splitting it, but the error persists.

Any help would be really appreciated.

I had a similar error (segmentation fault) in bigWigAverageOverBed, caused by a small number of entries in the BED file whose end position lay beyond the length of their chromosome. Once these were removed, the file processed fine regardless of size (up to 16M at least). Rather than just splitting the file, I suggest slicing it from the top (e.g. using head, each time with a larger -n) to see whether the problem only starts at a certain point in the BED file.
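The head-slicing idea can be wrapped in a small shell function. This is only a sketch: the name find_failing_prefix and the step size are illustrative, not part of any tool. It assumes nothing about bigWigAverageOverBed beyond the fact that a crash shows up as a non-zero exit status (a segfault typically gives 139 in bash).

```shell
# Run a command on growing `head -n` prefixes of a BED file and print the
# first prefix length (in lines) at which the command fails.
# The command is called as: "$@" PREFIX.bed OUT.tab
find_failing_prefix() {
    bed="$1"; step="$2"; shift 2
    total=$(wc -l < "$bed" | tr -d ' ')
    tmpdir=$(mktemp -d)
    n="$step"
    while :; do
        # Never ask head for more lines than the file has.
        [ "$n" -gt "$total" ] && n="$total"
        head -n "$n" "$bed" > "$tmpdir/prefix.bed"
        if ! "$@" "$tmpdir/prefix.bed" "$tmpdir/out.tab" > /dev/null 2>&1; then
            echo "$n"                # first failing prefix length
            rm -rf "$tmpdir"
            return 0
        fi
        [ "$n" -ge "$total" ] && break
        n=$((n + step))
    done
    rm -rf "$tmpdir"
    return 1                         # the whole file ran without error
}
```

For example, `find_failing_prefix INPUT.bed 100000 bigWigAverageOverBed input.bw` runs `bigWigAverageOverBed input.bw <prefix>.bed <out>.tab` on the first 100000, 200000, ... lines. If it prints N, the offending record should be among the last 100000 lines of that prefix.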

Sorry for the late reply; I thought I had replied but just realized it wasn't posted. In conclusion, chatul's point was correct: when I removed the regions extending beyond the chromosome ends, the error went away. Thank you so much for the help.
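Removing those regions can be done with a single awk pass. A minimal sketch, assuming a two-column chrom.sizes file (chromosome name, length) for the same assembly the bigWig was built on, e.g. the UCSC *.chrom.sizes file; the function name is illustrative.

```shell
# Drop BED records whose end coordinate lies past the end of their chromosome,
# using a two-column chrom.sizes file (name<TAB>length) as the reference.
# Dropped records are reported on stderr with their line number; records on
# chromosomes missing from chrom.sizes are passed through unchanged.
filter_bed_to_chrom_sizes() {
    awk '
        NR == FNR { len[$1] = $2; next }          # first file: chrom -> length
        ($1 in len) && ($3 + 0) > (len[$1] + 0) { # end beyond chromosome end
            print "dropped line " FNR ": " $0 > "/dev/stderr"
            next
        }
        { print }                                 # everything else is kept
    ' "$1" "$2"
}
```

Usage: `filter_bed_to_chrom_sizes chrom.sizes INPUT.bed > INPUT.clean.bed`, then run bigWigAverageOverBed on INPUT.clean.bed. The stderr report shows which records were removed, so you can check they are the ones you expect.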

What is the output of:

file INPUT.bed input.bw

awk '{printf("%d,%s\n",int($3)-int($2),$0);}' INPUT.bed | sort -t, -k1,1n | head
awk '{printf("%d,%s\n",int($3)-int($2),$0);}' INPUT.bed | sort -t, -k1,1n | tail
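The commands above show the shortest and longest intervals. A related one-pass check prints only the records whose coordinates look wrong, with their line numbers. This is a sketch (the function name is illustrative); zero-length intervals can be legitimate in some BED files, and it cannot catch ends past the chromosome length, which needs the chromosome sizes.

```shell
# Print BED records with obviously bad coordinates, prefixed by line number:
# fewer than 3 fields, a start or end that is not a non-negative integer,
# or an end that is not greater than the start (zero-length or inverted).
report_suspicious_bed() {
    awk '
        NF < 3 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ || ($3 + 0) <= ($2 + 0) {
            print FNR ": " $0
        }
    ' "$1"
}
```

Usage: `report_suspicious_bed INPUT.bed | head`. Header lines such as `track` or `browser` will also be listed, since they have no numeric coordinates.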

Thank you for your input. These are the results.

file INPUT.bed input.bw
INPUT.bed: ASCII text
input.bw: data

awk '{printf("%d,%s\n",int($3)-int($2),$0);}' INPUT.bed | sort -t, -k1,1n | head

1,chr1    100000358   100000359   chr1:100000358-100000359,+  0   +
1,chr1    10002693    10002694    chr1:10002693-10002694,+    0   +
1,chr1    10002731    10002732    chr1:10002731-10002732,-    0   -
1,chr1    10002877    10002878    chr1:10002877-10002878,+    0   +
1,chr1    10002963    10002964    chr1:10002963-10002964,+    0   +
1,chr1    10003111    10003112    chr1:10003111-10003112,+    0   +
1,chr1    10003414    10003415    chr1:10003414-10003415,+    0   +
1,chr1    10003546    10003547    chr1:10003546-10003547,-    0   -
1,chr1    10003591    10003592    chr1:10003591-10003592,+    0   +
1,chr1    10003596    10003597    chr1:10003596-10003597,-    0   -

awk '{printf("%d,%s\n",int($3)-int($2),$0);}' INPUT.bed | sort -t, -k1,1n | tail

320,chr3    149374579   149374899   chr3:149374579-149374899,+  +
325,chr3    24575115    24575440    chr3:24575115-24575440,+    +
331,chr16   2318409 2318740 chr16:2318409-2318740,+ 0   +
334,chr11   134337126   134337460   chr11:134337126-134337460,+ +
346,chr2    170219168   170219514   chr2:170219168-170219514,-  -
358,chr9    69065046    69065404    chr9:69065046-69065404,-    -
364,chr8    24813374    24813738    chr8:24813374-24813738,-    -
393,chr12   48336640    48337033    chr12:48336640-48337033,-   -
435,chr15   80696258    80696693    chr15:80696258-80696693,-   -
561,chr11   65266877    65267438    chr11:65266877-65267438,+   +

That looks like a pretty unusual bed file, where the chromosomes do not look formatted correctly, and it isn't sorted in a typical way. What do the chromosome names in the bigWig file look like?

Thank you for your help. So it seems something is wrong with the chromosome column of my BED file. I ran bigWigToBedGraph on the bigWig, and the result looks like this:

bigWigToBedGraph input.bw output.bedGraph

chr1    629903  629904  1
chr1    629909  629910  1
chr1    629916  629917  1
chr1    629919  629920  1
chr1    629929  629930  4
chr1    629932  629933  1

Chromosome names have to match for these operations to work. For example, 1,chr1 will not match chr1. Adjust your awk statement accordingly, so that you're not adding commas, numbers or other extraneous text to the chromosome field of INPUT.bed.
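One way to check the match directly is to list the chromosome names used in INPUT.bed that never appear in the bedGraph dumped by bigWigToBedGraph. A sketch with an illustrative function name; note that the bedGraph only contains chromosomes with coverage, so a name reported here might still exist in the bigWig but have no data.

```shell
# List chromosome names that appear in a BED file but not in a second file
# (here, a bedGraph dumped from the bigWig with bigWigToBedGraph).
# Column 1 is split on whitespace, as in awk '{print $1}'.
bed_chroms_missing_from() {
    tmpdir=$(mktemp -d)
    awk '{ print $1 }' "$1" | LC_ALL=C sort -u > "$tmpdir/bed_chroms"
    awk '{ print $1 }' "$2" | LC_ALL=C sort -u > "$tmpdir/bw_chroms"
    # comm -23 keeps lines found only in the first sorted list
    LC_ALL=C comm -23 "$tmpdir/bed_chroms" "$tmpdir/bw_chroms"
    rm -rf "$tmpdir"
}
```

Usage: `bed_chroms_missing_from INPUT.bed output.bedGraph`. Empty output means every chromosome name in INPUT.bed also occurs in the bedGraph; a name like `1,chr1` would show up immediately.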

Thank you for your comment. I think "1,chr1" comes from the command Pierre Lindenbaum suggested: the 1 is the result of int($3)-int($2) in the printf, if I understand correctly (sorry, I'm pretty new to this area).

The command below lists the usual chromosome names:

awk '{print $1}' INPUT.bed | sort | uniq

chr1
chr10
chr11
chr11_gl000202_random
chr12
chr13
chr14
chr15
chr16
chr17

Do you have any other ideas why this doesn't work? Thank you so much for the help.

You're not the only one with this problem: it looks like a problem with memory management or with the OS: https://www.google.com/search?client=q=bigWigAverageOverBed+segmentation+fault

Thank you for your input. I also came across some posts saying that. But if it's a memory problem, shouldn't splitting the file solve it?

You could try splitting the BED file by chromosome:

$ sort-bed INPUT.bed > INPUT.sorted.bed
$ for CHR in $(bedextract --list-chr INPUT.sorted.bed); do bedextract ${CHR} INPUT.sorted.bed > INPUT.sorted.${CHR}.bed; done


Then run your bigWigAverageOverBed step on each per-chromosome file.
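A per-chromosome run can also tell you which chromosome holds the problem records, since the failing file points to it. A sketch, assuming the per-chromosome files are named INPUT.sorted.<chr>.bed as in the bedextract loop; the function name is illustrative.

```shell
# Run bigWigAverageOverBed on each per-chromosome BED file, writing one
# .tab per input, report any file on which the tool fails (e.g. a segfault),
# and concatenate the successful results into a single table.
run_per_chrom() {
    bw="$1"; out="$2"; shift 2
    : > "$out"                                   # start with an empty output
    for bed in "$@"; do
        tab="${bed%.bed}.tab"
        if bigWigAverageOverBed "$bw" "$bed" "$tab"; then
            cat "$tab" >> "$out"
        else
            echo "bigWigAverageOverBed failed on $bed (exit $?)" >&2
        fi
    done
}
```

Usage: `run_per_chrom input.bw output.tab INPUT.sorted.chr*.bed`. Any file reported on stderr is the one to inspect, for example against the chromosome length.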

If you want to track memory usage, you can run top while running your per-chromosome process and press Shift-M to sort processes by memory.

Presumably, chr1 would be your largest file and so use the most memory.