Question: Bedtools "Segmentation Fault" While Working With Genome.Fa
8.0 years ago, PoGibas (4.8k) wrote:

I want to use BEDTools (fastaFromBed) to extract genomic sequences.

My BED file covers all 24 chromosomes, so I want to use the whole genome (merged from the per-chromosome chromosome.fa files).

Tried to:
fastaFromBed -fi genome.fa -bed all.chromosomes.bed -fo output
but got
Segmentation fault (core dumped)

Tried to use every chromosome.fa separately and it worked:
fastaFromBed -fi chromosome${i}.fa -bed all.chromosomes.bed -fo output
Of course, I get this annoying warning:
WARNING. chromosome (chr..) was not found in the FASTA file. Skipping.
But it's still better than nothing and really fast.
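
One way to avoid the per-chromosome warnings might be to split the BED file by chromosome first, so that each chromosome FASTA only sees its own intervals. The following is a minimal sketch and not something from the original post: the function name split_bed_by_chrom and the output directory layout are assumptions for illustration, and the fastaFromBed call in the comment only reuses the options shown above.

```shell
# Hypothetical helper (not part of BEDTools): write one BED file per chromosome.
# Usage: split_bed_by_chrom all.chromosomes.bed bed_by_chrom
split_bed_by_chrom() {
    bed=$1
    outdir=$2
    mkdir -p "$outdir" || return 1
    # Skip comment/track/browser/empty lines; route each record by column 1.
    awk -v d="$outdir" '
        /^#/ || /^track/ || /^browser/ || NF == 0 { next }
        { f = d "/" $1 ".bed"; print > f }
    ' "$bed"
}

# Possible use (the FASTA file naming is an assumption, adjust to your files):
#   split_bed_by_chrom all.chromosomes.bed bed_by_chrom
#   for b in bed_by_chrom/*.bed; do
#       chrom=$(basename "$b" .bed)
#       fastaFromBed -fi "${chrom}.fa" -bed "$b" -fo "output.${chrom}"
#   done
```

Each output file is named after the value in the first BED column, so the FASTA file names in the loop have to be adapted to however the chromosome files are actually named.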

I prefer to use BEDTools for sequence extraction, so I am wondering whether this segmentation fault can be solved. It seems that BEDTools can't handle a large genome.fa file, since I also tried nucBed and got the same error. Alternatively, it might be a problem with how the genome was merged.


This is the BED file I used with intersectBed, closestBed, and fastaFromBed ([][1]).
There were problems only with fastaFromBed, and only when I used the whole genome.fa (~3.15 GB). As mentioned above, when I used each chromosome separately I got warnings, but there was no segmentation fault and the output was fine. I wonder whether genome.fa itself is the problem (I used cat to merge the chromosomes).


head genome.fa


cat genome.fa.fai

chr1 249250621 6 50 51
chr2 243199373 254235646 50 51
chr3 198022430 502299013 50 51
chr4 191154276 704281898 50 51
chr5 180915260 899259266 50 51
chr6 171115067 1083792838 50 51
chr7 159138663 1258330213 50 51
chr8 146364022 1420651656 50 51
chr9 141213431 1569942965 50 51
chr10 135534747 1713980672 50 51
chr11 135006516 1852226121 50 51
chr12 133851895 1989932775 50 51
chr13 115169878 2126461715 50 51
chr14 107349540 2243934998 50 51
chr15 102531392 2353431536 50 51
chr16 90354753 2458013563 50 51
chr17 81195210 2550175419 50 51
chr18 78077248 2632994541 50 51
chr19 59128983 2712633341 50 51
chr20 63025520 2772944911 50 51
chr21 48129895 2837230949 50 51
chr22 51304566 2886323449 50 51
chrX 155270560 2938654113 50 51
chrY 59373566 3097030091 50 51

genome.fa.fai was generated by BEDTools:
index file genome.fa.fai not found, generating...
The segmentation fault happens right after the index is generated. If BEDTools can scan the genome and generate the index file, maybe the genome is not the problem.
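
To check whether the cat-merged genome was indexed consistently, the offsets in the .fai can be compared with what the sequence lengths and line widths imply. This is only a rough sketch: the function name check_fai is made up for illustration, and it assumes each FASTA header is exactly ">name" followed by a line terminator (no description text) and that every sequence ends with a line terminator.

```shell
# Hypothetical helper: check that consecutive .fai offsets are consistent.
# .fai columns: name, length, offset, bases per line, bytes per line.
# Assumption: header lines are exactly ">name" plus a line terminator.
check_fai() {
    awk '
        NR > 1 {
            nlines = int((len + lb - 1) / lb)          # sequence lines of previous record
            expected = off + len + nlines * (lw - lb)  # first byte after previous sequence
            expected += 1 + length($1) + (lw - lb)     # ">" + name + line terminator
            if (expected == $3) printf "OK %s\n", $1
            else printf "MISMATCH %s expected %.0f got %.0f\n", $1, expected, $3
        }
        { len = $2; off = $3; lb = $4; lw = $5 }       # remember this record
    ' "$1"
}

# Example: check_fai genome.fa.fai
```

Under those assumptions, the chr1 to chr2 and chr2 to chr3 offsets in the index above come out consistent, which fits the idea that the merged file itself is fine.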

bedtools • 6.8k views
modified 6.8 years ago • written 8.0 years ago by PoGibas (4.8k)

Can you post your BED file to some downloadable location? I suspect some sort of input error: bedtools is a very widely used tool and I can't imagine it not working. We could track that down here. That being said, as with any tool, it would be preferable if bedtools raised an error on invalid input, and I am sure the author (who occasionally hangs out on Biostar) would be interested in seeing what may cause a segmentation fault.

written 8.0 years ago by Istvan Albert ♦♦ (85k)

In general, make sure to post a comment when you edit the answer; that will notify people that a change has been made.

written 8.0 years ago by Istvan Albert ♦♦ (85k)

I seem to be having the same problem. Have you been able to fix this issue?

written 7.9 years ago by nate.ellis80
8.0 years ago, enricoferrero (800), United Kingdom, wrote:

Check you have sorted the bed files with sortBed before using anything else!
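
For reference, the same ordering (by chromosome, then by start position) can also be produced with plain sort. A minimal sketch; the function name sort_bed is only illustrative, and it assumes the BED file has no track or browser header lines.

```shell
# Hypothetical helper: sort a BED file by chromosome, then numerically by start.
# LC_ALL=C makes the chromosome ordering byte-wise and reproducible.
# Assumption: the file contains only interval records (no header lines).
sort_bed() {
    LC_ALL=C sort -k1,1 -k2,2n "$1"
}

# Example: sort_bed all.chromosomes.bed > all.chromosomes.sorted.bed
```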

8.0 years ago, luwening (10) wrote:

Paste some lines of your genome.fa; it looks like there is no "chr" in your genome.fa.
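
A quick way to test for a naming mismatch like this is to compare the chromosome names in the BED file with the names in the FASTA index. A minimal sketch, assuming a .fai index already exists next to the FASTA; the function name missing_chroms is made up for illustration.

```shell
# Hypothetical helper: list chromosome names used in a BED file that do not
# appear in the first column of a FASTA index (.fai).
# Usage: missing_chroms all.chromosomes.bed genome.fa.fai
missing_chroms() {
    awk '
        FNR == NR { in_fasta[$1] = 1; next }                 # first file: the .fai
        /^#/ || /^track/ || /^browser/ || NF == 0 { next }   # skip BED headers
        !($1 in in_fasta) && !($1 in reported) { reported[$1] = 1; print $1 }
    ' "$2" "$1"
}
```

If this prints nothing, every chromosome name in the BED file is present in the index, so a "chr" prefix mismatch can be ruled out.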

6.8 years ago, PoGibas (4.8k) wrote:

It appears that there was no problem with my input files (BED, genome.fa) or with bedtools.

The segmentation fault was caused by weak hardware. I hit this problem on a Samsung NC10 netbook running Lubuntu. The same analysis runs without problems on a more powerful machine.

To anyone having a similar problem, I would suggest splitting genome.fa (e.g., split the genome into two parts: chr1*.fa and chr[2-9]yxm.fa; this worked for me).
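
If only the merged genome.fa is at hand, a split along these lines could be sketched with awk. This is an illustration under assumptions, not the exact commands used here: the function name fasta_subset and the two regular expressions are hypothetical and should be adjusted to the chromosome names in your file.

```shell
# Hypothetical helper: print only the FASTA records whose sequence name
# (the text after ">" up to the first whitespace) matches an awk regex.
# Usage: fasta_subset genome.fa REGEX > part.fa
fasta_subset() {
    awk -v re="$2" '
        /^>/ { keep = (substr($1, 2) ~ re) }   # decide once per record header
        keep                                   # print header and sequence lines
    ' "$1"
}

# Illustrative two-way split (patterns are assumptions):
#   fasta_subset genome.fa '^chr1'       > part1.fa   # chr1, chr10 to chr19
#   fasta_subset genome.fa '^chr[2-9XY]' > part2.fa   # chr2 to chr9, chr20 to chr22, chrX, chrY
```

Each part can then be passed to fastaFromBed with -fi in the same way as the single-chromosome files.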

modified 6.8 years ago • written 6.8 years ago by PoGibas (4.8k)