Bedtools maskfasta error - std::out_of_range
1
0
Entering edit mode
22 months ago

Hello everyone!

I'm using bedtools maskfasta in a pipeline that I run on many genomes.

I have 2 genomes that don't run.. They are large plant genomes, and I had to cut some of the chromosome in 2 (after 2Go, that was the limit of my other tool) to run another analysis in which I identify transposable elements. Everything is formatted the same as you can see

My genome identifiers: genome.fasta

>Ptabu_chr_10
>Ptabu_chr_1-1
>Ptabu_chr_11
>Ptabu_chr_1-2
>Ptabu_chr_12
>Ptabu_chr_2-1
>Ptabu_chr_2-2
>Ptabu_chr_3-1
>Ptabu_chr_3-2
>Ptabu_chr_4-1
>Ptabu_chr_4-2
>Ptabu_chr_5-1
>Ptabu_chr_5-2
>Ptabu_chr_6
>Ptabu_chr_7
>Ptabu_chr_8
>Ptabu_chr_9

And an example of my gff file: sirevirus.fasta

This file was produced on the previous genome file, so the elements found are within the range of the chromosomes.

Ptabu_chr_11    MASiVE  Sirevirus   1647309169  1647317981  .   -   .   ID=Ptabu_chr_11-P-1647309169;length=8813;age=1.9846
Ptabu_chr_11    MASiVE  Sirevirus   1647574177  1647583012  .   -   .   ID=Ptabu_chr_11-P-1647574177;length=8836;age=0.1308
Ptabu_chr_11    MASiVE  Sirevirus   1648591185  1648599878  .   -   .   ID=Ptabu_chr_11-P-1648591185;length=8694;age=0.2808
Ptabu_chr_1-2   MASiVE  Sirevirus   1385320 1399673 .   +   .   ID=Ptabu_chr_1-2-D-1385320;length=14354;age=2.0154
Ptabu_chr_1-2   MASiVE  Sirevirus   1698108 1705751 .   +   .   ID=Ptabu_chr_1-2-D-1698108;length=7644;age=0.2192
Ptabu_chr_1-2   MASiVE  Sirevirus   5246903 5255035 .   +   .   ID=Ptabu_chr_1-2-D-5246903;length=8133;age=0.4846

The command line:

bedtools maskfasta -fi ../genome.fasta -bed ../sirevirus.fasta -fo genome.fasta_all_fulllength_masked.fasta

The error I get is:

terminate called after throwing an instance of 'std::out_of_range'
  what():  basic_string::substr: __pos (which is 3505698666) > this->size() (which is 2000000000)

I ran only chr 5-2 of this species as a test and there was no issue. The output created by maskfasta only appends 2 chromosomes before it crashes. The position in the error message '3505698666' doesn't exist in my data, no chromosome is that big.

I have the same issue on the garlic genome which is 14Go and one chromosome was cut, but it runs perfectly fine in wheat which is also 14Go but with smaller chromosomes.

If anyone can help me, I would greatly appreciate it!

Thank you :)

bedtools maskfasta • 1.7k views
ADD COMMENT
0
Entering edit mode

what is the longest position in the bed/gff ? what is the longest chromosome in the fasta ?

ADD REPLY
0
Entering edit mode

also, if you have gdb (GNU debugger) please, run

$ gdb /path/to/bedtools
> run  maskfasta -fi ../genome.fasta -bed ../sirevirus.fasta -fo genome.fasta_all_fulllength_masked.fasta

wait for the error, type backtrace and show us the output

ADD REPLY
0
Entering edit mode

Hi, thanks for your reply

length chr

Ptabu_chr_10    1752849333
Ptabu_chr_1-1   2000000000
Ptabu_chr_11    1650012615
Ptabu_chr_1-2   364278061
Ptabu_chr_12    1392452741
Ptabu_chr_2-1   2000000000
Ptabu_chr_2-2   317450362
Ptabu_chr_3-1   2000000000
Ptabu_chr_3-2   291775479
Ptabu_chr_4-1   2000000000
Ptabu_chr_4-2   192534405
Ptabu_chr_5-1   2000000000
Ptabu_chr_5-2   148190925
Ptabu_chr_6 2107674557
Ptabu_chr_7 2082167746
Ptabu_chr_8 2081484518
Ptabu_chr_9 2024734096

longest position gff

Ptabu_chr_10    1752201229
Ptabu_chr_1-1   1999301581
Ptabu_chr_11    1649205100
Ptabu_chr_1-2   363775309
Ptabu_chr_12    1391961094
Ptabu_chr_2-1   1999506285
Ptabu_chr_2-2   317000539
Ptabu_chr_3-1   1999921921
Ptabu_chr_3-2   291714243
Ptabu_chr_4-1   1999940570
Ptabu_chr_4-2   192482611
Ptabu_chr_5-1   1999898259
Ptabu_chr_5-2   146484902
Ptabu_chr_6 2107422126
Ptabu_chr_7 2082021302
Ptabu_chr_8 2081193673
Ptabu_chr_9 2024468130

Output of the gdb:

julie@ubuntu-022756:/mnt/RAID/data/genomes/Pinus_tabuliformis$ gdb /usr/bin/bedtools
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04.1) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/bin/bedtools...
(No debugging symbols found in /usr/bin/bedtools)
(gdb) run maskfasta -fi Ptabu_chr_ALL.fasta.clean -bed Ptabu_chr_ALL.fasta.clean.MASiVE.Sirevirus.gff -fo Ptabu_chr_ALL.fasta.clean_all_fulllength_masked.fasta
Starting program: /usr/bin/bedtools maskfasta -fi Ptabu_chr_ALL.fasta.clean -bed Ptabu_chr_ALL.fasta.clean.MASiVE.Sirevirus.gff -fo Ptabu_chr_ALL.fasta.clean_all_fulllength_masked.fasta
terminate called after throwing an instance of 'std::out_of_range'
  what():  basic_string::substr: __pos (which is 3505698666) > this->size() (which is 2000000000)

Program received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50  ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
ADD REPLY
0
Entering edit mode

gdb creashed itself ? if no, please add type backtrace after the error and show us the output

ADD REPLY
0
Entering edit mode

I don't think it crashed. Just stopped at the same point it's been stopping when trying to run it.

(gdb) backtrace
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ffff7a7e859 in __GI_abort () at abort.c:79
#2  0x00007ffff7e53911 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007ffff7e5f38c in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007ffff7e5f3f7 in std::terminate() () from /lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007ffff7e5f6a9 in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007ffff7e563ab in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x000055555564f09a in ?? ()
#8  0x000055555564fbcd in ?? ()
#9  0x00005555556508e4 in ?? ()
#10 0x0000555555651e11 in ?? ()
#11 0x000055555558360f in ?? ()
#12 0x00007ffff7a800b3 in __libc_start_main (main=0x5555555830f0, argc=8, argv=0x7fffffffe378, init=<optimised out>, fini=<optimised out>, rtld_fini=<optimised out>, stack_end=0x7fffffffe368) at ../csu/libc-start.c:308
#13 0x000055555558822e in ?? ()

not sure if this is the expected output

ADD REPLY
0
Entering edit mode
#7  0x000055555564f09a in ?? ()
#8  0x000055555564fbcd in ?? ()
#9  0x00005555556508e4 in ?? ()
#10 0x0000555555651e11 in ?? ()
#11 0x000055555558360f in ?? ()

sadly, bedtools was compiled without debugging option. there is not useful trace here.

ADD REPLY
0
Entering edit mode

the lengths look ok...

ADD REPLY
0
Entering edit mode
22 months ago
xubeisi ▴ 30

this error seems due to any of the chrom size with 2000000000, Ptabu_chr_5-2 is fine doesn't tell anything. it doesn't look like a memory issue(usually would see "segmentation fault" if it's memory)

I would suggest 1) check whether the .fai file is right, also check whether regenerate .fai file would work. 2) You might need split the fasta into chroms and split .gff file into pieces try to narrow down to the lines cause the trouble, sometime the format might be wrong.

ADD COMMENT
0
Entering edit mode

I've been running several times several of the chromosomes individually instead of the whole genome in one go and I can't reproduce my original error. I do encounter a new error with blast on the next step, but the only error I have is "Program failed, try executing the command manually.". It doesn't work on 2Go chromosomes but it works on smaller ones. Do you know if there's a limit on command line blast? I never heard of it

ADD REPLY
0
Entering edit mode

me neither. Although blast is more suitable for "local alignment". it's good idea to blast the whole chr. You might want try something like MUMmer or Mugsy

ADD REPLY

Login before adding your answer.

Traffic: 2771 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6