Question: BedTools getfasta -split
9 months ago by
Alice0 wrote:


I'm trying to extract some sequences from a multifasta file (a genome) using the following command:

bedtools getfasta -fi T_aestivum_genomeA.fa -bed urartuAestivum_blocks_sort.bed12 -split -name -fo blocks_aestivumA.fa

I didn't get any error from the program but, in the output multifasta file, some sequences have only a header. I checked the bed12 file and didn't find any anomaly in the rows corresponding to the missing sequences. I also manually checked the coordinates of some missing sequences on the genome and there wasn't anything strange (Ns or something). I get the correct output if I don't use the -split option, but I don't want the entire sequence, so I think the problem is in the blocks.

Here is what my bed12 file looks like:

7A  25225503    25225944    TCONS_00077526_aestivumA    *   *   *   *   *   1   441,    25225503,
7A  35229975    35230420    TCONS_00076940_aestivumA    *   *   *   *   *   1   445,    35229975,
7A  35501306    35501751    TCONS_00170589_aestivumA    *   *   *   *   *   2   139,306,    35501306,35501445,
7A  131421239   131421684   TCONS_00107436_aestivumA    *   *   *   *   *   2   281,88, 131421239,131421596,
7A  10711045    10711495    TCONS_00150021_aestivumA    *   *   *   *   *   1   450,    10711045,
7A  167627488   167627939   TCONS_00024036_aestivumA    *   *   *   *   *   1   451,    167627488,
7A  48932559    48933013    TCONS_00136773_aestivumA    *   *   *   *   *   1   454,    48932559,

The fourth line corresponds to one of the sequences I didn't get.

Has anyone experienced a similar problem? Thank you!


getfasta -split bedtools • 740 views
9 months ago by
microfuge740 wrote:

Hi, maybe I am wrong, but the last column (block starts) is supposed to be relative to chrom start. I have not checked your sample data properly, but the values seem to be very large.


Yes, it looks like the original poster is using absolute coordinates (the block start is equal to chrom start) - none of the lines are correct.

Reply modified 9 months ago • written 9 months ago by Istvan Albert
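
For anyone who wants to check a BED12 file for this problem before running getfasta -split, here is a minimal Python sketch. It assumes the usual BED12 rules: the first block start is 0, and the last block start plus the last block size equals chromEnd minus chromStart. The function name check_bed12_blocks is made up for this example and is not part of bedtools.

```python
def check_bed12_blocks(fields):
    """Return a list of problems found in the block columns of one BED12 record.

    `fields` is one BED12 line already split into its 12 columns.
    (Hypothetical helper for illustration; not part of bedtools.)
    """
    chrom_start, chrom_end = int(fields[1]), int(fields[2])
    block_count = int(fields[9])
    # Columns 11 and 12 are comma-separated lists, often with a trailing comma.
    sizes = [int(x) for x in fields[10].split(",") if x]
    starts = [int(x) for x in fields[11].split(",") if x]

    problems = []
    if len(sizes) != block_count or len(starts) != block_count:
        problems.append("blockCount does not match blockSizes/blockStarts")
        return problems
    if starts and starts[0] != 0:
        problems.append("first blockStart is not 0 (absolute coordinates?)")
    if starts and starts[-1] + sizes[-1] != chrom_end - chrom_start:
        problems.append("last block does not end at chromEnd")
    return problems


# The fourth record from the question, as posted (absolute block starts).
posted = ["7A", "131421239", "131421684", "TCONS_00107436_aestivumA",
          "*", "*", "*", "*", "*", "2", "281,88,", "131421239,131421596,"]
for problem in check_bed12_blocks(posted):
    print(problem)
```

Running it on every line of the file and printing the records with a non-empty problem list should flag rows like the ones in the question.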

Thanks a lot to both of you. I changed the block start column so that the values are relative to chrom start (the first block always starts at 0) and it worked! I also realized that for the other lines, where BedTools did extract a sequence, the sequence was actually wrong (because, as you said, the block starts were not relative to chrom start), so I don't understand how it managed to extract anything. Anyway, I will change the last column of every line as you said. Thanks again!

Reply written 9 months ago by Alice0
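
The fix described in this thread (making the block starts relative to chrom start) can be scripted rather than done by hand. The sketch below is one possible way to do it in Python, not the method the original poster used. It assumes a tab-separated BED12 file and treats a first block start equal to chromStart as the sign of absolute coordinates. The function name to_relative_block_starts is hypothetical.

```python
def to_relative_block_starts(bed12_line):
    """Rewrite column 12 (blockStarts) as offsets from chromStart (column 2).

    Assumption: if the first block start equals chromStart, the whole
    column holds absolute coordinates and needs converting. Lines whose
    first block start is anything else are returned with starts unchanged.
    """
    fields = bed12_line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError("expected 12 tab-separated columns")
    chrom_start = int(fields[1])
    starts = [int(s) for s in fields[11].split(",") if s]
    if starts and starts[0] == chrom_start:
        starts = [s - chrom_start for s in starts]
    # Keep the trailing comma used in the original file.
    fields[11] = ",".join(str(s) for s in starts) + ","
    return "\t".join(fields)


# The fourth record from the question, tab-separated.
line = "\t".join(["7A", "131421239", "131421684", "TCONS_00107436_aestivumA",
                  "*", "*", "*", "*", "*", "2", "281,88,",
                  "131421239,131421596,"])
print(to_relative_block_starts(line))
```

For that record the two block starts become 0 and 357, and 357 + 88 = 445 matches chromEnd minus chromStart (131421684 - 131421239), which is what a consistent BED12 line should give.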