Question: BedTools getfasta -split
9 weeks ago
Alice0 wrote:


I'm trying to extract some sequences from a multifasta file (a genome) using the following command:

bedtools getfasta -fi T_aestivum_genomeA.fa -bed urartuAestivum_blocks_sort.bed12 -split -name -fo blocks_aestivumA.fa

bedtools did not report any error, but in the output multi-FASTA file some records contain only the header line. I checked the BED12 file and found nothing unusual in the rows corresponding to the missing sequences. I also manually checked the genome coordinates of some of the missing sequences and there was nothing strange there (no runs of Ns or anything similar). Without the -split option I get the correct output, but I don't want the entire interval, only the blocks, so I think the problem is in the block columns.

Here is what my BED12 file looks like:

7A  25225503    25225944    TCONS_00077526_aestivumA    *   *   *   *   *   1   441,    25225503,
7A  35229975    35230420    TCONS_00076940_aestivumA    *   *   *   *   *   1   445,    35229975,
7A  35501306    35501751    TCONS_00170589_aestivumA    *   *   *   *   *   2   139,306,    35501306,35501445,
7A  131421239   131421684   TCONS_00107436_aestivumA    *   *   *   *   *   2   281,88, 131421239,131421596,
7A  10711045    10711495    TCONS_00150021_aestivumA    *   *   *   *   *   1   450,    10711045,
7A  167627488   167627939   TCONS_00024036_aestivumA    *   *   *   *   *   1   451,    167627488,
7A  48932559    48933013    TCONS_00136773_aestivumA    *   *   *   *   *   1   454,    48932559,

The fourth line corresponds to one of the sequences that came out empty.

Has anyone experienced a similar problem? Thank you!


9 weeks ago
microfuge710 wrote:

Hi, maybe I am wrong, but the block starts in the last column (blockStarts) are supposed to be relative to chromStart (here ). I have not checked your sample data carefully, but those values look far too large to be offsets.
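To make the point about relative offsets concrete, here is a minimal Python sketch (not bedtools' actual source) of how a BED12 record's blocks map onto the chromosome and how `-split` is meant to concatenate them. It assumes BED's 0-based, half-open coordinates; `block_intervals` and `spliced_sequence` are hypothetical helper names.

```python
def block_intervals(chrom_start, block_sizes, block_starts):
    """Return 0-based, half-open genomic intervals for each BED12 block.

    In BED12 each blockStart is an offset from chromStart, so a block's
    genomic start is chrom_start + block_start.
    """
    return [
        (chrom_start + start, chrom_start + start + size)
        for size, start in zip(block_sizes, block_starts)
    ]


def spliced_sequence(chrom_seq, chrom_start, block_sizes, block_starts):
    """Concatenate the block sequences in order (sketch of what -split aims to produce)."""
    return "".join(
        chrom_seq[s:e]
        for s, e in block_intervals(chrom_start, block_sizes, block_starts)
    )
```

For the fourth record, relative offsets `0,357,` give blocks at 131421239-131421520 and 131421596-131421684, inside the record's own span. If the absolute values `131421239,131421596,` are added to chromStart in the same way (an assumption about how bedtools treats them), both blocks land near 262.8 Mb instead, far from the intended interval.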

Yes, it looks like the original poster is using absolute coordinates: the first block start equals chromStart instead of 0, so none of the lines are correct.
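One way to scan a whole file for this mistake is to test two rules from the UCSC BED12 definition on every record: the first blockStart must be 0, and the last block must end exactly at chromEnd (last blockStart + last blockSize = chromEnd - chromStart). The sketch below assumes whitespace-separated fields with no spaces inside any field; `check_bed12_blocks` is a hypothetical helper, not part of bedtools.

```python
def check_bed12_blocks(line):
    """Return a list of problems found in one BED12 record's block columns."""
    fields = line.split()
    chrom_start, chrom_end = int(fields[1]), int(fields[2])
    block_count = int(fields[9])
    # Trailing commas (e.g. "441,") are allowed in BED12, so skip empty items.
    sizes = [int(x) for x in fields[10].split(",") if x]
    starts = [int(x) for x in fields[11].split(",") if x]

    if block_count < 1 or len(sizes) != block_count or len(starts) != block_count:
        return ["blockCount does not match the number of blockSizes/blockStarts"]

    problems = []
    if starts[0] != 0:
        problems.append(f"first blockStart is {starts[0]}, expected 0")
    if starts[-1] + sizes[-1] != chrom_end - chrom_start:
        problems.append("last block does not end at chromEnd")
    return problems
```

Applied to the fourth line of the posted file it reports both problems; with blockStarts `0,357,` it returns an empty list.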

Thanks a lot to both of you! I changed the block start column to values relative to chromStart (so the first block always starts at 0) and it worked. I also realized that for the other lines, where bedtools did extract a sequence, that sequence was actually wrong (because, as you said, the block starts were not relative to chromStart), so I don't understand how it managed to extract anything at all. Anyway, I will change the last column of every line as you said. Thanks again!
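Since every block start in the posted file is an absolute position, the fix amounts to subtracting chromStart from each value in column 12. A minimal Python sketch, assuming whitespace-separated input with no spaces inside fields and absolute blockStarts whose first value equals chromStart; `to_relative_block_starts`, `fix_lines` and `fix_file` are hypothetical names:

```python
def to_relative_block_starts(line):
    """Convert absolute blockStarts (column 12) into offsets from chromStart."""
    fields = line.split()
    if len(fields) < 12:
        raise ValueError(f"expected 12 BED12 columns, got {len(fields)}")
    chrom_start = int(fields[1])
    abs_starts = [int(x) for x in fields[11].split(",") if x]
    # Refuse lines that do not look absolute, so a second run cannot corrupt them.
    if abs_starts[0] != chrom_start:
        raise ValueError("first blockStart != chromStart; blockStarts may already be relative")
    fields[11] = ",".join(str(s - chrom_start) for s in abs_starts) + ","
    # BED is tab-delimited, so write the fields back out with tabs.
    return "\t".join(fields)


def fix_lines(lines):
    """Yield converted lines; blank, comment, track and browser lines pass through."""
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            yield line
        else:
            yield to_relative_block_starts(line) + "\n"


def fix_file(in_path, out_path):
    """Write a corrected copy of a BED12 file."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(fix_lines(src))
```

For example, `fix_file("urartuAestivum_blocks_sort.bed12", "urartuAestivum_blocks_fixed.bed12")` would write a corrected copy (the output name is only illustrative). The first-block check is a deliberate design choice: it makes the conversion safe to rerun, at the cost of rejecting files that mix absolute and relative rows.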

