BedTools getfasta -split
6.8 years ago
Alice • 0

Hello,

I'm trying to extract some sequences from a multifasta file (a genome) using the following command:

bedtools getfasta -fi T_aestivum_genomeA.fa -bed urartuAestivum_blocks_sort.bed12 -split -name -fo blocks_aestivumA.fa

I didn't get any kind of error from the program but, in the output multifasta file, for some sequences, there is only the header. I checked the bed12 file and I didn't find any anomaly in the rows corresponding to the missing sequences. I also manually checked the coordinates on the genome of some missing sequences and there wasn't anything strange (Ns or something). I got the correct output if I don't use the -split option but I don't want the entire sequence, so I think the problem is in the blocks.

Here is what my bed12 file looks like:

7A  25225503    25225944    TCONS_00077526_aestivumA    *   *   *   *   *   1   441,    25225503,
7A  35229975    35230420    TCONS_00076940_aestivumA    *   *   *   *   *   1   445,    35229975,
7A  35501306    35501751    TCONS_00170589_aestivumA    *   *   *   *   *   2   139,306,    35501306,35501445,
7A  131421239   131421684   TCONS_00107436_aestivumA    *   *   *   *   *   2   281,88, 131421239,131421596,
7A  10711045    10711495    TCONS_00150021_aestivumA    *   *   *   *   *   1   450,    10711045,
7A  167627488   167627939   TCONS_00024036_aestivumA    *   *   *   *   *   1   451,    167627488,
7A  48932559    48933013    TCONS_00136773_aestivumA    *   *   *   *   *   1   454,    48932559,

The fourth line corresponds to one of the sequences I didn't get.

Has anyone experienced a similar problem? Thank you!

Alice

bedtools getfasta -split • 4.5k views
6.8 years ago
microfuge ★ 1.9k

Hi, maybe I am wrong, but the values in the last column (the block starts) are supposed to be relative to chromStart. I have not checked your sample data carefully, but the values seem very large.

Yes, it looks like the original poster is using absolute coordinates (the block start is equal to chrom start) - none of the lines are correct.
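
In case it helps, here is a minimal Python sketch of the conversion, assuming every value in column 12 is an absolute genome coordinate, as in the sample above. The function name `fix_block_starts` is made up for illustration. It subtracts chromStart (column 2) from each block start and keeps the trailing comma used in the original file:

```python
def fix_block_starts(line):
    """Rewrite column 12 (blockStarts) of one BED12 line as offsets from chromStart."""
    fields = line.split()
    chrom_start = int(fields[1])
    # blockStarts is a comma-separated list, possibly with a trailing comma
    starts = [int(s) for s in fields[11].split(",") if s]
    # Assumption: every value in the input is an absolute genome coordinate
    fields[11] = ",".join(str(s - chrom_start) for s in starts) + ","
    return "\t".join(fields)


# The fourth line of the sample, one of the records with a missing sequence
line = "7A 131421239 131421684 TCONS_00107436_aestivumA * * * * * 2 281,88, 131421239,131421596,"
print(fix_block_starts(line))
```

For that line the block starts become `0,357,`, so the second block (size 88) ends at offset 445, which matches chromEnd - chromStart.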

Thanks a lot to both of you. I changed the block start column so that the values are relative to chromStart (the first block always starts at 0), and it worked! I also realized that for the other lines, where BedTools did extract a sequence, that sequence was actually wrong (because, as you said, the block starts were not relative to chromStart), so I don't understand how it managed to extract anything. Anyway, I will change the last column of every line as you said. Thanks again!
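
Since the program reported no error for the malformed lines, a small sanity check before running getfasta may be useful. This is only a sketch, with a made-up function name `check_bed12_blocks`. It tests three consistency conditions on the block columns: blockCount matches the two lists, the first block start is 0, and the last block ends exactly at chromEnd.

```python
def check_bed12_blocks(line):
    """Return a list of problems found in the block columns of one BED12 line."""
    fields = line.split()
    chrom_start, chrom_end = int(fields[1]), int(fields[2])
    block_count = int(fields[9])
    # Both lists may carry a trailing comma, so empty items are skipped
    sizes = [int(s) for s in fields[10].split(",") if s]
    starts = [int(s) for s in fields[11].split(",") if s]

    problems = []
    if len(sizes) != block_count or len(starts) != block_count:
        problems.append("blockCount does not match the number of sizes/starts")
    if starts and starts[0] != 0:
        problems.append("first blockStart is not 0")
    if sizes and starts and starts[-1] + sizes[-1] != chrom_end - chrom_start:
        problems.append("last block does not end at chromEnd")
    return problems


# Original (absolute) and corrected (relative) versions of the same record
absolute = "7A 131421239 131421684 TCONS_00107436_aestivumA * * * * * 2 281,88, 131421239,131421596,"
relative = "7A 131421239 131421684 TCONS_00107436_aestivumA * * * * * 2 281,88, 0,357,"
print(check_bed12_blocks(absolute))
print(check_bed12_blocks(relative))
```

The first call reports two problems and the second returns an empty list.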
