bedtools getfasta error: Error: malformed GFF entry at line 15. Coordinate detected that is < 1
8 weeks ago
Simone • 0

I'm trying to use bedtools getfasta to extract regions of a genome based on coordinates in an annotation file. I keep getting an error saying that line 15 of my BED file is a malformed GFF entry with a coordinate that is less than 1:

Error: malformed GFF entry at line 15. Coordinate detected that is < 1. Exiting.
srun: error: cn68: task 0: Exited with exit code 1


but the file looks like this (line 15):

JAJFZI010040149.1   Complete    1671.3  0   4952    -   2371at8457  1421    https://www.orthodb.org/v10?query=2371at8457    Upstream transcription factor family member 3


BED files are zero-based, I believe, but bedtools is reading this one as a GFF file. It has a .BED file extension, so I'm not sure why it's being detected as GFF...

BED GFF bedtools • 312 views
8 weeks ago

Your file IS a GTF/GFF file, whatever the extension is. Most Unix programs don't care about the suffix. GTF/GFF files are one-based, so a GTF file cannot have a start coordinate (4th column) of 0.

It's not a GFF file, it's a TSV annotation file, output from BUSCO. The annotations are coordinates for BUSCO genes. I've tried using BEDOPS to convert it to a BED or GFF but it doesn't work.

But it's parsed AS a GFF/GTF file because bedtools sees that columns 4 and 5 are integers. Move columns 4 and 5 to columns 2 and 3 to create a BED file.

Ah I see, thank you so much for your help! I'll use an awk one-liner to switch the coordinate columns to 2 and 3. Apparently the newest version of BUSCO (v5.4) outputs a GFF file now (probably for this reason), but all of my genomes were run on v5.2 so I'm having to work around it.
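
For anyone landing here with the same problem, the column swap could look something like the sketch below. It assumes the table is tab-separated and laid out like the row shown in the question (1 = sequence, 2 = status, 3 = score, 4 = start, 5 = end, 6 = strand, 7 = BUSCO ID). The file names `busco_table.tsv` and `busco_genes.bed` are made up for illustration, and the `printf` line only creates a one-row sample so the snippet runs on its own.

```shell
# One-row sample modelled on line 15 of the question (tab-separated).
# File names here are hypothetical; point awk at your own table instead.
printf 'JAJFZI010040149.1\tComplete\t1671.3\t0\t4952\t-\t2371at8457\t1421\n' > busco_table.tsv

# Assumed layout: 1=sequence 2=status 3=score 4=start 5=end 6=strand 7=BUSCO ID.
# Emit BED6: chrom, start, end, name, score placeholder, strand.
# BED scores are nominally integers from 0 to 1000, so the BUSCO score
# (a float) is replaced with 0 rather than copied across.
# Rows without integer coordinates (comment lines, entries with no hit) are skipped.
awk 'BEGIN { FS = OFS = "\t" }
     $4 ~ /^[0-9]+$/ && $5 ~ /^[0-9]+$/ { print $1, $4, $5, $7, 0, $6 }' \
    busco_table.tsv > busco_genes.bed

cat busco_genes.bed
```

The resulting file can then be passed to `bedtools getfasta` with `-fi` for the genome FASTA and `-bed` for the new file; `-s` and `-name` use the strand and name columns if you want them. The sketch copies the coordinates unchanged. Whether the BUSCO start positions need shifting to match BED's zero-based convention is not settled in this thread, so it is worth checking one known gene before running it on every genome.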