Converting GFF to BED with bedtools?
8.8 years ago
user ▴ 870

I use bedtools' sortBed utility to sort BED files for various operations. It also accepts GFF files as input. However, when I feed it a GFF file, as in:

sortBed -i myfile.gff

it outputs GFF, not BED. Is there a way to make bedtools sort and then convert the result to BED? Many bedtools utilities have a -bed flag. Do I need to use a different bedtools subutility to achieve this? Thanks.

bedtools gff bed conversion • 33k views
8.8 years ago
brentp 23k

If the problem is with sortBed, just use Linux sort:

sort -k1,1 -k4,4n input.gff > output.gff

OP wants the opposite. I.e. input GFF, output BED.
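
For plain line-by-line conversion, the coordinate change can be sketched with standard Unix tools. This is only a rough sketch, not a bedtools feature: the function name gff_to_sorted_bed6 is made up for illustration, it assumes tab-delimited 9-column GFF, it copies the whole attribute column into the BED name field, and it passes the GFF score through unchanged (so "." stays "."). GFF coordinates are 1-based and inclusive while BED is 0-based and half-open, so only the start needs to be decremented.

```shell
# Hypothetical helper: convert a 9-column GFF file to sorted BED6.
# Assumes tab-delimited input; comment/header lines start with '#'.
gff_to_sorted_bed6() {
  grep -v '^#' "$1" |
    awk 'BEGIN { FS = OFS = "\t" }
         # BED start = GFF start - 1; BED end = GFF end
         NF >= 9 { print $1, $4 - 1, $5, $9, $6, $7 }' |
    sort -t "$(printf '\t')" -k1,1 -k2,2n
}
```

It could then be called as gff_to_sorted_bed6 myfile.gff > my_sorted_file.bed, and the result fed to other bedtools utilities.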

8.8 years ago

Why not just pipe it into a conversion script? For example:

sortBed -i myfile.gff | gff2bed > my_sorted_file.bed

You can also skip sortBed altogether, as gff2bed uses BEDOPS sort-bed internally:

gff2bed < myfile.gff > my_sorted_file.bed

In practice, the sort-bed tool is faster than GNU sort and sortBed, and it is non-lossy *shrug*.

Yes, you are right. I was just hoping to do it within bedtools for uniformity, without relying on more scripts (I already use bedtools intersect and its sorting features), so I was curious whether there was some undocumented trick to make bedtools do it.

This conversion script doesn't build the subfeature structure - all it does is convert line by line and doesn't appear to construct actual gene models. Maybe I am not using it correctly?

OK, I looked at the documentation. Should have done that first! Anyway, it looks like gff2bed doesn't build the BED12 structure with exon starts, exon lengths, thickStart, thickEnd and so on.

I guess this is pretty hard to make work, since different groups write out GFF data in different ways. Despite the very nice description on the Sequence Ontology Web site, there is still no consensus on how to write GFF.

To get an idea of the sheer diversity of GFF3 variants that all organize gene models in different ways, see:

Does anyone know of a tool that can handle all these variations?
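
For the simplest layout only, the exon starts and lengths can be assembled with a short awk sketch. This is not a general solution and relies on several assumptions: the input is tab-delimited GFF3, exon lines have type "exon", and each exon carries exactly one Parent=<transcript ID> attribute. The function name gff3_exons_to_bed12 is made up for illustration. Because no CDS information is used, thickStart and thickEnd are simply set to the transcript span, and score and itemRgb are filled with 0.

```shell
# Hypothetical helper: group exon lines by their Parent attribute and
# emit one BED12 line per transcript. Assumes one Parent per exon.
gff3_exons_to_bed12() {
  grep -v '^#' "$1" |
    awk -F'\t' '$3 == "exon"' |
    sort -t "$(printf '\t')" -k1,1 -k4,4n |
    awk 'BEGIN { FS = OFS = "\t" }
    {
      # Extract the transcript ID from Parent=...; skip exons without one
      if (!match($9, /Parent=[^;]+/)) next
      p = substr($9, RSTART + 7, RLENGTH - 7)
      if (!(p in count)) {
        # Exons arrive sorted by start, so the first one sets chromStart
        chrom[p] = $1; strand[p] = $7; tx_start[p] = $4 - 1
        tx_end[p] = 0; count[p] = 0
      }
      sep = (count[p] > 0) ? "," : ""
      sizes[p]  = sizes[p]  sep ($5 - $4 + 1)            # block length
      starts[p] = starts[p] sep ($4 - 1 - tx_start[p])   # offset from chromStart
      if ($5 + 0 > tx_end[p]) tx_end[p] = $5 + 0
      count[p]++
    }
    END {
      for (p in count)
        print chrom[p], tx_start[p], tx_end[p], p, 0, strand[p],
              tx_start[p], tx_end[p], 0, count[p], sizes[p], starts[p]
    }' |
    sort -t "$(printf '\t')" -k1,1 -k2,2n
}
```

Files that nest features differently (multiple parents per exon, CDS-only models, GTF-style attributes) would need their own handling, which is exactly the diversity problem described above.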

If you have any thoughts on strategies for working with variants in conversion tools, please let me know.

It seems like something that is not easy to solve without heavy, project-specific customization of the kind I'd otherwise steer clear of in a generic toolkit.

I've also noticed that the TAIR10 set now seems to follow correct GFF formatting, which is a nice change.

But overall, it can be a tough problem to abstract.