Converting GFF to BED with bedtools?
11.3 years ago
user

I use bedtools' sortBed utility to sort BED files for various operations. It also accepts GFF files as input. However, when I feed it a GFF file, as in:

sortBed -i myfile.gff

it outputs GFF, not BED. Is there a way to make bedtools sort the file and then convert the result to BED? Many bedtools utilities have a -bed flag. Do I need to use a different bedtools subutility to achieve this? Thanks.

11.3 years ago
brentp

If the problem is with sortBed, just use Linux sort:

sort -k1,1 -k4,4n input.gff > output.gff
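A plain sort treats `#` header and pragma lines (e.g. `##gff-version 3`) as ordinary data and splits fields on any run of whitespace. Below is a minimal bash sketch, assuming a tab-delimited GFF with no embedded `##FASTA` section, that keeps the `#` lines at the top and sorts feature lines by seqid and then numeric start on explicit tab-delimited fields. `sort_gff` is a hypothetical helper name, not part of any toolkit.

```shell
# sort_gff IN OUT (hypothetical helper): '#' lines first, then features by seqid, start.
# Assumes tab-delimited GFF without an embedded ##FASTA section.
sort_gff() {
  local in="$1" out="$2" tab
  tab="$(printf '\t')"
  {
    grep '^#' "$in" || true                      # headers/pragmas, original order
    { grep -v '^#' "$in" || true; } |
      LC_ALL=C sort -t "$tab" -k1,1 -k4,4n       # seqid, then numeric start
  } > "$out"
}
```

Usage: `sort_gff input.gff output.gff`. Like the one-liner, this still writes GFF, so it only addresses the sorting part of the question.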

The OP wants the opposite, i.e. input GFF, output BED.


Oh. I completely misread that!

11.3 years ago

Why not just pipe it into a conversion script? For example:

$ sortBed -i myfile.gff | gff2bed > my_sorted_file.bed

You can also skip sortBed altogether, as gff2bed uses BEDOPS sort-bed internally:

$ gff2bed < myfile.gff > my_sorted_file.bed

In practice, the sort-bed tool is faster than GNU sort and sortBed, and it is non-lossy *shrug*.
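For readers curious what a line-by-line conversion involves, here is a minimal awk sketch. It is not gff2bed and does not reproduce gff2bed's column layout; `gff_to_bed6` is a hypothetical helper name. The essential step is the coordinate shift: GFF starts are 1-based and inclusive while BED starts are 0-based with an exclusive end, so only the start column changes. As assumptions of this sketch, the name column uses the `ID` attribute when present and otherwise the feature type, and a `.` score becomes `0`.

```shell
# gff_to_bed6 (hypothetical helper): GFF on stdin -> BED6 on stdout, one line per feature.
gff_to_bed6() {
  awk 'BEGIN { FS = OFS = "\t" }
    /^#/ || NF < 9 { next }                       # skip header/pragma and malformed lines
    {
      name = $3                                    # fall back to the feature type
      attrs = ";" $9
      if (match(attrs, /;ID=[^;]*/))               # prefer the ID attribute when present
        name = substr(attrs, RSTART + 4, RLENGTH - 4)
      score  = ($6 == ".") ? 0 : $6
      strand = ($7 == "+" || $7 == "-") ? $7 : "."
      print $1, $4 - 1, $5, name, score, strand    # 1-based inclusive start -> 0-based start
    }'
}
```

Usage: `gff_to_bed6 < myfile.gff | sort -k1,1 -k2,2n > my_sorted_file.bed`. Like any line-by-line converter, it emits one BED line per GFF line and does not assemble gene models.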


Yes, you are right. I was just hoping to do it within bedtools for uniformity and to avoid relying on more scripts (since I already use the bedtools intersect and sorting features), so I was curious whether there was some undocumented trick to make bedtools do it.


This conversion script doesn't build the subfeature structure: all it does is convert line by line, and it doesn't appear to construct actual gene models. Maybe I am not using it correctly?


OK, I looked at the documentation. I should have done that first! Anyway, it looks like gff2bed doesn't build the BED12 structure with exon starts, exon lengths, thickStart, thickEnd and so on.

I guess this is pretty hard to make work, since different groups write out GFF data in different ways. Despite the very nice description on the Sequence Ontology Web site, there is still no consensus on how to write GFF.
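To illustrate what assembling a gene model into BED12 involves, here is a sketch that handles one common GFF3 convention only: `exon` and `CDS` lines each carry a single `Parent=<transcript ID>` attribute, and all parts of a transcript share one seqid and strand. Files that link features differently (via `gene_id`/`transcript_id`, multiple parents, or no exon lines at all) would need other handling, which is exactly the variability described above. `gff3_to_bed12` is a hypothetical helper name. blockSizes and blockStarts come from the exons sorted by start, thickStart/thickEnd from the CDS span, and, as an assumption of this sketch, transcripts without CDS get an empty thick region at chromStart.

```shell
# gff3_to_bed12 (hypothetical helper): GFF3 on stdin -> one BED12 line per transcript.
# Assumes exon/CDS lines have a single Parent=<transcript> and share seqid/strand.
gff3_to_bed12() {
  awk 'BEGIN { FS = OFS = "\t" }
    /^#/ || NF < 9 { next }
    $3 == "exon" || $3 == "CDS" {
      attrs = ";" $9
      if (!match(attrs, /;Parent=[^;]*/)) next
      tx = substr(attrs, RSTART + 8, RLENGTH - 8)
      s = $4 - 1; e = $5 + 0                      # 0-based start, exclusive end
      if (!(tx in chrom)) { chrom[tx] = $1; strand[tx] = $7; order[++n] = tx }
      if ($3 == "exon") {
        k = ++nex[tx]; xs[tx, k] = s; xe[tx, k] = e
      } else {                                     # CDS: track the coding span
        if (!(tx in cs) || s < cs[tx]) cs[tx] = s
        if (!(tx in ce) || e > ce[tx]) ce[tx] = e
      }
    }
    END {
      for (i = 1; i <= n; i++) {
        tx = order[i]; m = nex[tx] + 0
        if (m == 0) continue                       # no exon lines: skip
        for (a = 2; a <= m; a++) {                 # insertion sort of exons by start
          hs = xs[tx, a]; he = xe[tx, a]
          for (b = a - 1; b >= 1 && xs[tx, b] > hs; b--) {
            xs[tx, b + 1] = xs[tx, b]; xe[tx, b + 1] = xe[tx, b]
          }
          xs[tx, b + 1] = hs; xe[tx, b + 1] = he
        }
        start = xs[tx, 1]; end = xe[tx, 1]
        for (a = 2; a <= m; a++) if (xe[tx, a] > end) end = xe[tx, a]
        sizes = ""; starts = ""
        for (a = 1; a <= m; a++) {
          sizes = sizes (xe[tx, a] - xs[tx, a]) ","
          starts = starts (xs[tx, a] - start) ","
        }
        thick_s = (tx in cs) ? cs[tx] : start      # non-coding: empty thick region
        thick_e = (tx in ce) ? ce[tx] : start
        print chrom[tx], start, end, tx, 0, strand[tx], thick_s, thick_e, 0, m, sizes, starts
      }
    }'
}
```

Usage: `gff3_to_bed12 < annotation.gff3 > transcripts.bed`, where both file names are placeholders. Overlapping or unsorted-within-file exons are tolerated by the insertion sort, but overlapping exons would still yield BED12 blocks that many tools reject.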

To get an idea of the sheer diversity of GFF3 variants that all organize gene models in different ways, see:

Does anyone know of a tool that can handle all these variations?


If you have any thoughts on strategies for working with variants in conversion tools, please let me know.

It seems like something that is not easy to solve without heavy, research-project-specific customization, which I'd otherwise steer clear of in a generic toolkit.

I've also noticed that the TAIR10 set now seems to follow correct GFF formatting, which is a nice change.

But overall, it can be a tough problem to abstract.
