Question: Converting Gff To Bed With Bedtools?
3
6.0 years ago by
user790
United States
user790 wrote:

I use bedtools' sortBed utility to sort BED files for various operations. It also accepts GFF files as input. However, when I feed it a GFF file, as in:

sortBed -i myfile.gff

it outputs GFF, not BED. Is there a way to make bedtools sort and then convert the result to BED? Many bedtools utilities have a -bed flag. Do I need to use a different bedtools subutility to achieve this? Thanks.

bedtools bed gff conversion • 20k views
modified 6.0 years ago by Alex Reynolds27k • written 6.0 years ago by user790

Try this: http://code.google.com/p/bedops/wiki/gff2bed#Dependencies

written 6.0 years ago by Gjain5.3k
9
6.0 years ago by
brentp22k
Salt Lake City, UT
brentp22k wrote:

If the problem is with sortBed, just use Linux sort:

sort -k1,1 -k4,4n input.gff > output.gff
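
One caveat with the plain sort above is that any `#` header or comment lines get sorted in among the features. A minimal sketch of a wrapper that keeps them at the top follows. The function name `sort_gff` is hypothetical, and it assumes a tab-delimited GFF with no embedded `##FASTA` section:

```shell
# Hypothetical helper: sort a GFF file by sequence name, then by numeric start,
# keeping '#' header/comment lines at the top of the output.
# Assumes tab-delimited GFF without an embedded ##FASTA section.
sort_gff() {
    local gff="$1"
    # Emit header/comment lines first, in their original order.
    grep '^#' "$gff" || true
    # Sort feature lines: column 1 lexically, column 4 (start) numerically.
    grep -v '^#' "$gff" | sort -t "$(printf '\t')" -k1,1 -k4,4n
}
```

It would be called as `sort_gff input.gff > output.gff`. Setting the field separator to a tab guards against spaces inside the source or attribute columns shifting the sort keys.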
written 6.0 years ago by brentp22k
1

OP wants the opposite. I.e. input GFF, output BED.

written 6.0 years ago by Konrad Rudolph120
1

Oh. I completely misread that!

written 6.0 years ago by brentp22k
8
6.0 years ago by
Alex Reynolds27k
Seattle, WA USA
Alex Reynolds27k wrote:

Why not just pipe it into a conversion script? For example:

$ sortBed -i myfile.gff | gff2bed > my_sorted_file.bed

You can also skip sortBed altogether, as gff2bed uses BEDOPS sort-bed internally:

$ gff2bed < myfile.gff > my_sorted_file.bed

In practice, the sort-bed tool is faster than GNU sort and sortBed, and is non-lossy *shrug*.
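
If installing a separate converter is not an option, the line-by-line conversion itself is small enough to sketch in awk. The function below is a hypothetical illustration, not part of bedtools or BEDOPS, and it is not how gff2bed is implemented. It assumes tab-delimited GFF3 input, uses the `ID=` attribute as the BED name when one exists, and passes the GFF score and strand through unchanged (so a `.` score stays `.`, which stricter BED consumers may reject):

```shell
# Hypothetical helper (not part of bedtools or BEDOPS): convert GFF feature
# lines on stdin to 6-column BED on stdout.
gff_to_bed6() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^#/ || NF < 9 { next }   # skip comments and lines with too few columns
         {
             name = "."
             # Assumption: GFF3-style attributes; use ID as the BED name if present.
             n = split($9, attrs, ";")
             for (i = 1; i <= n; i++)
                 if (attrs[i] ~ /^ID=/) { name = substr(attrs[i], 4); break }
             # GFF start is 1-based inclusive; BED start is 0-based and end is exclusive,
             # so only the start shifts by one.
             print $1, $4 - 1, $5, name, $6, $7
         }'
}
```

It would slot into the pipeline as `sortBed -i myfile.gff | gff_to_bed6 > my_sorted_file.bed`. Because every start shifts by the same amount, the sort order from sortBed is preserved.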

modified 2.9 years ago • written 6.0 years ago by Alex Reynolds27k

Yes, you are right. I was just hoping to do it within bedtools, for uniformity and to avoid relying on more scripts (I already use bedtools' intersect and sorting features), so I was curious whether there is some undocumented trick to make bedtools do it.

modified 6.0 years ago • written 6.0 years ago by user790

This conversion script doesn't build the subfeature structure - all it does is convert line by line and doesn't appear to construct actual gene models. Maybe I am not using it correctly?

written 4.8 years ago by Ann2.2k
1

OK, I looked at the documentation. Should have done that first! Anyway, it looks like gff2bed doesn't build the BED structure with exon starts, exon lengths, thickStart, thickEnd and so on.

I guess this is pretty hard to make work, since different groups write out GFF data in different ways. Despite the very nice description on the Sequence Ontology Web site, there is still no consensus on how to write GFF.

To get an idea of the sheer diversity of GFF3 variants that all organize gene models in different ways, see:

Does anyone know of a tool that can handle all these variations?????
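
No general answer is offered here, but for one common layout the gene-model assembly can be sketched. The hypothetical function below builds one BED12 line per transcript from `exon` features. It assumes GFF3 input where each exon carries a single `Parent=` attribute, exons arrive sorted by start within each transcript, all exons of a transcript share a chromosome and strand, and CDS features are ignored (so thickStart and thickEnd simply span the whole transcript). Files that organize gene models differently would need different grouping logic:

```shell
# Hypothetical helper: collapse GFF3 exon features on stdin into BED12 on stdout,
# one line per Parent. Assumes exons are sorted by start within each Parent.
gff_exons_to_bed12() {
    awk 'BEGIN { FS = OFS = "\t" }
         $3 != "exon" { next }     # also skips comment lines, which have no column 3
         {
             parent = ""
             n = split($9, attrs, ";")
             for (i = 1; i <= n; i++)
                 if (attrs[i] ~ /^Parent=/) { parent = substr(attrs[i], 8); break }
             if (parent == "") next
             if (!(parent in chrom)) {
                 # First exon seen for this transcript: record its position.
                 order[++count] = parent
                 chrom[parent] = $1
                 strand[parent] = $7
                 tstart[parent] = $4 - 1          # 0-based transcript start
             }
             if ($5 + 0 > tend[parent] + 0) tend[parent] = $5 + 0
             nblocks[parent]++
             # Block size in bases, and block start relative to the transcript start.
             sizes[parent] = sizes[parent] ($5 - $4 + 1) ","
             starts[parent] = starts[parent] ($4 - 1 - tstart[parent]) ","
         }
         END {
             for (k = 1; k <= count; k++) {
                 p = order[k]
                 # Score and itemRgb are placeholders (0); thick region = whole transcript.
                 print chrom[p], tstart[p], tend[p], p, 0, strand[p], \
                       tstart[p], tend[p], 0, nblocks[p], sizes[p], starts[p]
             }
         }'
}
```

A possible invocation would be `sort -k1,1 -k4,4n input.gff | gff_exons_to_bed12 > models.bed`. Transcripts are emitted in the order their first exon appears.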

modified 4.8 years ago • written 4.8 years ago by Ann2.2k

If you have any thoughts on strategies for working with variants in conversion tools, please let me know.

It seems like something that is not easy to solve without heavy, project-specific customization, which I'd otherwise steer clear of in a generic toolkit.

I've also noticed that the TAIR10 set now seems to follow correct GFF formatting, which is a nice change.

But overall, it can be a tough problem to abstract.

modified 2.9 years ago • written 4.8 years ago by Alex Reynolds27k