6.5 years ago by
Bay Area, CA
There are no tools directly on the public Galaxy site to transform a GFF3 dataset into a BED12 dataset. However, the Tool Shed has a repository called 'fml_gff3togtf' that includes a tool for this purpose, for use in a local install. The tool's description names the output datatype slightly incorrectly, so be sure to test the results. The word "wiggle" has no place in this statement: "gff3_to_bed_converter.py: This tool converts gene transcript annotation from GFF3 format to UCSC wiggle 12 column BED format."
It might be helpful to let you know what a BED12 file represents:
A BED12 file describes the complete, often spliced, alignment of a sequence to a reference genome. It does not include minor base variation; it is a macro alignment. You can think of each of the blocks as an "exon", although there is no magic here: if the sequence or genome had quality problems, or significant variation (a large insertion or deletion), that could cause the alignment to fragment as well.
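To make the block idea concrete, here is a minimal Python sketch that reads one BED12 line and returns the genomic coordinates of each block. The function name `bed12_blocks` and the example line are made up for illustration; the column layout follows the standard UCSC BED definition (blockCount, blockSizes and blockStarts in columns 10 to 12, with blockStarts relative to chromStart).

```python
def bed12_blocks(line):
    """Return (chrom, name, strand, blocks) for one BED12 line.

    blocks is a list of (start, end) pairs in genomic coordinates,
    0-based and half-open, like the rest of the BED format.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError("expected 12 tab-separated columns")

    chrom, chrom_start = fields[0], int(fields[1])
    name, strand = fields[3], fields[5]
    block_count = int(fields[9])

    # blockSizes and blockStarts are comma-separated lists; a trailing
    # comma is common, so strip it before splitting.
    sizes = [int(x) for x in fields[10].rstrip(",").split(",")]
    starts = [int(x) for x in fields[11].rstrip(",").split(",")]
    if len(sizes) != block_count or len(starts) != block_count:
        raise ValueError("blockCount does not match the block lists")

    # blockStarts are offsets from chromStart, not absolute positions.
    blocks = [(chrom_start + offset, chrom_start + offset + size)
              for offset, size in zip(starts, sizes)]
    return chrom, name, strand, blocks


# Hypothetical three-block record, for illustration only.
example = ("chr1\t1000\t5000\ttx1\t0\t+\t1050\t4900\t0\t3\t"
           "100,200,300,\t0,1500,3700,")
chrom, name, strand, blocks = bed12_blocks(example)
for start, end in blocks:
    print(chrom, start, end, name, strand)
```

The gaps between consecutive blocks are what a genome browser draws as introns, whether they come from real splicing or from a fragmented alignment.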
Here is the data description:
To see examples at UCSC (genome.ucsc.edu), any EST or mRNA track will have this as the primary table format. Gene tracks can also be in BED12 format, or in a related one, genePred:
UCSC also has command-line utilities to convert between the formats. Pre-compiled versions are here:
Either way, you can convert the data, then load it into the public Galaxy (usegalaxy.org) and proceed with your analysis. BEDTools works well with BED12 files. Attempting to go from BED6 to BED12 definitely involves information loss, as BED6 no longer carries the global alignment. Adjusting attributes such as score or name is often a matter of preference, so you can alter these however you want, as long as the attribute formatting rules for the columns are followed.
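If you want to check a converter's output by hand, the core of the GFF3 to BED12 arithmetic for one transcript can be sketched in Python as below. This is only an illustration, not the code used by 'fml_gff3togtf' or by the UCSC utilities. The function name `exons_to_bed12` is hypothetical, and it assumes you have already collected the exon coordinates for a single transcript from the GFF3 file. It also assumes no CDS information, so thickStart and thickEnd are both set to chromStart, and itemRgb is set to 0.

```python
def exons_to_bed12(chrom, name, strand, exons, score=0):
    """Build one BED12 line from the exons of a single transcript.

    exons: list of (start, end) in GFF3 coordinates (1-based, inclusive).
    BED is 0-based and half-open, so starts shift down by one and ends
    stay the same.
    """
    if not exons:
        raise ValueError("need at least one exon")
    if not 0 <= score <= 1000:
        raise ValueError("BED score must be between 0 and 1000")

    exons = sorted(exons)                      # blocks must be in ascending order
    chrom_start = exons[0][0] - 1              # 1-based start -> 0-based start
    chrom_end = max(end for _, end in exons)   # inclusive end == half-open end

    sizes = [end - start + 1 for start, end in exons]
    starts = [start - 1 - chrom_start for start, _ in exons]

    fields = [
        chrom, chrom_start, chrom_end, name, score, strand,
        chrom_start, chrom_start,              # assumption: no CDS, so no thick region
        "0",                                   # itemRgb placeholder
        len(exons),
        ",".join(str(s) for s in sizes) + ",",
        ",".join(str(s) for s in starts) + ",",
    ]
    return "\t".join(str(f) for f in fields)


# Hypothetical exons for one transcript, given out of order on purpose.
line = exons_to_bed12("chr1", "tx1", "+",
                      [(2501, 2700), (1001, 1100), (4701, 5000)])
print(line)
```

The name and score arguments are where the preference mentioned above comes in: any name without tabs and any integer score from 0 to 1000 keeps the line valid.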
Hopefully this helps,
Jen, Galaxy team