Question: Convert .Gff3 File To 12-Column .Bed File
LRStar (190), United States, wrote 6.6 years ago:


I would like to convert a .gff3 file to a 12-column .bed file, as described in this link under "BED Format".

I have thus far used Galaxy from Penn State, but it outputs a 6-column .bed file.

Any advice is greatly appreciated! Thank you...

modified 19 days ago by Juke34 (4.1k) • written 6.6 years ago by LRStar (190)
danielschmelter (80) wrote 8 months ago:

The UCSC Genome Browser hosts conversion utilities that you can run from your command line to accomplish the GFF3 to BED12 conversion. Note that the utilities are OS-specific and need to be given permission to execute with "chmod +x utilityName".

Here's an example of how I did a conversion using the following steps:

Disclaimer that I work for the UCSC Genome Browser. :)
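
One common way to chain the Kent utilities for this task is GFF3 to genePred, then genePred to BED12. The sketch below assumes that the two utilities gff3ToGenePred and genePredToBed have already been downloaded, made executable, and placed on the PATH; the helper names build_commands and convert, and the file names, are hypothetical. It is not necessarily the exact sequence used in the answer above.

```python
import shlex
import subprocess
from pathlib import Path


def build_commands(gff3_path, bed_path):
    """Return the two command lines for GFF3 -> genePred -> BED12."""
    gff3 = Path(gff3_path)
    # The intermediate file name is arbitrary; here it reuses the input stem.
    gene_pred = gff3.with_suffix(".genePred")
    return [
        ["gff3ToGenePred", str(gff3), str(gene_pred)],
        ["genePredToBed", str(gene_pred), str(bed_path)],
    ]


def convert(gff3_path, bed_path):
    """Run both utilities in order; raises CalledProcessError if one fails."""
    for cmd in build_commands(gff3_path, bed_path):
        print("running:", shlex.join(cmd))
        subprocess.run(cmd, check=True)


# Show the commands without running them (file names are placeholders).
for cmd in build_commands("annot.gff3", "annot.bed"):
    print(shlex.join(cmd))
```

Calling convert("annot.gff3", "annot.bed") would then run both steps, provided the binaries are installed.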

modified 8 months ago • written 8 months ago by danielschmelter (80)

This is the answer that saved the day for me. Thank you!

written 8 months ago by crcarroll (10)
Alex Reynolds (30k), Seattle, WA USA, wrote 6.6 years ago:

You might take a look at the BEDOPS gff2bed conversion script, to see if it gets you closer. (It doesn't make BED12, but is there enough information in a GFF3 file to get there? You'd need to add color metadata on your own, for instance.)

In any case, this script tries not to throw anything away, except for headers that other BEDOPS tools do not process. If you want to preserve them as BED elements, add the --keep-header option.

You can also apply cut or awk to the output of gff2bed in order to filter, rearrange or add non-GFF/color columns, if needed.
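
As an illustration of that kind of post-processing, here is a minimal Python sketch that rearranges gff2bed output into six BED columns and fills column 4 with the Name= value from the GFF3 attributes. It assumes the gff2bed output is tab-separated with the attributes in the tenth column, as in the sample shown later in this thread; the function names are hypothetical, and the score column is passed through unchanged.

```python
def name_from_attributes(attributes, default="."):
    """Return the Name= value from a GFF3 attribute string, else ID=, else default."""
    fields = dict(
        item.split("=", 1) for item in attributes.split(";") if "=" in item
    )
    return fields.get("Name", fields.get("ID", default))


def to_bed6(gff2bed_line):
    """Rearrange one tab-separated gff2bed line into six BED columns."""
    cols = gff2bed_line.rstrip("\n").split("\t")
    chrom, start, end, _id, score, strand = cols[:6]
    attributes = cols[9]  # assumed position of the GFF3 attribute column
    name = name_from_attributes(attributes)
    return "\t".join([chrom, start, end, name, score, strand])


line = "PdomScaf0001\t14\t1963\t1\t.\t-\tmaker\tgene\t.\tName=PdomGene00025;ID=1"
print(to_bed6(line))
```

Features without a Name= attribute (the exon and CDS lines in the sample) fall back to their ID= value.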

modified 6.3 years ago • written 6.6 years ago by Alex Reynolds (30k)

Thanks for the comment, Alex. I did try that, but I am still uncertain about how to create certain columns of the .bed file. I think I can create Columns 1, 2, 3, 5, 6, 7, 8: Bed(Col1) = Gff3(Col1). Bed(Col2) = Gff3(Col4). Bed(Col3) = Gff3(Col5). Bed(Col5) = Gff3(Col6). Bed(Col6) = Gff3(Col7). Bed(Col7) = Bed(Col2) [at least in online examples, it seemed repeated]. Bed(Col8) = Bed(Col3) [at least in online examples, it seemed repeated].

written 6.6 years ago by LRStar (190)

But then, I am left with Columns 4, 9, 10, 11, 12. For Col 4, I don't know how to get the "name" from the Gff3 file. For Col 9, can I just set them all to "255,0,0", or else where can I get that from the Gff3 file? For Col 10, I am not sure what the "blockCount" is. At first, I thought there could only be 0 or 1 exons on the line, so this column could be either "0" or "1". But that would not make sense for Col 11 and Col 12, because these require comma-separated lists with one element per block in the blockCount (as if Col 10 has the potential to be >1).

written 6.6 years ago by LRStar (190)
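
To make columns 10 to 12 concrete: in BED12, one line describes a whole transcript, blockCount is the number of exons, blockSizes lists their lengths, and blockStarts lists their offsets from chromStart. The sketch below is a minimal illustration under several assumptions: the function name is hypothetical, thickStart/thickEnd are simply set to the feature bounds, the color is a fixed placeholder, and only the three exons visible in the GFF3 excerpt in this thread are used (the real transcript runs to 1963 and has more exons). Note that BED starts are 0-based, so the GFF3 start is reduced by 1.

```python
def bed12_from_exons(chrom, name, strand, exons, score=0, rgb="255,0,0"):
    """Build one BED12 line from 1-based inclusive GFF3 exon coordinates.

    exons: list of (start, end) tuples belonging to a single transcript.
    """
    exons = sorted(exons)
    chrom_start = exons[0][0] - 1   # BED starts are 0-based
    chrom_end = exons[-1][1]        # BED ends are exclusive, so unchanged
    block_sizes = [end - start + 1 for start, end in exons]
    block_starts = [start - 1 - chrom_start for start, _ in exons]
    fields = [
        chrom, chrom_start, chrom_end, name, score, strand,
        chrom_start, chrom_end,     # thickStart/thickEnd: whole feature here
        rgb,
        len(exons),
        ",".join(map(str, block_sizes)),
        ",".join(map(str, block_starts)),
    ]
    return "\t".join(map(str, fields))


# Only the exons shown in the excerpt; a real run would collect all exons
# whose Parent= attribute matches the mRNA's ID.
exons = [(15, 100), (223, 300), (717, 765)]
print(bed12_from_exons("PdomScaf0001", "PdomMRNA00025.1", "-", exons))
```

The first blockStart is always 0, and the last blockStart plus the last blockSize equals chromEnd minus chromStart.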

I particularly do not see examples of these last columns anywhere online. It seems most .bed files are <12 columns, but I am using a software called methylKit, which requires all 12 columns.

written 6.6 years ago by LRStar (190)

Just in case, here is a head of my .gff3 file:

PdomScaf0001 maker gene 15 1963 . - . Name=PdomGene00025;ID=1;Dbxref=MAKER:maker-PdomScaf0001-snap-gene-0.274

PdomScaf0001 maker mRNA 15 1963 . - . Name=PdomMRNA00025.1;Parent=1;ID=2;_QI=216%7C0%7C0.2%7C0.6%7C0.5%7C0.6%7C5%7C0%7C98;_eAED=0.43;_AED=0.43;Dbxref=MAKER:maker-PdomScaf0001-snap-gene-0.274-mRNA-1

PdomScaf0001 maker exon 15 100 -0.094 - . Parent=2;ID=3

PdomScaf0001 maker CDS 15 100 . - 2 Parent=2;ID=4

PdomScaf0001 maker exon 223 300 21.8 - . Parent=2;ID=5

PdomScaf0001 maker CDS 223 300 . - 2 Parent=2;ID=6

PdomScaf0001 maker exon 717 765 22.4 - . Parent=2;ID=7

modified 6.6 years ago • written 6.6 years ago by LRStar (190)

And a head of my .bed file, created using gff2bed (same as you suggested):

PdomScaf0001 14 100 3 -0.094 - maker exon . Parent=2;ID=3

PdomScaf0001 14 100 4 . - maker CDS 2 Parent=2;ID=4

PdomScaf0001 14 1963 1 . - maker gene . Name=PdomGene00025;ID=1;Dbxref=MAKER:maker-PdomScaf0001-snap-gene-0.274

PdomScaf0001 14 1963 2 . - maker mRNA . Name=PdomMRNA00025.1;Parent=1;ID=2;_QI=216%7C0%7C0.2%7C0.6%7C0.5%7C0.6%7C5%7C0%7C98;_eAED=0.43;_AED=0.43;Dbxref=MAKER:maker-PdomScaf0001-snap-gene-0.274-mRNA-1

PdomScaf0001 222 300 5 21.8 - maker exon . Parent=2;ID=5

PdomScaf0001 222 300 6 . - maker CDS 2 Parent=2;ID=6

PdomScaf0001 716 765 7 22.4 - maker exon . Parent=2;ID=7

PdomScaf0001 716 765 8 . - maker CDS 0 Parent=2;ID=8

PdomScaf0001 906 947 9 4.85 - maker exon . Parent=2;ID=9

PdomScaf0001 906 947 10 . - maker CDS 2 Parent=2;ID=10

modified 6.6 years ago • written 6.6 years ago by LRStar (190)

As a side note, the only other thing I noticed is that the score column (Column 6) of my .gff3 file does not always seem to be a number between 0 and 1000, which is the consensus of what I see online. In fact, 1,844 of 183,748 lines of the .gff3 file have negative values for the score column. I downloaded it from a Genome Browser at my school. Not sure if this might be a problem?
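
The 0 to 1000 range is a BED convention, whereas a GFF3 score is a free floating-point value, so negative or fractional scores are not unusual in GFF3. If a downstream tool insists on BED-style scores, one option is to map them during conversion. The sketch below rounds and clamps; that mapping is an illustrative choice rather than a standard conversion, and the function name is hypothetical.

```python
def bed_score(gff_score, lo=0, hi=1000):
    """Map a GFF3 score string to an integer BED score in [lo, hi].

    "." (no score) becomes lo; other values are rounded, then clamped.
    """
    if gff_score == ".":
        return lo
    value = round(float(gff_score))
    return max(lo, min(hi, value))


# Scores taken from the excerpt in this thread, plus one out-of-range value.
for raw in [".", "-0.094", "21.8", "4.85", "2500"]:
    print(raw, "->", bed_score(raw))
```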

written 6.6 years ago by LRStar (190)
Jennifer Hillman Jackson (390), Bay Area, CA, wrote 6.5 years ago:


There are no tools directly on the public Galaxy site to transform a GFF3 dataset into a BED12 dataset. However, the Tool Shed has a repository called 'fml_gff3togtf' that includes a tool for this purpose, for use in a local install. The description is a bit bothersome in that it includes a slightly incorrect datatype description, so be sure to test out the results. (The word "wiggle" has no place in this statement: "This tool converts gene transcript annotation from GFF3 format to UCSC wiggle 12 column BED format.")

It might be helpful to let you know what a BED12 file represents:

A BED12 file describes the complete, often spliced, alignment of a sequence to a reference genome. This does not include minor base variation; it is a macro alignment. You can think of each of the blocks as being "exons", although there is no magic here: if the sequence or genome had quality problems, or significant variation (a large insertion or deletion), that could cause the alignment to fragment as well. Here is the data description:

To see examples at UCSC, look at any EST or mRNA track, which will have this as the primary table format. All gene tracks can also be in BED12 format, or in a related one, genePred:

UCSC also has command-line utilities to convert between the formats; pre-compiled versions are here:

Either way, you can convert the data, then load it into the public Galaxy and proceed with your analysis. BEDTools works well with BED12 files. There is definitely information loss when attempting to transform BED6 -> BED12, as the global alignment is lost. And adjusting attributes such as score or name is often a matter of preference, so you can alter these however you want, as long as the attribute formatting rules for the columns are followed.

Hopefully this helps,

Jen, Galaxy team

written 6.5 years ago by Jennifer Hillman Jackson (390)
t_pod (20) wrote 2.9 years ago:

Hi, did you find a solution on that topic?

I am also currently using methylKit and I have a similar issue. As the annotation file was not available on UCSC, I downloaded the gff3 file from NCBI and converted it to a BED12 file via Galaxy.

However, I am finding many discrepancies in my converted BED12 file when I compare it to a "correct" BED12 file from UCSC:

the 5th and 9th columns are always "0" (that might not be an issue),

in total, 50% of the lines have only the first 6 columns filled, and they are just named "CDS" in the 4th column without further specification (should I discard all of them?),

and sometimes column 11 displays n+1 items, where the additional item is "000" and n = block count (= column 10). I am expecting columns 11 and 12 to have the same number of items.

Any hints?
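
One way to find which lines are affected is to check each BED12 line for internal consistency. The sketch below is a minimal checker, assuming tab-separated input; the function name and the particular set of checks are illustrative and do not reproduce what methylKit or UCSC validate. It tolerates the trailing comma that some tools write at the end of columns 11 and 12.

```python
def check_bed12(line):
    """Return a list of problems found in one tab-separated BED12 line."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 12:
        return [f"expected 12 columns, found {len(cols)}"]
    start, end, count = int(cols[1]), int(cols[2]), int(cols[9])
    # Lists may end with a trailing comma; drop the resulting empty items.
    sizes = [int(x) for x in cols[10].split(",") if x]
    starts = [int(x) for x in cols[11].split(",") if x]
    problems = []
    if len(sizes) != count:
        problems.append(f"blockSizes has {len(sizes)} items, blockCount is {count}")
    if len(starts) != count:
        problems.append(f"blockStarts has {len(starts)} items, blockCount is {count}")
    if starts and starts[0] != 0:
        problems.append("first blockStart is not 0")
    if sizes and starts and len(sizes) == len(starts):
        if start + starts[-1] + sizes[-1] != end:
            problems.append("last block does not end at chromEnd")
    return problems


good = "chr1\t14\t765\tt1\t0\t-\t14\t765\t0\t3\t86,78,49,\t0,208,702,"
bad = "chr1\t14\t765\tt1\t0\t-\t14\t765\t0\t3\t86,78,49,000\t0,208,702"
print(check_bed12(good))
print(check_bed12(bad))
```

Lines with only six columns, such as the "CDS" entries described above, are reported as having the wrong column count.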

written 2.9 years ago by t_pod (20)
Juke34 (4.1k) wrote 19 days ago:

Here you can find a list of tools for this conversion (AGAT, BEDOPS, PASA, Kent utils) and examples of the results they provide.
I would certainly recommend the AGAT script ^^

modified 19 days ago • written 19 days ago by Juke34 (4.1k)

Does the AGAT script convert gff3 into a 12-column bed? I want a conversion that gives a 12-column bed which can be used as input for methylKit.

modified 19 days ago • written 19 days ago by toralmanvar (840)

Yes, click the first link I provided and you will see how it looks. I cannot promise it will be exactly how you would like it to be (e.g., the RGB value in column 9).

written 19 days ago by Juke34 (4.1k)

Thank you Juke-34, PASA worked in our case.

written 16 days ago by toralmanvar (840)
toralmanvar (840) wrote 19 days ago:

Hello @t_pod,

Did you find a solution to your problem?

We are also using gff3ToGenePred to convert gff3 to a genePred file, but unfortunately we are getting output for only 2 chromosomes instead of 15.

Any help is appreciated.

written 19 days ago by toralmanvar (840)