Question

How Do I Convert From Bed Format To Gff Format?

8

14.6 years ago

Istvan Albert 100k

I have a file in GFF format and I need to convert it to BED format. What do I do?

gff galaxy bed • 31k views

ADD COMMENT • link updated 6 months ago by Ram 43k • written 14.6 years ago by Istvan Albert 100k

0

I'll answer my own question here as it is a demo for now

ADD REPLY • link 14.6 years ago by Istvan Albert 100k

0

Someone should write a well-documented, well-tested Python module or script to do this! Many current converters either discard CDS information or put every CDS or mRNA on its own line. It would be nice if this script had an option to include CDSs on the same line using the extended BED format.
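As a rough illustration of the "CDSs on the same line" idea, here is a minimal Python sketch that packs one transcript's exons and CDS segments (given in 1-based, inclusive GFF coordinates) into a single 12-column BED line. The function name `to_bed12` and its arguments are hypothetical, the exons are assumed not to overlap, and parsing the GFF file itself is left out.

```python
def to_bed12(chrom, name, strand, exons, cds=None):
    """Build one BED12 line from GFF-style (1-based, inclusive) intervals.

    exons, cds: lists of (start, end) tuples; exons must not overlap.
    This is an illustrative helper, not part of any existing library.
    """
    exons = sorted(exons)
    chrom_start = exons[0][0] - 1          # GFF start -> 0-based BED start
    chrom_end = exons[-1][1]               # end is unchanged
    if cds:
        thick_start = min(s for s, _ in cds) - 1
        thick_end = max(e for _, e in cds)
    else:
        # With no coding part, thickStart/thickEnd are set to chromStart.
        thick_start = thick_end = chrom_start
    block_sizes = [e - s + 1 for s, e in exons]
    # Block starts are relative to chromStart.
    block_starts = [s - 1 - chrom_start for s, _ in exons]
    fields = [
        chrom, chrom_start, chrom_end, name, 0, strand,
        thick_start, thick_end, "0", len(exons),
        ",".join(map(str, block_sizes)),
        ",".join(map(str, block_starts)),
    ]
    return "\t".join(map(str, fields))


line = to_bed12(
    "chr1", "tx1", "+",
    exons=[(1001, 1500), (4001, 5000)],
    cds=[(1201, 1500), (4001, 4800)],
)
print(line)
```

The coding region ends up in the thickStart/thickEnd columns and the exon structure in the block columns, so nothing needs its own line.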

ADD REPLY • link 13.9 years ago by brentp 24k

0

https://bitbucket.org/galaxy/galaxy-central/src/61b09dc1dff2/tools/filters/bed_to_gff_converter.py

ADD REPLY • link updated 11.7 years ago by Istvan Albert 100k • written 12.1 years ago by Ying W ★ 4.2k

0

Used the perl script submitted from Alex Reynolds and it worked absolutely fine.

ADD REPLY • link 7.9 years ago by BCArg ▴ 90

0

The question here is exactly the opposite of the title of the question

ADD REPLY • link 5.8 years ago by cmdcolin ★ 3.8k

8

13.9 years ago

Giovanni M Dall'Olio 28k

You can also convert it with Galaxy:

http://main.g2.bx.psu.edu/

Go to 'Convert formats' and you will find a 'BED-to-GFF converter'.

ADD COMMENT • link updated 6 months ago by Ram 43k • written 13.9 years ago by Giovanni M Dall'Olio 28k

0

Hi Giovanni,

Do you know how to convert hdf5 to plink format?

ADD REPLY • link updated 6 months ago by Ram 43k • written 5.2 years ago by shawn ▴ 20

0

HDF-5 is a generic format for big datasets, which can be used for several applications, from astrophysics to scRNA data. It's conceptually similar to a zip file containing several files and folders. As such, there is not a single way to convert it to plink - it depends on which data is inside the HDF-5 file and in what format.

ADD REPLY • link 5.1 years ago by Giovanni M Dall'Olio 28k

5

14.3 years ago

Alex Reynolds 35k

Here's a Perl script I wrote if you wanted to do something local.

There's some code in there for translating yeast chromosome names that can be removed, if not needed. I also used a Site feature in the GFF file as the region ID, which might also need tweaking, depending on what features you're interested in.

#!/usr/bin/perl -w

use strict;
use Bio::Tools::GFF;
use feature qw(say switch);

my $gffio = Bio::Tools::GFF->new(-fh => \*STDIN, -gff_version => 2);
my $feature;

while ($feature = $gffio->next_feature()) {
    # print $gffio->gff_string($feature)."\n";

    # cf. <http://www.sanger.ac.uk/Software/formats/GFF/GFF_Spec.shtml>
    my $seq_id = $feature->seq_id();   
    my $start = $feature->start() - 1;
    my $end = $feature->end();
    my $strand = $feature->strand();
    my @sites = $feature->get_tag_values('Site');

    # translate strand (BioPerl reports 1, -1, or 0 for unstranded features)
    given ( $strand ) {
        when ($_ == 1)  { $strand = "+"; }
        when ($_ == -1) { $strand = "-"; }
        default         { $strand = "."; }
    }

    # translate yeast chromosome to UCSC browser-readable chromosome
    # cf. <http://www.yeastgenome.org/sgdpub/Saccharomyces_cerevisiae.pdf>
    given ( $seq_id ) {
        when ( $_ eq "I" )    { $seq_id = "chr1"; }
        when ( $_ eq "II" )   { $seq_id = "chr2"; }
        when ( $_ eq "III" )  { $seq_id = "chr3"; }
        when ( $_ eq "IV" )   { $seq_id = "chr4"; }
        when ( $_ eq "V" )    { $seq_id = "chr5"; }
        when ( $_ eq "VI" )   { $seq_id = "chr6"; }
        when ( $_ eq "VII" )  { $seq_id = "chr7"; }
        when ( $_ eq "VIII" ) { $seq_id = "chr8"; }
        when ( $_ eq "IX" )   { $seq_id = "chr9"; }
        when ( $_ eq "X" )    { $seq_id = "chr10"; }
        when ( $_ eq "XI" )   { $seq_id = "chr11"; }
        when ( $_ eq "XII" )  { $seq_id = "chr12"; }
        when ( $_ eq "XIII" ) { $seq_id = "chr13"; }
        when ( $_ eq "XIV" )  { $seq_id = "chr14"; }
        when ( $_ eq "XV" )   { $seq_id = "chr15"; }
        when ( $_ eq "XVI" )  { $seq_id = "chr16"; }
        default { }
    }

    # output
    print "$seq_id\t$start\t$end\t$sites[0]\t0.0\t$strand\n";
}
$gffio->close();

To use it:

gff2bed.pl < data.gff > data.bed

ADD COMMENT • link updated 4.8 years ago by Ram 43k • written 14.3 years ago by Alex Reynolds 35k

0

Just a note: the code above needs BioPerl.

ADD REPLY • link 14.3 years ago by Istvan Albert 100k

0

Hi Alex, one thing you could do is replace the case block with a hash map that remaps chromosomes. That way it is a lot easier to add other entries without making the code longer and longer...
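A rough sketch of the lookup-table idea, written in Python rather than Perl for brevity. The names `ROMAN_TO_UCSC` and `remap_chrom` are made up for illustration; the mapping itself is the same yeast I to XVI table used in the script above.

```python
# Yeast chromosome names as Roman numerals, in order I..XVI.
ROMANS = "I II III IV V VI VII VIII IX X XI XII XIII XIV XV XVI".split()

# Hypothetical lookup table: "I" -> "chr1", ..., "XVI" -> "chr16".
ROMAN_TO_UCSC = {roman: f"chr{i}" for i, roman in enumerate(ROMANS, start=1)}


def remap_chrom(seq_id):
    """Return the UCSC-style name, or seq_id unchanged if it is not in the table."""
    return ROMAN_TO_UCSC.get(seq_id, seq_id)


print(remap_chrom("IX"))    # chr9
print(remap_chrom("chrM"))  # chrM (unknown names pass through)
```

Adding another organism or an extra contig then means adding one dictionary entry instead of another branch.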

ADD REPLY • link updated 6 months ago by Ram 43k • written 14.2 years ago by Istvan Albert 100k

0

Definitely. At least it might serve as a stepping point for further modifications or tweaks.

ADD REPLY • link 13.9 years ago by Alex Reynolds 35k

0

Excellent, it worked well for me (the online tool above from galaxy did not), thanks a lot!

ADD REPLY • link 7.9 years ago by BCArg ▴ 90

0

There is also Galaxy, which offers the solution highlighted below. You can also take a look at this link for a Python script that can perform the same trick. Also take a look at this link

ADD REPLY • link updated 6 months ago by Ram 43k • written 7.9 years ago by ivivek_ngs ★ 5.2k

1

6 months ago

alejandrogzi ▴ 120

I recently developed bed2gff, a tool written in Rust, to quickly convert .bed files to GFF3 format. Could be of help here!

ADD COMMENT • link updated 6 months ago by Ram 43k • written 6 months ago by alejandrogzi ▴ 120

Ram · Accepted Answer · 2009-09-30

Both formats are tab-delimited text files used to represent DNA features in genomes. The order of the columns differs between the two, and each format has columns for attributes that are missing from the other. Nonetheless, the most important difference between the two is the coordinate system that each assumes.

The BED format, developed at UCSC, uses zero-based indexing with a half-open interval (the end coordinate is excluded), whereas the GFF format, developed at Sanger, assumes a one-based coordinate system that includes both the start and end coordinates. Therefore:

The [0,100) interval in BED format corresponds to [1,100] in GFF format, and both are 100 bases long. That is, the first element in BED format has index 0 and the 100th (last) element has index 99, whereas in GFF the first element has index 1 and the last element has index 100.
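The arithmetic is small enough to sketch directly. The two helper names below are hypothetical; the only rule they encode is that the start shifts by one and the end stays the same.

```python
def bed_to_gff_coords(start, end):
    """0-based, end-exclusive (BED) -> 1-based, end-inclusive (GFF)."""
    return start + 1, end


def gff_to_bed_coords(start, end):
    """1-based, end-inclusive (GFF) -> 0-based, end-exclusive (BED)."""
    return start - 1, end


bed = (0, 100)
gff = bed_to_gff_coords(*bed)
print(gff)                       # (1, 100)
print(gff_to_bed_coords(*gff))   # (0, 100)

# The feature length is the same under both conventions.
bed_length = bed[1] - bed[0]         # end - start
gff_length = gff[1] - gff[0] + 1     # end - start + 1
print(bed_length, gff_length)        # 100 100
```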

To convert between the two you may use Galaxy: the section called Convert Formats lists various transformation options.
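For a local conversion without Galaxy, the column mapping for a single line can be sketched as follows. This is a minimal example under several assumptions, not a full converter: the function name `gff_line_to_bed6` is made up, the attribute column is assumed to use GFF3-style `key=value;` pairs, the `ID` attribute is used as the BED name, and a missing score is written as 0. Non-missing GFF scores are passed through unchanged, which may need rescaling if a strict 0 to 1000 integer BED score is required.

```python
def gff_line_to_bed6(line, name_key="ID"):
    """Convert one 9-column GFF3 feature line into a 6-column BED line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    seqid, _source, _type, start, end, score, strand, _phase, attributes = fields

    # Parse GFF3-style attributes such as "ID=gene1;Name=abc".
    attrs = dict(
        item.split("=", 1) for item in attributes.split(";") if "=" in item
    )
    name = attrs.get(name_key, ".")

    bed_start = int(start) - 1                    # 1-based -> 0-based start
    bed_score = "0" if score == "." else score    # assumption: missing score -> 0
    bed_strand = strand if strand in ("+", "-") else "."
    return "\t".join([seqid, str(bed_start), end, name, bed_score, bed_strand])


gff = "chrI\tSGD\tgene\t335\t649\t.\t+\t.\tID=YAL069W;Name=YAL069W"
print(gff_line_to_bed6(gff))
```

The source, feature type and phase columns have no BED6 counterpart and are dropped, which is the information loss mentioned above.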