Question: How Do I Convert From Bed Format To Gff Format?
7
9.2 years ago by
Istvan Albert ♦♦ 78k
University Park, USA
Istvan Albert ♦♦ 78k wrote:

I have a file in GFF format and I need to convert it to BED format. What do I do?

bed gff galaxy • 19k views
modified 2.5 years ago by BCArg 60 • written 9.2 years ago by Istvan Albert ♦♦ 78k

I'll answer my own question here as it is a demo for now

written 9.2 years ago by Istvan Albert ♦♦ 78k

Someone should write a well-documented, well-tested Python module or script to do this! Many current converters either discard CDS information or put every CDS or mRNA on its own line. It would be nice if the script had an option to include CDSs on the same line using the extended BED format.
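
As a rough illustration of the "CDSs on the same line" idea, the sketch below collapses the CDS intervals of one transcript into a single 12-column BED line. The function name `cds_to_bed12` and its arguments are hypothetical, and the sketch assumes the CDS intervals are non-overlapping, come from one transcript, and use 1-based inclusive GFF coordinates. It sets thickStart/thickEnd to the full span and itemRgb to 0 for simplicity.

```python
def cds_to_bed12(chrom, name, strand, cds_intervals):
    """Collapse 1-based inclusive GFF CDS intervals into one BED12 line.

    Assumes the intervals do not overlap and belong to one transcript.
    """
    # Shift each start to 0-based; ends are unchanged (half-open in BED).
    blocks = sorted((start - 1, end) for start, end in cds_intervals)
    chrom_start = blocks[0][0]
    chrom_end = max(end for _, end in blocks)
    # BED12 stores block sizes and block starts relative to chromStart.
    sizes = ",".join(str(end - start) for start, end in blocks)
    starts = ",".join(str(start - chrom_start) for start, _ in blocks)
    fields = [chrom, chrom_start, chrom_end, name, 0, strand,
              chrom_start, chrom_end, "0", len(blocks), sizes, starts]
    return "\t".join(str(field) for field in fields)


# Two CDS segments of a made-up gene on the plus strand.
print(cds_to_bed12("chr1", "geneA", "+", [(101, 200), (301, 400)]))
```

A real converter would also need to group CDS lines by their parent transcript, which depends on how the particular GFF file encodes that relationship.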

written 8.6 years ago by brentp 22k

https://bitbucket.org/galaxy/galaxy-central/src/61b09dc1dff2/tools/filters/bed_to_gff_converter.py

modified 6.3 years ago by Istvan Albert ♦♦ 78k • written 6.8 years ago by Ying W 3.9k

I used the Perl script submitted by Alex Reynolds and it worked absolutely fine.

written 2.5 years ago by BCArg 60

The question here is exactly the opposite of the title of the question

written 5 months ago by cmdcolin 950
6
9.2 years ago by
Istvan Albert ♦♦ 78k
University Park, USA
Istvan Albert ♦♦ 78k wrote:

Both formats are tab-delimited text files used to represent DNA features in genomes. The order of the columns differs between the two, and each format has columns for attributes that the other lacks. Nonetheless, the most important difference between the two is the coordinate system that each assumes.

The BED format, developed at UCSC, uses zero-based indexing with a half-open interval (the end coordinate is excluded), whereas the GFF format, developed at Sanger, assumes a 1-based coordinate system that includes both the start and end coordinates.

Therefore the [0,100) interval in BED format corresponds to [1,100] in GFF format, and both are 100 bases long. In BED the first base has index 0 and the 100th (last) base has index 99, whereas in GFF the first base has index 1 and the last base has index 100.
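
The coordinate rule above comes down to shifting the start by one and leaving the end alone. A minimal Python sketch, with hypothetical function names chosen only for illustration:

```python
def gff_to_bed_interval(start, end):
    """1-based inclusive (GFF) -> 0-based half-open (BED)."""
    return start - 1, end


def bed_to_gff_interval(start, end):
    """0-based half-open (BED) -> 1-based inclusive (GFF)."""
    return start + 1, end


def bed_length(start, end):
    # Half-open interval: the end coordinate is not part of the feature.
    return end - start


def gff_length(start, end):
    # Inclusive interval: both coordinates are part of the feature.
    return end - start + 1


print(gff_to_bed_interval(1, 100))  # (0, 100)
print(bed_to_gff_interval(0, 100))  # (1, 100)
```

Both length functions return 100 for this interval, which is a handy sanity check after any conversion.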

To convert between the two you may use Galaxy: the section called Convert Formats lists the available transformation options.

modified 9.2 years ago • written 9.2 years ago by Istvan Albert ♦♦ 78k
6
8.6 years ago by
London, UK
Giovanni M Dall'Olio 26k wrote:

You can also do the conversion in Galaxy:

Go to 'Convert formats' and you will find a 'BED-to-GFF converter'.

written 8.6 years ago by Giovanni M Dall'Olio 26k
5
8.9 years ago by
Alex Reynolds 26k
Seattle, WA USA
Alex Reynolds 26k wrote:

Here's a Perl script I wrote, in case you want to do the conversion locally.

There's some code in there for translating yeast chromosome names that can be removed, if not needed. I also used a Site feature in the GFF file as the region ID, which might also need tweaking, depending on what features you're interested in.

#!/usr/bin/perl -w

use strict;
use Bio::Tools::GFF;
use feature qw(say switch);

my $gffio = Bio::Tools::GFF->new(-fh => \*STDIN, -gff_version => 2);
my $feature;

while ($feature = $gffio->next_feature()) {
    # print $gffio->gff_string($feature)."\n";

    # cf. <http://www.sanger.ac.uk/Software/formats/GFF/GFF_Spec.shtml>
    my $seq_id = $feature->seq_id();   
    my $start = $feature->start() - 1;
    my $end = $feature->end();
    my $strand = $feature->strand();
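    # use "." as the name when a feature has no Site tag (get_tag_values dies on a missing tag)
    my @sites = $feature->has_tag('Site') ? $feature->get_tag_values('Site') : ('.');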

    # translate strand (BioPerl reports 1, -1, or 0; BED expects "+", "-", or ".")
    given ( $strand ) {
        when ($_ == 1)  { $strand = "+"; }
        when ($_ == -1) { $strand = "-"; }
        default         { $strand = "."; }
    }

    # translate yeast chromosome to UCSC browser-readable chromosome
    # cf. <http://www.yeastgenome.org/sgdpub/Saccharomyces_cerevisiae.pdf>
    given ( $seq_id ) {
        when ( $_ eq "I" )    { $seq_id = "chr1"; }
        when ( $_ eq "II" )   { $seq_id = "chr2"; }
        when ( $_ eq "III" )  { $seq_id = "chr3"; }
        when ( $_ eq "IV" )   { $seq_id = "chr4"; }
        when ( $_ eq "V" )    { $seq_id = "chr5"; }
        when ( $_ eq "VI" )   { $seq_id = "chr6"; }
        when ( $_ eq "VII" )  { $seq_id = "chr7"; }
        when ( $_ eq "VIII" ) { $seq_id = "chr8"; }
        when ( $_ eq "IX" )   { $seq_id = "chr9"; }
        when ( $_ eq "X" )    { $seq_id = "chr10"; }
        when ( $_ eq "XI" )   { $seq_id = "chr11"; }
        when ( $_ eq "XII" )  { $seq_id = "chr12"; }
        when ( $_ eq "XIII" ) { $seq_id = "chr13"; }
        when ( $_ eq "XIV" )  { $seq_id = "chr14"; }
        when ( $_ eq "XV" )   { $seq_id = "chr15"; }
        when ( $_ eq "XVI" )  { $seq_id = "chr16"; }
        default { }
    }

    # output
    print "$seq_id\t$start\t$end\t$sites[0]\t0.0\t$strand\n";
}
$gffio->close();

To use it:

gff2bed.pl < data.gff > data.bed
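
For readers without BioPerl, the same per-line logic can be sketched in plain Python. This is an illustration under assumptions, not a port of the script above: the names `parse_attributes` and `gff_line_to_bed` are hypothetical, the attribute parser is deliberately naive (it splits on every semicolon and will mishandle semicolons or equals signs inside quoted values), and the `Site` tag is used as the BED name only because the Perl script does so.

```python
def parse_attributes(text):
    """Naively parse a GFF attribute column into a dict."""
    attrs = {}
    for part in text.split(";"):
        part = part.strip()
        if not part:
            continue
        if "=" in part:
            key, value = part.split("=", 1)   # GFF3 style: key=value
        elif " " in part:
            key, value = part.split(" ", 1)   # GFF2 style: key "value"
        else:
            key, value = part, ""
        attrs[key.strip()] = value.strip().strip('"')
    return attrs


def gff_line_to_bed(line, name_key="Site"):
    """Convert one GFF line to a BED6 line; return None for lines to skip."""
    line = line.rstrip("\r\n")
    if not line or line.startswith("#"):
        return None
    cols = line.split("\t")
    if len(cols) < 8:
        return None
    seq_id, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
    attrs = parse_attributes(cols[8]) if len(cols) > 8 else {}
    name = attrs.get(name_key, ".")
    if strand not in ("+", "-"):
        strand = "."
    # Only the start shifts: GFF is 1-based inclusive, BED is 0-based half-open.
    return "\t".join([seq_id, str(start - 1), str(end), name, "0", strand])


# A made-up GFF2 line, used only to exercise the function.
example = 'I\tSGD\tsite\t1\t100\t.\t+\t.\tSite "ARS1"'
print(gff_line_to_bed(example))
```

To process a whole file, one could loop over `sys.stdin`, call `gff_line_to_bed` on each line, and print every result that is not `None`.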
written 8.9 years ago by Alex Reynolds 26k

Just a note: the code above needs BioPerl.

written 8.9 years ago by Istvan Albert ♦♦ 78k

Hi Alex, one thing you could do is replace the case block with a hash map that remaps chromosomes. That way it is a lot easier to add other entries without making the code longer and longer...
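
A minimal sketch of that suggestion, written in Python rather than Perl purely for illustration (in Perl the same idea would use a hash). The names `CHROM_MAP` and `remap_chrom` are hypothetical. Unknown names pass through unchanged, which matches the empty default branch in the script above.

```python
# Roman-numeral yeast chromosome names, in order I..XVI.
ROMAN = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII",
         "IX", "X", "XI", "XII", "XIII", "XIV", "XV", "XVI"]

# Lookup table: "I" -> "chr1", ..., "XVI" -> "chr16".
CHROM_MAP = {roman: f"chr{number}" for number, roman in enumerate(ROMAN, start=1)}


def remap_chrom(seq_id):
    """Return the remapped chromosome name, or seq_id itself if unmapped."""
    return CHROM_MAP.get(seq_id, seq_id)


print(remap_chrom("IX"))    # chr9
print(remap_chrom("chrM"))  # chrM
```

Adding another organism or naming scheme then means adding entries to the table rather than more branches.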

written 8.9 years ago by Istvan Albert ♦♦ 78k

Definitely. At least it might serve as a stepping point for further modifications or tweaks.

written 8.6 years ago by Alex Reynolds 26k

Excellent, it worked well for me (the online Galaxy tool mentioned above did not), thanks a lot!

written 2.5 years ago by BCArg 60

There is also Galaxy, which offers the solution highlighted below. You can also take a look at this link for a Python script that can perform the same trick. Also take a look at this link.

written 2.5 years ago by vchris_ngs 4.6k