How Do I Convert From Bed Format To Gff Format?
11.6 years ago

I have a file in GFF format and I need to convert it to BED format. What do I do?

bed gff galaxy • 24k views

I'll answer my own question here as it is a demo for now


Someone should write a well-documented, well-tested Python module or script to do this! Many current converters either discard CDS information or put every CDS or mRNA on its own line. It would be nice if the script had an option to include the CDSs on the same line, using the extended BED format.
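
As a rough illustration of what "CDSs on the same line" could look like, here is a minimal Python sketch that packs the CDS segments of one transcript into a single 12-column BED line. The function name `cds_to_bed12` and the example coordinates are made up for illustration, and the sketch assumes the CDS intervals are non-overlapping, on one chromosome, and given in GFF coordinates (1-based, inclusive). It is not a tested converter.

```python
def cds_to_bed12(chrom, name, strand, cds_intervals):
    """Pack CDS segments (GFF coordinates: 1-based, inclusive) into one BED12 line.

    Hypothetical helper for illustration; assumes non-overlapping segments.
    """
    if not cds_intervals:
        raise ValueError("at least one CDS interval is required")

    # Convert each segment to BED coordinates (0-based, half-open) and sort.
    blocks = sorted((start - 1, end) for start, end in cds_intervals)
    chrom_start = blocks[0][0]
    chrom_end = max(end for _, end in blocks)

    # BED12 block sizes, and block starts relative to chromStart.
    block_sizes = ",".join(str(end - start) for start, end in blocks)
    block_starts = ",".join(str(start - chrom_start) for start, _ in blocks)

    fields = [
        chrom, chrom_start, chrom_end, name,
        0,                        # score: placeholder
        strand,
        chrom_start, chrom_end,   # thickStart/thickEnd: the whole span is coding here
        0,                        # itemRgb: placeholder
        len(blocks), block_sizes, block_starts,
    ]
    return "\t".join(str(field) for field in fields)


# Made-up transcript with two CDS segments.
bed12_line = cds_to_bed12("chr1", "mRNA1", "+", [(101, 200), (301, 400)])
print(bed12_line)
```

Grouping the CDS lines by their parent mRNA before calling such a function is left out here, since how parents are recorded depends on the GFF flavour.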


I used the Perl script submitted by Alex Reynolds and it worked absolutely fine.


The question here is exactly the opposite of the title of the question

11.6 years ago

Both formats are tab-delimited text files used to represent DNA features in genomes. The columns are ordered differently, and each format has columns for attributes that are missing from the other. Nonetheless, the most important difference between the two is the coordinate system that each assumes.

The BED format, developed at UCSC, uses zero-based indexing with a half-open interval (the end coordinate is excluded), whereas the GFF format, developed at Sanger, assumes a 1-based coordinate system that includes both the start and end coordinates.

Therefore the [0,100) interval in BED format corresponds to [1,100] in GFF format, and both are 100 bases long. In BED the first base has index 0 and the last (100th) base has index 99, whereas in GFF the first base has index 1 and the last base has index 100.
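
The coordinate rule above can be sketched in a few lines of Python. The function names are made up for illustration; the only thing that changes between the two systems is the start coordinate.

```python
def gff_to_bed_coords(start, end):
    """GFF (1-based, end included) -> BED (0-based, end excluded)."""
    return start - 1, end


def bed_to_gff_coords(start, end):
    """BED (0-based, end excluded) -> GFF (1-based, end included)."""
    return start + 1, end


def bed_length(start, end):
    return end - start


def gff_length(start, end):
    return end - start + 1


print(gff_to_bed_coords(1, 100))   # (0, 100)
print(bed_to_gff_coords(0, 100))   # (1, 100)
print(bed_length(0, 100), gff_length(1, 100))   # 100 100
```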

To convert between the two you may use Galaxy: the section called Convert Formats lists various transformation options.

10.9 years ago

You can also convert it in Galaxy:

Go to 'Convert formats' and you will find a 'BED-to-GFF converter'.


Hi Giovanni, Do you know how to convert hdf5 to plink format?


HDF-5 is a generic format for big datasets, which can be used for several applications, from astrophysics to scRNA data. It's conceptually similar to a zip file containing several files and folders. As such, there is not a single way to convert it to plink - it depends on which data is inside the HDF-5 file and in what format.

11.3 years ago

Here's a Perl script I wrote, if you want to do the conversion locally.

There's some code in there for translating yeast chromosome names that can be removed, if not needed. I also used a Site feature in the GFF file as the region ID, which might also need tweaking, depending on what features you're interested in.

#!/usr/bin/perl -w

use strict;
use Bio::Tools::GFF;
use feature qw(say switch);

my $gffio = Bio::Tools::GFF->new(-fh => \*STDIN, -gff_version => 2);
my $feature;

while ($feature = $gffio->next_feature()) {
    # print $gffio->gff_string($feature)."\n";

    # cf. <http://www.sanger.ac.uk/Software/formats/GFF/GFF_Spec.shtml>
    my $seq_id = $feature->seq_id();   
    my $start = $feature->start() - 1;
    my $end = $feature->end();
    my $strand = $feature->strand();
    my @sites = $feature->get_tag_values('Site');

    # translate strand
    given ( $strand ) {
        when ($_ == 1)  { $strand = "+"; }
        when ($_ == -1) { $strand = "-"; }
    }

    # translate yeast chromosome to UCSC browser-readable chromosome
    # cf. <http://www.yeastgenome.org/sgdpub/Saccharomyces_cerevisiae.pdf>
    given ( $seq_id ) {
        when ( $_ eq "I" )    { $seq_id = "chr1"; }
        when ( $_ eq "II" )   { $seq_id = "chr2"; }
        when ( $_ eq "III" )  { $seq_id = "chr3"; }
        when ( $_ eq "IV" )   { $seq_id = "chr4"; }
        when ( $_ eq "V" )    { $seq_id = "chr5"; }
        when ( $_ eq "VI" )   { $seq_id = "chr6"; }
        when ( $_ eq "VII" )  { $seq_id = "chr7"; }
        when ( $_ eq "VIII" ) { $seq_id = "chr8"; }
        when ( $_ eq "IX" )   { $seq_id = "chr9"; }
        when ( $_ eq "X" )    { $seq_id = "chr10"; }
        when ( $_ eq "XI" )   { $seq_id = "chr11"; }
        when ( $_ eq "XII" )  { $seq_id = "chr12"; }
        when ( $_ eq "XIII" ) { $seq_id = "chr13"; }
        when ( $_ eq "XIV" )  { $seq_id = "chr14"; }
        when ( $_ eq "XV" )   { $seq_id = "chr15"; }
        when ( $_ eq "XVI" )  { $seq_id = "chr16"; }
        default { }
    }

    # output
    print "$seq_id\t$start\t$end\t$sites[0]\t0.0\t$strand\n";
}
$gffio->close();

To use it:

gff2bed.pl < data.gff > data.bed
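
For readers without BioPerl, the same idea can be sketched in plain Python. This is only a minimal sketch under stated assumptions, not a port of the script above: it assumes GFF3-style `key=value` attributes (the Perl script reads GFF version 2), takes the BED name from a hypothetical `ID` attribute with the feature type as a fallback, and writes a placeholder score of 0. The function name and the example line are made up.

```python
def gff_line_to_bed(line, name_key="ID"):
    """Convert one GFF line to a 6-column BED line.

    Returns None for blank lines and '#' comment lines.
    Assumes GFF3-style 'key=value;key=value' attributes (an assumption).
    """
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None

    fields = line.split("\t")
    if len(fields) < 8:
        raise ValueError("expected at least 8 tab-separated columns")

    seq_id, _source, feature_type, start, end, _score, strand = fields[:7]
    attributes = fields[8] if len(fields) > 8 else ""

    # Use the chosen attribute as the BED name, else fall back to the feature type.
    name = feature_type
    for part in attributes.split(";"):
        key, sep, value = part.strip().partition("=")
        if sep and key == name_key:
            name = value
            break

    bed_start = int(start) - 1                      # 1-based -> 0-based start
    bed_strand = strand if strand in ("+", "-") else "."
    return "\t".join([seq_id, str(bed_start), end, name, "0", bed_strand])


# Made-up example line.
example = "chr1\texample\tgene\t1\t100\t.\t+\t.\tID=gene1;Note=demo"
print(gff_line_to_bed(example))
```

To process a whole file, one could loop over `sys.stdin`, call the function on each line, and print every result that is not `None`.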

Just a note: the code above needs BioPerl.


Hi Alex, one thing you could do is replace the case block with a hash map that remaps chromosomes. That way it is a lot easier to add other entries without making the code longer and longer...
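
To show the shape of that idea, here is a minimal sketch in Python rather than Perl (a Perl hash would work the same way). The names `YEAST_TO_UCSC` and `remap_chrom` are made up for illustration; unknown names are passed through unchanged, like the empty default branch in the script.

```python
ROMAN_NUMERALS = [
    "I", "II", "III", "IV", "V", "VI", "VII", "VIII",
    "IX", "X", "XI", "XII", "XIII", "XIV", "XV", "XVI",
]

# Lookup table: yeast chromosome name -> UCSC-style name ("I" -> "chr1", ...).
YEAST_TO_UCSC = {
    roman: f"chr{number}"
    for number, roman in enumerate(ROMAN_NUMERALS, start=1)
}


def remap_chrom(seq_id):
    """Return the UCSC-style name, or the input unchanged if it is not in the table."""
    return YEAST_TO_UCSC.get(seq_id, seq_id)


print(remap_chrom("IV"))     # chr4
print(remap_chrom("XVI"))    # chr16
print(remap_chrom("2-micron"))   # unchanged: not in the table
```

Adding another entry is then a one-line change to the table instead of a new branch.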


Definitely. At least it might serve as a starting point for further modifications or tweaks.


Excellent, it worked well for me (the online Galaxy tool mentioned above did not), thanks a lot!


There is also Galaxy, which offers the solution highlighted below. You can also take a look at this link for a Python script that can perform the same trick. Also take a look at this link
