How Do I Convert From Bed Format To Gff Format?
3
8
12.7 years ago

I have a file in GFF format and I need to convert it to BED format. What do I do?

bed gff galaxy • 26k views
0

I'll answer my own question here as it is a demo for now

0

Someone should write a well-documented, well-tested Python module / script to do this! Many current converters either discard CDS information or include every CDS or mRNA on its own line. It would be nice if this script had an option to include CDSs on the same line using the extended BED format.
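As a rough illustration of the "CDSs on the same line" idea, here is a minimal sketch that packs the CDS segments of one transcript into a single 12-column (extended) BED line. The function name `cds_to_bed12` and its arguments are hypothetical, made up for this example. It assumes the CDS intervals are given in GFF coordinates (1-based, inclusive), all lie on one chromosome, and do not overlap; it also sets thickStart/thickEnd to the full span, since every block is coding.

```python
def cds_to_bed12(chrom, name, strand, cds_intervals):
    """Build one BED12 line from a transcript's CDS intervals.

    cds_intervals: list of (start, end) tuples in GFF coordinates
    (1-based, inclusive). Assumed non-overlapping (illustrative only).
    """
    # GFF -> BED: subtract 1 from the start, keep the end unchanged.
    blocks = sorted((start - 1, end) for start, end in cds_intervals)

    chrom_start = blocks[0][0]
    chrom_end = blocks[-1][1]

    # Block sizes, and block starts relative to chrom_start.
    block_sizes = ",".join(str(e - s) for s, e in blocks)
    block_starts = ",".join(str(s - chrom_start) for s, _ in blocks)

    fields = [
        chrom, chrom_start, chrom_end, name,
        0,            # score (placeholder)
        strand,
        chrom_start,  # thickStart: whole span is coding here
        chrom_end,    # thickEnd
        0,            # itemRgb (placeholder)
        len(blocks),  # blockCount
        block_sizes,
        block_starts,
    ]
    return "\t".join(str(f) for f in fields)


print(cds_to_bed12("chr1", "tx1", "+", [(101, 200), (301, 350)]))
```

A real converter would still need to group the GFF lines by their parent transcript before calling something like this, which is where most of the work is.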

0

Used the Perl script submitted by Alex Reynolds and it worked absolutely fine.

0

The question here is exactly the opposite of the title of the question

6
12.7 years ago

Both formats are tab-delimited text files used to represent DNA features in genomes. The order of the columns differs between the two, and each format has columns for attributes that are missing from the other. Nonetheless, the most important difference between the two is the coordinate system that each assumes.

The BED format, developed at UCSC, uses zero-based indexing with a half-open interval (the end coordinate is excluded), whereas the GFF format, developed at Sanger, assumes a 1-based coordinate system that includes both the start and end coordinates.

Therefore the [0,100) interval in BED format corresponds to [1,100] in GFF format, and both are 100 bases long. In BED the first element has index 0 and the last (100th) element has index 99, whereas in GFF the first element has index 1 and the last element has index 100.
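To make the coordinate shift concrete, here is a minimal Python sketch. The function names are made up for this example. The line converter assumes a well-formed nine-column GFF feature line, uses the feature type as the BED name, and writes "0" when the GFF score is "."; those are arbitrary choices for illustration, not part of either format's definition.

```python
def gff_to_bed_interval(start, end):
    """GFF (1-based, inclusive) -> BED (0-based, half-open)."""
    return start - 1, end


def bed_to_gff_interval(start, end):
    """BED (0-based, half-open) -> GFF (1-based, inclusive)."""
    return start + 1, end


def gff_line_to_bed(line):
    """Convert one tab-delimited GFF feature line to a six-column BED line."""
    (seqid, _source, ftype, start, end,
     score, strand, _phase, _attrs) = line.rstrip("\n").split("\t")

    bed_start, bed_end = gff_to_bed_interval(int(start), int(end))

    # Assumption: a missing GFF score (".") becomes 0 in BED.
    bed_score = "0" if score == "." else score
    # BED only knows "+", "-" and "."; anything else is treated as unknown.
    if strand not in ("+", "-"):
        strand = "."

    return "\t".join([seqid, str(bed_start), str(bed_end),
                      ftype, bed_score, strand])


print(gff_to_bed_interval(1, 100))
print(gff_line_to_bed("chrI\tSGD\tgene\t1\t100\t.\t+\t.\tID=x"))
```

Note that only the start coordinate changes in either direction; the end coordinate has the same numeric value in both formats.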

To convert between the two you may use Galaxy: the section called Convert Formats lists various transformation options.

8
12.0 years ago

You can also convert it in Galaxy:

Go to 'Convert formats' and you will find a 'BED-to-GFF converter'.

0

Hi Giovanni, Do you know how to convert hdf5 to plink format?

0

HDF-5 is a generic format for big datasets, which can be used for several applications, from astrophysics to scRNA data. It's conceptually similar to a zip file containing several files and folders. As such, there is not a single way to convert it to plink - it depends on which data is inside the HDF-5 file and in what format.

5
12.4 years ago

Here's a Perl script I wrote if you wanted to do something local.

There's some code in there for translating yeast chromosome names that can be removed, if not needed. I also used a Site feature in the GFF file as the region ID, which might also need tweaking, depending on what features you're interested in.

#!/usr/bin/perl -w

use strict;
use Bio::Tools::GFF;
use feature qw(say switch);

my $gffio = Bio::Tools::GFF->new(-fh => \*STDIN, -gff_version => 2);
my $feature;

while ($feature = $gffio->next_feature()) {
    # print $gffio->gff_string($feature)."\n";

    # cf. <http://www.sanger.ac.uk/Software/formats/GFF/GFF_Spec.shtml>
    my $seq_id = $feature->seq_id();
    my $start = $feature->start() - 1;
    my $end = $feature->end();
    my $strand = $feature->strand();
    my @sites = $feature->get_tag_values('Site');

    # translate strand
    given ( $strand ) {
        when ($_ == 1)  { $strand = "+"; }
        when ($_ == -1) { $strand = "-"; }
    }

    # translate yeast chromosome to UCSC browser-readable chromosome
    # cf. <http://www.yeastgenome.org/sgdpub/Saccharomyces_cerevisiae.pdf>
    given ( $seq_id ) {
        when ($_ eq "I" )    { $seq_id = "chr1"; }
        when ($_ eq "II" )   { $seq_id = "chr2"; }
        when ($_ eq "III" )  { $seq_id = "chr3"; }
        when ($_ eq "IV" )   { $seq_id = "chr4"; }
        when ($_ eq "V" )    { $seq_id = "chr5"; }
        when ($_ eq "VI" )   { $seq_id = "chr6"; }
        when ($_ eq "VII" )  { $seq_id = "chr7"; }
        when ($_ eq "VIII" ) { $seq_id = "chr8"; }
        when ($_ eq "IX" )   { $seq_id = "chr9"; }
        when ($_ eq "X" )    { $seq_id = "chr10"; }
        when ($_ eq "XI" )   { $seq_id = "chr11"; }
        when ($_ eq "XII" )  { $seq_id = "chr12"; }
        when ($_ eq "XIII" ) { $seq_id = "chr13"; }
        when ($_ eq "XIV" )  { $seq_id = "chr14"; }
        when ($_ eq "XV" )   { $seq_id = "chr15"; }
        when ($_ eq "XVI" )  { $seq_id = "chr16"; }
        default { }
    }

    # output
    print "$seq_id\t$start\t$end\t$sites[0]\t0.0\t$strand\n";
}
$gffio->close();


To use it:

gff2bed.pl < data.gff > data.bed

0

Just a note: the code above needs BioPerl.

0

Hi Alex, one thing you could do is replace the case block with a hash map that remaps chromosomes. That way it is a lot easier to add other entries without making the code longer and longer...
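For what it's worth, the lookup-table idea might look something like this. It is sketched in Python rather than Perl, and the names `YEAST_TO_UCSC` and `remap_chromosome` are invented for the example. Unknown names are passed through unchanged, which matches the empty default branch of the original case block.

```python
# Roman-numeral yeast chromosome names, in order.
ROMAN = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII",
         "IX", "X", "XI", "XII", "XIII", "XIV", "XV", "XVI"]

# Build the table once: "I" -> "chr1", ..., "XVI" -> "chr16".
YEAST_TO_UCSC = {roman: f"chr{i}" for i, roman in enumerate(ROMAN, start=1)}


def remap_chromosome(seq_id, mapping=YEAST_TO_UCSC):
    """Return the remapped name, or seq_id itself if it is not in the table."""
    return mapping.get(seq_id, seq_id)


print(remap_chromosome("IV"))
print(remap_chromosome("chrM"))
```

Adding another naming scheme is then a matter of adding entries to the dictionary (or passing a different `mapping`), with no change to the conversion loop.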

0

Definitely. At least it might serve as a stepping point for further modifications or tweaks.

0

Excellent, it worked well for me (the online tool above from galaxy did not), thanks a lot!

0

There is also Galaxy, which offers the solution highlighted below. You can also take a look at this link for a Python script that can perform the same trick. Also take a look at this link.