Converting Gff/Gtf + Reference To Embl Or Genbank ...Any Tools Available?
8.4 years ago
JacobS ▴ 940

I need to convert easily between GFF/GTF plus a reference sequence and the EMBL or GenBank formats, in both directions. Are there any frequently used tools for this, or should I script something myself?

gff genbank • 19k views
8.4 years ago
Hamish ★ 3.2k

The EMBOSS tool seqret would be a possible option. For example:

Generating an EMBL-Bank style entry from a fasta sequence and a GFF feature table:

seqret -sequence aj242600.fasta -feature -fformat gff -fopenfile aj242600.gff -osformat embl -auto

Alternatively, to get a GenBank style entry:

seqret -sequence aj242600.fasta -feature -fformat gff -fopenfile aj242600.gff -osformat genbank -auto

To go the other way and get the sequence in fasta format and the features as GFF, use something like:

seqret -sformat embl -sequence aj242600.dat -feature -osformat fasta -offormat gff -auto

Please note that since these are starting from sequence plus features they do not create a full EMBL-Bank or GenBank style entry, since this requires additional information, such as references, not available in the source data.
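
If you run these conversions often, the two forward commands can be folded into one small shell function. This is only a sketch: the function name `gff_to_flat` and the `SEQRET` override variable are made up for illustration, and `-outseq` is added so the output file name is explicit rather than left to the `-auto` default.

```shell
# Hypothetical wrapper around the seqret calls shown above.
# Usage: gff_to_flat <fasta> <gff> <embl|genbank> <outfile>
gff_to_flat() {
    if [ "$#" -ne 4 ]; then
        echo "usage: gff_to_flat <fasta> <gff> <embl|genbank> <outfile>" >&2
        return 1
    fi
    case "$3" in
        embl|genbank) ;;   # the two output formats used in this answer
        *) echo "unsupported format: $3" >&2; return 2 ;;
    esac
    # SEQRET is an assumed override variable; it defaults to the seqret on PATH.
    "${SEQRET:-seqret}" -sequence "$1" -feature -fformat gff -fopenfile "$2" \
        -osformat "$3" -outseq "$4" -auto
}
```

For example, `gff_to_flat aj242600.fasta aj242600.gff genbank aj242600.gb` would run the GenBank variant. The same caveat applies: the result is not a full EMBL-Bank or GenBank entry.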

5.0 years ago
j.dolata • 0

Hi, I would like to extract data in GenBank format from a genome FASTA file and a GFF file with coordinates. Could anybody help me?

It would be best to ask this as a separate question.

Bedtools can extract the FASTA subsequences for the features in a GFF file:

Tool:    bedtools getfasta (aka fastaFromBed)
Version: v2.25.0
Summary: Extract DNA sequences into a fasta file based on feature coordinates.

Usage:   bedtools getfasta [OPTIONS] -fi <fasta> -bed <bed/gff/vcf> -fo <fasta> 

Options: 
    -fi Input FASTA file
    -bed    BED/GFF/VCF file of ranges to extract from -fi
    -fo Output file (can be FASTA or TAB-delimited)
    -name   Use the name field for the FASTA header
    -split  given BED12 fmt., extract and concatenate the sequencesfrom the BED "blocks" (e.g., exons)
    -tab    Write output in TAB delimited format.
        - Default is FASTA format.

    -s  Force strandedness. If the feature occupies the antisense,
        strand, the sequence will be reverse complemented.
        - By default, strand information is ignored.

    -fullHeader Use full fasta header.
        - By default, only the word before the first space or tab is used.

get bedtools from here
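
Going by the usage line above, a call for a genome FASTA plus a GFF file could look like the sketch below. The function name `extract_features` and the `BEDTOOLS` override variable are hypothetical, and the file names in the example are placeholders. Note that this gives FASTA output, not GenBank. For GenBank-style records, the seqret approach in the answer above is the one to use.

```shell
# Hypothetical wrapper: extract_features <genome.fa> <features.gff> <out.fa>
extract_features() {
    if [ "$#" -ne 3 ]; then
        echo "usage: extract_features <genome.fa> <features.gff> <out.fa>" >&2
        return 1
    fi
    # -s reverse-complements features on the minus strand (see the help text above).
    # BEDTOOLS is an assumed override variable; it defaults to bedtools on PATH.
    "${BEDTOOLS:-bedtools}" getfasta -s -fi "$1" -bed "$2" -fo "$3"
}
```

For example: `extract_features genome.fa features.gff features.fa`.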
