Question: Converting Gff/Gtf + Reference To Embl Or Genbank ...Any Tools Available?
6.8 years ago by
Cleveland, Ohio
JacobS900 wrote:

I need to convert easily between GFF/GTF plus a reference sequence and the EMBL or GenBank formats, in both directions. Are there any frequently used tools for this, or should I script something myself?

gff genbank • 17k views
modified 3.4 years ago by j.dolata0 • written 6.8 years ago by JacobS900
6.8 years ago by
Hamish3.1k wrote:

The EMBOSS tool seqret would be a possible option. For example:

Generating an EMBL-Bank style entry from a fasta sequence and a GFF feature table:

seqret -sequence aj242600.fasta -feature -fformat gff -fopenfile aj242600.gff -osformat embl -auto

Alternatively, to get a GenBank style entry:

seqret -sequence aj242600.fasta -feature -fformat gff -fopenfile aj242600.gff -osformat genbank -auto

To go the other way, getting the sequence in fasta format and the features as GFF, use something like:

seqret -sformat embl -sequence aj242600.dat -feature -osformat fasta -offormat gff -auto

Please note that because these commands start from sequence plus features, they do not create a full EMBL-Bank or GenBank style entry. A full entry requires additional information, such as references, that is not available in the source data.
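
The two forward conversions above differ only in the -osformat value, so they can be wrapped in a small helper. The following is a minimal sketch and not part of the original answer: the function name seqret_cmd is hypothetical, it uses only the seqret options already shown above, and it prints the command instead of running it so that it can be inspected first.

```shell
#!/bin/sh
# Hypothetical helper (not part of EMBOSS): print the seqret command that
# combines a FASTA file and a GFF feature table into an EMBL or GenBank
# style entry. Only options shown in the answer above are used.
# Assumes the file names contain no spaces.
seqret_cmd() {
    fasta=$1
    gff=$2
    fmt=$3
    case "$fmt" in
        embl|genbank) ;;
        *) echo "unsupported output format: $fmt" >&2; return 1 ;;
    esac
    printf 'seqret -sequence %s -feature -fformat gff -fopenfile %s -osformat %s -auto\n' \
        "$fasta" "$gff" "$fmt"
}

# Print both conversions for the example files used above.
for fmt in embl genbank; do
    seqret_cmd aj242600.fasta aj242600.gff "$fmt"
done
```

To execute a printed command, pipe it to sh (for example `seqret_cmd aj242600.fasta aj242600.gff embl | sh`), assuming EMBOSS is installed and the input files exist.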

modified 6.8 years ago • written 6.8 years ago by Hamish3.1k
3.4 years ago by
j.dolata0 wrote:

Hi, I would like to extract data in GenBank format from a genome fasta file and a GFF file with coordinates. Could anybody help me?

written 3.4 years ago by j.dolata0

It would be best to ask this as a separate question.

Reply written 3.4 years ago by WouterDeCoster43k

Bedtools can extract the fasta subsequences for the features in a GFF file:

Tool:    bedtools getfasta (aka fastaFromBed)
Version: v2.25.0
Summary: Extract DNA sequences into a fasta file based on feature coordinates.

Usage:   bedtools getfasta [OPTIONS] -fi <fasta> -bed <bed/gff/vcf> -fo <fasta> 

    -fi Input FASTA file
    -bed    BED/GFF/VCF file of ranges to extract from -fi
    -fo Output file (can be FASTA or TAB-delimited)
    -name   Use the name field for the FASTA header
    -split  given BED12 fmt., extract and concatenate the sequencesfrom the BED "blocks" (e.g., exons)
    -tab    Write output in TAB delimited format.
        - Default is FASTA format.

    -s  Force strandedness. If the feature occupies the antisense,
        strand, the sequence will be reverse complemented.
        - By default, strand information is ignored.

    -fullHeader Use full fasta header.
        - By default, only the word before the first space or tab is used.

get bedtools from here
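
As an illustration of how the options in the help text combine, here is a minimal sketch of one invocation. The file names genome.fa, features.gff and features.fa are placeholders, not files from this thread. Note that this produces fasta subsequences for each feature; it does not produce a GenBank record.

```shell
#!/bin/sh
# Placeholder file names (assumptions); replace with your own files.
genome=genome.fa
features=features.gff
out=features.fa

# -fi, -bed, -fo and -s are taken from the help text above.
# -s reverse-complements features on the antisense strand.
cmd="bedtools getfasta -s -fi $genome -bed $features -fo $out"

# Run only if bedtools and both inputs are available; otherwise just show
# the command that would be run.
if command -v bedtools >/dev/null 2>&1 && [ -r "$genome" ] && [ -r "$features" ]; then
    $cmd
else
    echo "would run: $cmd"
fi
```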

Reply modified 2.9 years ago • written 2.9 years ago by Stephane Plaisance410