Question: Converting Gff/Gtf + Reference To Embl Or Genbank ...Any Tools Available?
4.8 years ago
West Lafayette
JacobS wrote:

I need to be able to easily convert between GFF/GTF + reference and either EMBL or GenBank format, in both directions. Are there any frequently used tools for accomplishing this, or should I script something myself?

4.8 years ago
Hamish wrote:

The EMBOSS tool seqret is one option. For example:

Generating an EMBL-Bank style entry from a fasta sequence and a GFF feature table:

seqret -sequence aj242600.fasta -feature -fformat gff -fopenfile aj242600.gff -osformat embl -auto

Alternatively, to get a GenBank style entry:

seqret -sequence aj242600.fasta -feature -fformat gff -fopenfile aj242600.gff -osformat genbank -auto

To go the other way and get the sequence in fasta format and the features as GFF, use something like:

seqret -sformat embl -sequence aj242600.dat -feature -osformat fasta -offormat gff -auto

Please note that because these commands start from sequence plus features only, they do not create a full EMBL-Bank or GenBank style entry. A full entry requires additional information, such as references, that is not available in the source data.
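If you end up scripting the conversion for many files, one option is to wrap the seqret calls in a small Python helper. The following is only a sketch under assumptions: the function names are hypothetical, and `-outseq` (seqret's output sequence parameter) is added here so the output file name is explicit; every other option is taken from the commands above.

```python
import shutil
import subprocess


def seqret_to_flatfile(fasta, gff, out_format, outseq):
    """Build the seqret arguments for FASTA + GFF -> EMBL or GenBank.

    Hypothetical helper; it only assembles the command shown in the answer,
    plus an explicit -outseq file name (an addition, not in the original).
    """
    if out_format not in ("embl", "genbank"):
        raise ValueError("out_format must be 'embl' or 'genbank'")
    return [
        "seqret",
        "-sequence", fasta,
        "-feature",
        "-fformat", "gff",
        "-fopenfile", gff,
        "-osformat", out_format,
        "-outseq", outseq,
        "-auto",
    ]


def seqret_to_gff(flatfile, in_format, outseq):
    """Build the seqret arguments for EMBL or GenBank -> FASTA + GFF."""
    if in_format not in ("embl", "genbank"):
        raise ValueError("in_format must be 'embl' or 'genbank'")
    return [
        "seqret",
        "-sformat", in_format,
        "-sequence", flatfile,
        "-feature",
        "-osformat", "fasta",
        "-offormat", "gff",
        "-outseq", outseq,
        "-auto",
    ]


def run(cmd):
    """Run a prepared command, failing early if the program is not installed."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH; is EMBOSS installed?")
    subprocess.run(cmd, check=True)
```

For example, `run(seqret_to_flatfile("aj242600.fasta", "aj242600.gff", "embl", "aj242600.embl"))` would execute the first command above with an explicit output name. Passing the arguments as a list avoids shell quoting problems with unusual file names.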

17 months ago
j.dolata wrote:

Hi, I would like to extract data in GenBank format based on a genome fasta file and a GFF file with coordinates. Could anybody help me?

It would be best to ask this as a separate question.

Bedtools can extract the fasta subsequences:

Tool:    bedtools getfasta (aka fastaFromBed)
Version: v2.25.0
Summary: Extract DNA sequences into a fasta file based on feature coordinates.

Usage:   bedtools getfasta [OPTIONS] -fi <fasta> -bed <bed/gff/vcf> -fo <fasta> 

    -fi Input FASTA file
    -bed    BED/GFF/VCF file of ranges to extract from -fi
    -fo Output file (can be FASTA or TAB-delimited)
    -name   Use the name field for the FASTA header
    -split  given BED12 fmt., extract and concatenate the sequences from the BED "blocks" (e.g., exons)
    -tab    Write output in TAB delimited format.
        - Default is FASTA format.

    -s  Force strandedness. If the feature occupies the antisense
        strand, the sequence will be reverse complemented.
        - By default, strand information is ignored.

    -fullHeader Use full fasta header.
        - By default, only the word before the first space or tab is used.
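To make the coordinate handling concrete, here is a minimal pure-Python sketch of the kind of extraction described in the help text above, including the reverse complement that `-s` applies to minus-strand features. It is not bedtools' implementation: the function names and the header format are hypothetical, and it assumes tab-separated GFF with 1-based, inclusive start and end coordinates.

```python
# Complement table for DNA bases (upper and lower case); other characters pass through.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(lines):
    """Parse FASTA lines into {name: sequence}.

    Only the first word of each header is kept as the name, similar to the
    default described for -fullHeader above.
    """
    chunks = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)
    return {key: "".join(value) for key, value in chunks.items()}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_gff_features(fasta_lines, gff_lines, stranded=False):
    """Return a list of (header, subsequence) pairs, one per GFF feature.

    The header format "seqid:start-end(strand)" is a choice made for this
    sketch and is not claimed to match bedtools' naming.
    """
    seqs = read_fasta(fasta_lines)
    records = []
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.rstrip("\n").split("\t")
        seqid, strand = cols[0], cols[6]
        start, end = int(cols[3]), int(cols[4])
        # GFF is 1-based and inclusive; Python slices are 0-based and half-open.
        sub = seqs[seqid][start - 1:end]
        if stranded and strand == "-":
            sub = reverse_complement(sub)
        records.append((f"{seqid}:{start}-{end}({strand})", sub))
    return records
```

Both arguments can be open file handles or lists of lines. With `stranded=False` the strand column is ignored, matching the default described for `-s`.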

Get bedtools from its project website.

