Question: How to retrieve a GFF for a sequence identifier other than gb with BioPerl
Asked 5.4 years ago by fbrundu (European Union):

Hi all,

I need to retrieve the GFF for a specific accession number. Searching through a FASTA file, I have:

>gi|345090966|ref|NG_029839.1|:195425-195447 Homo sapiens c-Maf inducing protein (CMIP), RefSeqGene on chromosome 16
TGCAAAAGTAATTGCAGTTTTTG
>gi|343168829|gb|AC245437.1|:21551-21573 Homo sapiens FOSMID clone ABC14-947514C10 from chromosome unknown, complete sequence
CAAAAACTGCAATTACTTTTGCA
>gi|340523118|ref|NG_029471.1|:35678-35700 Homo sapiens hemopoietic cell kinase (HCK), RefSeqGene on chromosome 20
CAAAAACTGCAATTACTTTTGCA

I can retrieve the GFF for entries whose sequence identifier is gb, such as:

>gi|343168829|gb|AC245437.1|:21551-21573 Homo sapiens FOSMID clone ABC14-947514C10 from chromosome unknown, complete sequence
CAAAAACTGCAATTACTTTTGCA

using a BioPerl script, like this:

./bp_genbank2gff.pl -accession AC245437.1 -stdout > AC245437.1.gff

However, I am unable to get a GFF for entries with a different sequence identifier (such as ref). What am I doing wrong?

Thanks

Tags: gff3, gff, bioperl
Answer by Daniel Standage (Davis, California, USA), 5.4 years ago:

In these FASTA headers, the gb tag stands for GenBank, while the ref tag stands for RefSeq. Judging from the script's usage statement, only GenBank accessions appear to be supported for remote download.

[standage@lappy ~] bp_genbank2gff.pl 

Usage: bp_genbank2gff.pl [options] [<gff file 1> <gff file 2>] ...
Load a Bio::DB::GFF database from GFF files.

 Options:
   --create                 Force creation and initialization of database
   --dsn       <dsn>        Data source (default dbi:mysql:test)
   --user      <user>       Username for mysql authentication
   --pass      <password>   Password for mysql authentication
   --proxy     <proxy>      Proxy server to use for remote access
   --stdout                 direct output to STDOUT
   --adaptor   <adaptor>    adaptor to use (eg dbi::mysql, dbi::pg, dbi::oracle)
   --viral                  the genome you are loading is viral (changes tag
                                 choices)
   --source    <source>     source field for features ['genbank']
    EITHER --file           Arguments that follow are Genbank/EMBL file names
    OR --gb_folder          What follows is a folder full of gb files to process
    OR --accession          Arguments that follow are genbank accession numbers
                                 (not gi!)
    OR --acc_file           Accession numbers (not gi!) in a file (one per line,
                                 no punc.) 
    OR --acc_pipe           Accession numbers (not gi!) from a STDIN pipe (one
                                 per line)   


This script loads a Bio::DB::GFF database with the features contained
in a either a local genbank file or an accession that is fetched from
genbank.  Various command-line options allow you to control which
database to load and whether to allow an existing database to be
overwritten.

[standage@lappy ~]
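To check which of the question's headers carry a GenBank (gb) versus a RefSeq (ref) accession before passing anything to --accession, the gi-style header can be split into its fields. This is a minimal sketch, assuming every header follows the ">gi|NUMBER|DB|ACCESSION|:START-END" layout shown in the question; `parse_defline`, `Defline` and `DB_NAMES` are hypothetical names introduced here, not part of BioPerl.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical helper (not a BioPerl API): matches headers such as
# ">gi|343168829|gb|AC245437.1|:21551-21573 Homo sapiens ..."
DEFLINE_RE = re.compile(
    r"^>gi\|(?P<gi>\d+)\|(?P<db>[a-z]+)\|(?P<acc>[^|]+)\|"
    r"(?::(?P<start>\d+)-(?P<end>\d+))?"  # optional subrange after the accession
)

# Database tags used in NCBI gi-style identifiers
DB_NAMES = {"gb": "GenBank", "ref": "RefSeq", "emb": "EMBL", "dbj": "DDBJ"}


class Defline(NamedTuple):
    gi: str
    db: str
    accession: str
    start: Optional[int]
    end: Optional[int]


def parse_defline(line: str) -> Defline:
    """Split a gi-style FASTA header into gi, db tag, accession and subrange."""
    m = DEFLINE_RE.match(line)
    if m is None:
        raise ValueError(f"not a gi-style FASTA header: {line!r}")
    start = int(m["start"]) if m["start"] else None
    end = int(m["end"]) if m["end"] else None
    return Defline(m["gi"], m["db"], m["acc"], start, end)


if __name__ == "__main__":
    headers = [
        ">gi|345090966|ref|NG_029839.1|:195425-195447 Homo sapiens ...",
        ">gi|343168829|gb|AC245437.1|:21551-21573 Homo sapiens ...",
    ]
    for h in headers:
        d = parse_defline(h)
        # Print the source database and the accession (not the gi number)
        print(DB_NAMES.get(d.db, d.db), d.accession)
```

Under the reading of the usage statement above, only the entries tagged gb are candidates for --accession; the ref entries (NG_...) need another route.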

However, if you have a GenBank-formatted file downloaded to your local machine, you can use this script to extract all of the features in that file and convert them to GFF3 format.
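One way to get such a local file for a RefSeq accession is NCBI's E-utilities efetch service, which is a swapped-in technique and not something bp_genbank2gff.pl does itself. The sketch below assumes the public endpoint and the db/id/rettype/retmode parameters; `efetch_genbank_url` and `download_genbank` are hypothetical helper names.

```python
import urllib.parse
import urllib.request

# NCBI E-utilities efetch endpoint (assumption: public, unauthenticated access)
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_genbank_url(accession: str) -> str:
    """Build an efetch URL that returns the record as a GenBank flat file."""
    params = {
        "db": "nuccore",    # nucleotide database, holds GenBank and RefSeq records
        "id": accession,    # versioned accession, e.g. NG_029839.1
        "rettype": "gb",    # GenBank flat-file format
        "retmode": "text",  # plain text rather than XML
    }
    return f"{EFETCH}?{urllib.parse.urlencode(params)}"


def download_genbank(accession: str, path: str) -> None:
    """Save the GenBank record for `accession` to `path` (requires network access)."""
    with urllib.request.urlopen(efetch_genbank_url(accession)) as resp:
        data = resp.read()
    with open(path, "wb") as fh:
        fh.write(data)
```

For example, download_genbank("NG_029839.1", "NG_029839.1.gb") would save the record locally, after which the --file option from the usage statement should apply, e.g. bp_genbank2gff.pl --stdout --file NG_029839.1.gb > NG_029839.1.gff. This has not been tested against every RefSeq record type.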

Unfortunately, your question does not make it very clear what information you need. I ran the example you gave, and the resulting GFF3 contained only the fosmid sequence and two redundant, uninformative features. If you could edit your question to say exactly what you're looking for, it would be easier for us to help you.


Thanks for your effort. I am a little confused: is it possible to extract more features that aren't displayed in this case? I need to extract all of them. I also need to retrieve information from RefSeq; do you know how to do that? Thanks again.

(Reply by fbrundu, 5.4 years ago)

I did some research and found http://gmod.org/wiki/Load_RefSeq_Into_Chado, which explains how to do the conversion. Still, as you pointed out, this script cannot download records by RefSeq accession number.

(Reply by fbrundu, 5.4 years ago)