Question: create GFF file for fasta reference file
0
gravatar for marongiu.luigi
6 months ago by
Germany, Mannheim, UMM
marongiu.luigi210 wrote:

hello,

i am downloading the reference sequence for all the species within a given taxon with the command:

esearch -db "nucleotide" -query "txidX[Organism] AND refseq[filter]"|efetch -format fasta > genome.fa

this is supposed to be a chromosome to align my reading against. But how can I visualize the alignment against this reference for instance with IGB if I don't have the annotations files such as GFF? Essentially, the question is: is it possible to create an annotation file from a multifasta file?

Thank you

rna-seq sequence genome • 524 views
ADD COMMENTlink written 6 months ago by marongiu.luigi210

Can you post an example accession number? NCBI has annotation files available for many of the refseq genomes.

ADD REPLYlink written 6 months ago by genomax52k

I have for instance all the viruses with

esearch -db "nucleotide" -query "txid10239[Organism] AND refseq[filter]"|efetch -format fasta > virus_genome.fa
ADD REPLYlink written 6 months ago by marongiu.luigi210
1

I was going to suggest the same thing as @Sej. Take a look at this page for utilities to download and convert the genbank data. Also this: Genbank To Gtf Converter

ADD REPLYlink written 6 months ago by genomax52k
1
gravatar for Sej Modha
6 months ago by
Sej Modha3.1k
Glasgow, UK
Sej Modha3.1k wrote:

You can download the corresponding GenBank file using the following command and then convert GenBank file to gff3 for the species of interest using utilities (e.g. Readseq) described here: Any tools converting Genbank format to GFF3 format?

esearch -db "nucleotide" -query "txidX[Organism] AND refseq[filter]"|efetch -format gb > genome.gb

is it possible to create an annotation file from a multifasta file?

No, because a fasta file does not have enough information such as annotation details to generate a GFF file.

ADD COMMENTlink modified 6 months ago • written 6 months ago by Sej Modha3.1k

OK, so it is not possible from a fasta file but it is possible from a GB file. Fair enough, thank you! For matter of completion, I could not produce a GB file for the taxon 10239: when i ran the command

esearch -db "nucleotide" -query "txid10239[Organism] AND refseq[filter]"|efetch -format gb > genome.gb

I got:

500 read timeout
No do_post output returned from 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=txid10239%5BOrganism%5D%20AND%20refseq%5Bfilter%5D&retmax=0&usehistory=y&edirect=7.00&tool=edirect&email=gigiux@DirtyHarry'
Result of do_post http request is
$VAR1 = bless( {
                 '_headers' => bless( {
                                        'client-date' => 'Fri, 12 Jan 2018 13:48:23 GMT',
                                        'client-warning' => 'Internal response',
                                        'content-type' => 'text/plain',
                                        '::std_case' => {
                                                          'client-warning' => 'Client-Warning',
                                                          'client-date' => 'Client-Date'
                                                        }
                                      }, 'HTTP::Headers' ),
                 '_rc' => 500,
                 '_content' => 'read timeout at /home/gigiux/miniconda3/bin/aux/lib/perl5/Net/HTTP/Methods.pm line 268.
',
                 '_request' => bless( {
                                        '_uri' => bless( do{\(my $o = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi')}, 'URI::https' ),
                                        '_headers' => bless( {
                                                               'content-type' => 'application/x-www-form-urlencoded',
                                                               '::std_case' => {
                                                                                 'if-ssl-cert-subject' => 'If-SSL-Cert-Subject'
                                                                               },
                                                               'user-agent' => 'libwww-perl/6.26'
                                                             }, 'HTTP::Headers' ),
                                        '_method' => 'POST',
                                        '_content' => 'db=nucleotide&term=txid10239%5BOrganism%5D%20AND%20refseq%5Bfilter%5D&retmax=0&usehistory=y&edirect=7.00&tool=edirect&email=gigiux@DirtyHarry'
                                      }, 'HTTP::Request' ),
                 '_msg' => 'read timeout'
               }, 'HTTP::Response' );

WebEnv value not found in search output - WebEnv1 
Db value not found in fetch input
ADD REPLYlink written 6 months ago by marongiu.luigi210

Unable to replicate this error as it works just fine for me, are you using an updated version of eutils that would work with HTTPS? You can also download all viral refseq in GenBank format from NCBI FTP.

ADD REPLYlink modified 6 months ago • written 6 months ago by Sej Modha3.1k

I am re-running. My esearch/efetch is version 7.00

ADD REPLYlink written 6 months ago by marongiu.luigi210

Don't forget to accept or upvote the answers which were helpful to you ;)

ADD REPLYlink written 6 months ago by Sej Modha3.1k

this time worked fine. This exception then no longer stand.

ADD REPLYlink written 6 months ago by marongiu.luigi210
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 473 users visited in the last hour