Question: Create a GFF file for a FASTA reference file
7 days ago by
Germany, Mannheim, UMM
marongiu.luigi120 wrote:

Hello,

I am downloading the reference sequences for all the species within a given taxon with the command:

esearch -db "nucleotide" -query "txidX[Organism] AND refseq[filter]"|efetch -format fasta > genome.fa

This is meant to be the reference to align my reads against. But how can I visualize the alignment against this reference, for instance with IGB, if I don't have annotation files such as GFF? Essentially, the question is: is it possible to create an annotation file from a multifasta file?

Thank you

rna-seq sequence genome • 105 views
ADD COMMENTlink written 7 days ago by marongiu.luigi120

Can you post an example accession number? NCBI has annotation files available for many of the refseq genomes.

ADD REPLYlink written 7 days ago by genomax40k

For instance, I download all the viruses with:

esearch -db "nucleotide" -query "txid10239[Organism] AND refseq[filter]"|efetch -format fasta > virus_genome.fa
ADD REPLYlink written 7 days ago by marongiu.luigi120

I was going to suggest the same thing as @Sej. Take a look at this page for utilities to download and convert the genbank data. Also this: Genbank To Gtf Converter

ADD REPLYlink written 7 days ago by genomax40k
7 days ago by
Sej Modha2.4k
Glasgow, UK
Sej Modha2.4k wrote:

You can download the corresponding GenBank file with the following command, and then convert it to GFF3 for the species of interest using one of the utilities (e.g. Readseq) described here: Any tools converting Genbank format to GFF3 format?

esearch -db "nucleotide" -query "txidX[Organism] AND refseq[filter]"|efetch -format gb > genome.gb
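
As a rough illustration of what such a conversion involves, here is a minimal awk sketch that reads the feature table of a GenBank flat file and writes GFF3 lines. It is not how Readseq or any other converter works internally, and the helper name genbank_to_gff3 is made up for this example. It assumes plain start..end locations (optionally wrapped in complement(...)), copies feature keys verbatim instead of mapping them to Sequence Ontology terms, and assumes /codon_start=1 for CDS features. The toy record is invented data.

```shell
# genbank_to_gff3 is a hypothetical helper, not an NCBI or Readseq tool.
genbank_to_gff3() {
    awk '
        BEGIN { OFS = "\t"; print "##gff-version 3" }

        # Write out the feature collected so far, if any.
        function flush_feature() {
            if (have) {
                attr = (name != "") ? "Name=" name : "."
                print seqid, "GenBank", key, start, stop, ".", strand, phase, attr
            }
            have = 0; name = ""
        }

        /^LOCUS/    { seqid = $2 }          # fallback identifier
        /^VERSION/  { seqid = $2 }          # accession.version
        /^FEATURES/ { in_table = 1; next }
        /^ORIGIN/ || /^\/\// { flush_feature(); in_table = 0 }

        # Feature line: five spaces, the feature key, then the location.
        in_table && /^     [^ ]/ {
            flush_feature()
            loc = $2; strand = "+"
            if (loc ~ /^complement\(/) {
                strand = "-"
                sub(/^complement\(/, "", loc); sub(/\)$/, "", loc)
            }
            # Only plain start..end locations are handled; joins and
            # partial (< or >) locations are skipped in this sketch.
            if (loc ~ /^[0-9]+\.\.[0-9]+$/) {
                split(loc, pos, "[.][.]")
                key = $1; start = pos[1]; stop = pos[2]
                phase = (key == "CDS") ? "0" : "."   # assumes /codon_start=1
                have = 1
            }
            next
        }

        # Qualifier line: keep the /gene value as the display name.
        in_table && have && /^ +\/gene="/ {
            name = $0
            sub(/^ +\/gene="/, "", name); sub(/"$/, "", name)
        }
    ' "$1"
}

# Toy record (invented) to show the call; a real run would pass genome.gb.
toy_gb=$(mktemp)
cat > "$toy_gb" <<'EOF'
LOCUS       TOY_0001                  60 bp    DNA     linear   VRL 01-JAN-2000
VERSION     TOY_0001.1
FEATURES             Location/Qualifiers
     source          1..60
     gene            5..40
                     /gene="abc"
     CDS             complement(5..40)
                     /gene="abc"
     misc_feature    join(1..3,50..60)
ORIGIN
        1 acgtacgtac
//
EOF
genbank_to_gff3 "$toy_gb"
rm -f "$toy_gb"
```

For real data a dedicated converter is the safer choice, since it also handles joined and partial locations, qualifier escaping, and parent/child relationships between features.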

is it possible to create an annotation file from a multifasta file?

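No. A FASTA file contains only sequence identifiers and the sequences themselves; it carries no annotation details (feature coordinates, strand, gene names), so there is nothing from which to generate a GFF file.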

ADD COMMENTlink modified 7 days ago • written 7 days ago by Sej Modha2.4k
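
To make the answer above concrete: the only per-record information that can be pulled out of a multifasta file is the identifier and the sequence, and therefore its length. The sketch below lists exactly that. The helper name fasta_lengths is made up for this example, and the toy records are invented.

```shell
# fasta_lengths is a hypothetical helper, not part of EDirect.
# Prints "<sequence-id><TAB><length>" for every record in a (multi-)FASTA file.
fasta_lengths() {
    awk '
        /^>/ {
            if (id != "") print id "\t" len
            id = substr($1, 2)    # header up to the first whitespace, without ">"
            len = 0
            next
        }
        { len += length($0) }     # sequence lines may be wrapped
        END { if (id != "") print id "\t" len }
    ' "$1"
}

# Toy input (invented) to show the call; a real run would pass genome.fa.
toy_fa=$(mktemp)
cat > "$toy_fa" <<'EOF'
>seqA toy record one
ACGT
ACGT
>seqB toy record two
GGCC
EOF
fasta_lengths "$toy_fa"
rm -f "$toy_fa"
```

Names and lengths are enough to tell a genome browser which sequences exist, but gene or CDS coordinates have to come from an annotated source such as the GenBank record.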

OK, so it is not possible from a FASTA file but it is possible from a GenBank file. Fair enough, thank you! For the sake of completeness: I could not produce a GenBank file for taxon 10239. When I ran the command

esearch -db "nucleotide" -query "txid10239[Organism] AND refseq[filter]"|efetch -format gb > genome.gb

I got:

500 read timeout
No do_post output returned from 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=txid10239%5BOrganism%5D%20AND%20refseq%5Bfilter%5D&retmax=0&usehistory=y&edirect=7.00&tool=edirect&email=gigiux@DirtyHarry'
Result of do_post http request is
$VAR1 = bless( {
                 '_headers' => bless( {
                                        'client-date' => 'Fri, 12 Jan 2018 13:48:23 GMT',
                                        'client-warning' => 'Internal response',
                                        'content-type' => 'text/plain',
                                        '::std_case' => {
                                                          'client-warning' => 'Client-Warning',
                                                          'client-date' => 'Client-Date'
                                                        }
                                      }, 'HTTP::Headers' ),
                 '_rc' => 500,
                 '_content' => 'read timeout at /home/gigiux/miniconda3/bin/aux/lib/perl5/Net/HTTP/Methods.pm line 268.
',
                 '_request' => bless( {
                                        '_uri' => bless( do{\(my $o = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi')}, 'URI::https' ),
                                        '_headers' => bless( {
                                                               'content-type' => 'application/x-www-form-urlencoded',
                                                               '::std_case' => {
                                                                                 'if-ssl-cert-subject' => 'If-SSL-Cert-Subject'
                                                                               },
                                                               'user-agent' => 'libwww-perl/6.26'
                                                             }, 'HTTP::Headers' ),
                                        '_method' => 'POST',
                                        '_content' => 'db=nucleotide&term=txid10239%5BOrganism%5D%20AND%20refseq%5Bfilter%5D&retmax=0&usehistory=y&edirect=7.00&tool=edirect&email=gigiux@DirtyHarry'
                                      }, 'HTTP::Request' ),
                 '_msg' => 'read timeout'
               }, 'HTTP::Response' );

WebEnv value not found in search output - WebEnv1 
Db value not found in fetch input
ADD REPLYlink written 6 days ago by marongiu.luigi120

I am unable to replicate this error; the command works fine for me. Are you using an up-to-date version of the E-utilities tools (EDirect) that works over HTTPS? You can also download all viral RefSeq records in GenBank format from the NCBI FTP site.

ADD REPLYlink modified 6 days ago • written 6 days ago by Sej Modha2.4k

I am re-running. My esearch/efetch is version 7.00

ADD REPLYlink written 6 days ago by marongiu.luigi120

Don't forget to accept or upvote the answers which were helpful to you ;)

ADD REPLYlink written 6 days ago by Sej Modha2.4k

This time it worked fine, so the exception above no longer applies.

ADD REPLYlink written 5 days ago by marongiu.luigi120
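
Since the read timeout in this thread went away on a simple re-run, the retry can be automated. The following is a minimal sketch under stated assumptions: retry and fetch_genbank are hypothetical helper names, the attempt count and delay are arbitrary, and an empty output file is treated as a failure because it is not assumed here that esearch or efetch always exit non-zero on a timeout.

```shell
# retry is a hypothetical helper: run a command up to N times,
# pausing DELAY seconds between attempts.
# Usage: retry N DELAY command [args...]
retry() {
    max_tries=$1; delay=$2; shift 2
    try=1
    while ! "$@"; do
        if [ "$try" -ge "$max_tries" ]; then
            echo "giving up after $try attempts: $*" >&2
            return 1
        fi
        echo "attempt $try failed, retrying in ${delay}s" >&2
        sleep "$delay"
        try=$((try + 1))
    done
}

# fetch_genbank is a hypothetical wrapper around the pipeline from the thread.
# It succeeds only if the output file ends up non-empty.
fetch_genbank() {
    esearch -db nucleotide -query "$1" | efetch -format gb > "$2" && [ -s "$2" ]
}

# Example call (commented out because it needs network access and EDirect):
# retry 3 30 fetch_genbank "txid10239[Organism] AND refseq[filter]" genome.gb
```

For a query as large as all viral RefSeq records, the FTP download suggested above may still be the more reliable route.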