create GFF file for fasta reference file
1
0
Entering edit mode
6.3 years ago

hello,

i am downloading the reference sequence for all the species within a given taxon with the command:

esearch -db "nucleotide" -query "txidX[Organism] AND refseq[filter]"|efetch -format fasta > genome.fa

this is supposed to be a chromosome to align my reading against. But how can I visualize the alignment against this reference for instance with IGB if I don't have the annotations files such as GFF? Essentially, the question is: is it possible to create an annotation file from a multifasta file?

Thank you

genome sequence RNA-Seq • 4.7k views
ADD COMMENT
0
Entering edit mode

Can you post an example accession number? NCBI has annotation files available for many of the refseq genomes.

ADD REPLY
0
Entering edit mode

I have for instance all the viruses with

esearch -db "nucleotide" -query "txid10239[Organism] AND refseq[filter]"|efetch -format fasta > virus_genome.fa
ADD REPLY
1
Entering edit mode

I was going to suggest the same thing as @Sej. Take a look at this page for utilities to download and convert the genbank data. Also this: Genbank To Gtf Converter

ADD REPLY
1
Entering edit mode
6.3 years ago
Sej Modha 5.3k

You can download the corresponding GenBank file using the following command and then convert GenBank file to gff3 for the species of interest using utilities (e.g. Readseq) described here: Any tools converting Genbank format to GFF3 format?

esearch -db "nucleotide" -query "txidX[Organism] AND refseq[filter]"|efetch -format gb > genome.gb

is it possible to create an annotation file from a multifasta file?

No, because a fasta file does not have enough information such as annotation details to generate a GFF file.

ADD COMMENT
0
Entering edit mode

OK, so it is not possible from a fasta file but it is possible from a GB file. Fair enough, thank you! For matter of completion, I could not produce a GB file for the taxon 10239: when i ran the command

esearch -db "nucleotide" -query "txid10239[Organism] AND refseq[filter]"|efetch -format gb > genome.gb

I got:

500 read timeout
No do_post output returned from 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=txid10239%5BOrganism%5D%20AND%20refseq%5Bfilter%5D&retmax=0&usehistory=y&edirect=7.00&tool=edirect&email=gigiux@DirtyHarry'
Result of do_post http request is
$VAR1 = bless( {
                 '_headers' => bless( {
                                        'client-date' => 'Fri, 12 Jan 2018 13:48:23 GMT',
                                        'client-warning' => 'Internal response',
                                        'content-type' => 'text/plain',
                                        '::std_case' => {
                                                          'client-warning' => 'Client-Warning',
                                                          'client-date' => 'Client-Date'
                                                        }
                                      }, 'HTTP::Headers' ),
                 '_rc' => 500,
                 '_content' => 'read timeout at /home/gigiux/miniconda3/bin/aux/lib/perl5/Net/HTTP/Methods.pm line 268.
',
                 '_request' => bless( {
                                        '_uri' => bless( do{\(my $o = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi')}, 'URI::https' ),
                                        '_headers' => bless( {
                                                               'content-type' => 'application/x-www-form-urlencoded',
                                                               '::std_case' => {
                                                                                 'if-ssl-cert-subject' => 'If-SSL-Cert-Subject'
                                                                               },
                                                               'user-agent' => 'libwww-perl/6.26'
                                                             }, 'HTTP::Headers' ),
                                        '_method' => 'POST',
                                        '_content' => 'db=nucleotide&term=txid10239%5BOrganism%5D%20AND%20refseq%5Bfilter%5D&retmax=0&usehistory=y&edirect=7.00&tool=edirect&email=gigiux@DirtyHarry'
                                      }, 'HTTP::Request' ),
                 '_msg' => 'read timeout'
               }, 'HTTP::Response' );

WebEnv value not found in search output - WebEnv1 
Db value not found in fetch input
ADD REPLY
0
Entering edit mode

Unable to replicate this error as it works just fine for me, are you using an updated version of eutils that would work with HTTPS? You can also download all viral refseq in GenBank format from NCBI FTP.

ADD REPLY
0
Entering edit mode

I am re-running. My esearch/efetch is version 7.00

ADD REPLY
0
Entering edit mode

Don't forget to accept or upvote the answers which were helpful to you ;)

ADD REPLY
0
Entering edit mode

this time worked fine. This exception then no longer stand.

ADD REPLY

Login before adding your answer.

Traffic: 4031 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6