Question: Get reference full genome sequences for selected organisms
marongiu.luigi (Germany, Mannheim, UMM) wrote, 3 months ago:

Hello,

I would like to download all full-length reference sequences for a given organism. I am using esearch, as described on the NCBI website, with the following command:

esearch -db "nucleotide" -query "txidX[Organism] AND refseq[filter]"|efetch -format fasta > genome.fa

where X is the NCBI Taxonomy ID of the taxon. This works, but I get both 'complete genome' and 'complete sequence' entries.

Is it possible to get only the 'complete genome' entries? Thank you.
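One way to try this at the query level is to add a title restriction to the Entrez search. The sketch below assumes that the Nucleotide database's `[Title]` field tag accepts the quoted phrase "complete genome"; `refseq_genome_query` is a hypothetical helper name, not part of EDirect, so compare the hit count with a search on the NCBI website before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build an Entrez query that keeps only RefSeq
# nucleotide records whose title contains the phrase "complete genome".
# Assumption: the Nucleotide database supports the [Title] field tag.
refseq_genome_query() {
  local taxid="$1"   # NCBI Taxonomy ID, e.g. 40120
  printf 'txid%s[Organism] AND refseq[filter] AND "complete genome"[Title]\n' "$taxid"
}
```

With EDirect installed, it would be used as: esearch -db "nucleotide" -query "$(refseq_genome_query X)" | efetch -format fasta > genome.fa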

Joseph Hughes (Scotland, UK) wrote, 3 months ago:

This is most likely because your species has a genome split into multiple segments or chromosomes, each of which is stored as a separate record. For example:

esearch -db "nucleotide" -query "txid40120[Organism] AND refseq[filter]"|efetch -format fasta > genome.fa

would retrieve 32 complete genomes, but

esearch -db "nucleotide" -query "txid40051[Organism] AND refseq[filter]"|efetch -format fasta > genome.fa

would retrieve 10 complete sequences, one for each of the 10 segments of the bluetongue virus.

So the approach to take depends on what you really want to retrieve.
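To see which of the two cases applies to a given download, it can help to count the records in the FASTA file and read their titles. A minimal sketch; `count_records` and `list_titles` are hypothetical helper names, and RefSeq record counts can change over time, so results may differ from the numbers above.

```shell
#!/usr/bin/env bash
# Count the FASTA records in a file (one per '>' header line).
count_records() {
  grep -c '^>' "$1"
}

# Print each record's title line without the leading '>'.
list_titles() {
  grep '^>' "$1" | cut -c2-
}
```

For example, count_records genome.fa followed by list_titles genome.fa shows whether the result is one record per segment ("segment N, complete sequence") or one record per genome.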


marongiu.luigi replied, 3 months ago:

Thank you, but the taxon I am looking for contains both complete genomes and complete sequences. Is there still a way to separate them, either directly with an esearch option or afterwards by manipulating the resulting FASTA file?
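For the post-processing route, the downloaded FASTA can be split on each record's header line. This sketch assumes NCBI-style headers where the title ends in "complete genome" or "complete sequence"; `split_by_title` and the output file names are hypothetical.

```shell
#!/usr/bin/env bash
# Split a FASTA file into two outputs based on each record's header line:
# records whose title mentions "complete genome" (case-insensitive) go to
# the first output; all other records (e.g. "complete sequence") go to the second.
split_by_title() {
  local in="$1" genomes="$2" others="$3"
  # Create (or truncate) both outputs so each exists even if it gets no records.
  : > "$genomes"
  : > "$others"
  awk -v g="$genomes" -v o="$others" '
    # On each header line, choose the output for this record.
    /^>/ { out = (tolower($0) ~ /complete genome/) ? g : o }
    # Copy the header and its sequence lines; skip anything before the first header.
    out != "" { print > out }
  ' "$in"
}
```

Usage: split_by_title genome.fa complete_genomes.fa other_sequences.fa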