Question: Get reference full genome sequences for selected organisms
14 months ago by
Germany, Mannheim, UMM
marongiu.luigi wrote:

Hello,

I would like to download all the full-length reference sequences for a given organism. I am using esearch, as described on the NCBI website, with the following command:

esearch -db "nucleotide" -query "txidX[Organism] AND refseq[filter]"|efetch -format fasta > genome.fa

where X is the NCBI taxonomy ID of a given taxon. This works, but I get both 'complete genome' and 'complete sequence' entries.

Is it possible to get only the 'complete genome' entries? Thank you

14 months ago by
Scotland, UK
Joseph Hughes wrote:

This is most likely a result of your particular species having multiple segments or chromosomes. For example:

esearch -db "nucleotide" -query "txid40120[Organism] AND refseq[filter]"|efetch -format fasta > genome.fa

would retrieve 32 complete genomes but

esearch -db "nucleotide" -query "txid40051[Organism] AND refseq[filter]"|efetch -format fasta > genome.fa

would retrieve 10 complete sequences, one for each of the 10 segments of the bluetongue virus.

So the approach to take depends on what you really want to retrieve.
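
If what you want is only the records titled 'complete genome', one option is to narrow the search itself. The sketch below assumes the wanted records carry that exact phrase in their title; the helper name build_query and the output file name are made up for illustration. [Title] is a standard Entrez field tag, but whether it separates the two groups cleanly for your taxon is worth checking on the result.

```shell
# Hypothetical helper: build an Entrez query that keeps only RefSeq records
# for the given taxonomy ID whose title contains the phrase "complete genome".
build_query() {
  taxid="$1"
  printf 'txid%s[Organism] AND refseq[filter] AND "complete genome"[Title]' "$taxid"
}

# Usage (needs EDirect installed and network access):
#   esearch -db nucleotide -query "$(build_query 40120)" | efetch -format fasta > complete_genomes.fa
```

Building the query in one place keeps the quoting of the phrase term consistent when you loop over several taxa.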


Thank you, but the taxon I am looking for contains both complete genomes and complete sequences. Is there still a way to separate them, either directly with an esearch option or afterwards by manipulating the resulting FASTA file?

Reply written 14 months ago by marongiu.luigi
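
For the second option mentioned in the reply (filtering the FASTA file after download), a minimal sketch with awk could look like the following. It assumes each header line states 'complete genome' or 'complete sequence' in exactly that lower-case form, as in the entries described above; the function name and the output file name are hypothetical.

```shell
# Keep only FASTA records whose header line contains "complete genome".
# On each header line (starting with ">") decide whether to keep the record;
# every following sequence line inherits that decision until the next header.
filter_complete_genomes() {
  awk '/^>/ { keep = ($0 ~ /complete genome/) } keep' "$1"
}

# Usage:
#   filter_complete_genomes genome.fa > complete_genomes.fa
```

The match is case-sensitive and works on the title text only, so it is worth checking the headers first, for example with grep '^>' genome.fa, before relying on it.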