Question: Get full reference genome sequences for selected organisms
marongiu.luigi (Germany, Mannheim, UMM) wrote, 3.2 years ago:

Hello,

I would like to download all full-length reference sequences for a given organism. I am using esearch, as described on the NCBI website, with the following command:

esearch -db "nucleotide" -query "txidX[Organism] AND refseq[filter]"|efetch -format fasta > genome.fa

where X is the NCBI Taxonomy ID of the taxon. This works, but I get both 'complete genome' and 'complete sequence' entries.

Is it possible to get only the 'complete genome' entries? Thank you.
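One way to filter directly in the query, sketched here under the assumption that the wanted records carry the phrase "complete genome" in their title (the FASTA definition line), is to add a term on the Title field of the nucleotide database. The helper name refseq_genome_query is hypothetical; it only assembles the query string so it can be passed to esearch.

```shell
#!/bin/sh
# Hypothetical helper: print an Entrez query that restricts RefSeq nucleotide
# records of one taxon to those whose title contains "complete genome".
# Assumption: the records you want are titled that way; check a few first.
refseq_genome_query() {
  taxid="$1"
  printf '%s\n' "txid${taxid}[Organism] AND refseq[filter] AND \"complete genome\"[Title]"
}

# Usage (requires Entrez Direct and network access):
#   esearch -db nucleotide -query "$(refseq_genome_query 40120)" \
#     | efetch -format fasta > genome.fa
```

Records whose titles use other wording (for example, single segments described as "complete sequence") are excluded by this term, so compare the record count with and without it.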

Joseph Hughes (Scotland, UK) wrote, 3.2 years ago:

This is most likely because your species has multiple genome segments or chromosomes, each stored as a separate RefSeq record. For example:

esearch -db "nucleotide" -query "txid40120[Organism] AND refseq[filter]"|efetch -format fasta > genome.fa

would retrieve 32 complete genomes but

esearch -db "nucleotide" -query "txid40051[Organism] AND refseq[filter]"|efetch -format fasta > genome.fa

would retrieve 10 complete sequences, one for each of the 10 segments of the bluetongue virus.

So the right approach depends on whether you want whole-genome records or one record per segment or chromosome.
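To see which case applies to a particular download, you can tally the definition lines of the retrieved FASTA file. This is a minimal sketch assuming NCBI-style headers that mention "complete genome" or "complete sequence"; the function name tally_titles is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: count FASTA records by the phrase in their header,
# matched case-insensitively. Anything matching neither phrase is "other".
tally_titles() {
  awk '
    /^>/ {
      h = tolower($0)
      if (h ~ /complete genome/) n["complete genome"]++
      else if (h ~ /complete sequence/) n["complete sequence"]++
      else n["other"]++
    }
    END {
      # "+ 0" prints 0 for categories that never occurred
      printf "complete genome\t%d\n", n["complete genome"] + 0
      printf "complete sequence\t%d\n", n["complete sequence"] + 0
      printf "other\t%d\n", n["other"] + 0
    }
  ' "$1"
}

# Usage:
#   tally_titles genome.fa
```

A segmented virus such as bluetongue virus would show up mostly under "complete sequence", one line per segment.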


Thank you, but the taxon I am looking for contains both complete genomes and complete sequences. Is there still a way to separate them, either directly with an esearch option or afterwards by processing the resulting FASTA file?

Reply by marongiu.luigi, 3.2 years ago.
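For the post-processing route, a minimal awk sketch could look like this. It assumes the wanted records contain "complete genome" in the header (matched case-insensitively) and sends every other record to a second file; the function name split_by_title and the output file names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: split a FASTA file in two by header text.
#   $1 input FASTA, $2 file for "complete genome" records, $3 file for the rest
split_by_title() {
  in="$1"; genomes="$2"; others="$3"
  # Create both outputs up front so each exists even if it ends up empty
  : > "$genomes"; : > "$others"
  awk -v g="$genomes" -v o="$others" '
    # On each header line, choose the destination for this record
    /^>/ { out = (tolower($0) ~ /complete genome/) ? g : o }
    # Write header and sequence lines to the current destination
    out != "" { print > out }
  ' "$in"
}

# Usage:
#   split_by_title genome.fa complete_genomes.fa other_records.fa
```

Because the decision is made per header and reused for the following sequence lines, multi-line (wrapped) sequences stay with their own record.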