Question: How to download Refseq of bacteria as a COMPLETE genome format
7 months ago by
Shelle wrote:

Can anyone tell me how I can download complete bacterial RefSeq genomes from NCBI? I say "complete genome" because I don't want the word "chromosome" to appear anywhere in the FASTA files. I saw some posts on this, and it seems that a text file listing the organism names will do the job. I can get a CSV file of the organisms from this link "!/overview/ " and extract only the first column. When I use the script from this thread, How to download COMPLETE bacterial genomes from NCBI based on list of names?, it does not work and nothing is downloaded. Can anyone tell me whether this code is compatible with any format of species.txt, or should I reformat the file somehow?

cat species.txt
"'Brassica napus' phytoplasma"
"'Candidatus Kapabacteria' thiocyanatum"
"'Chrysanthemum coronarium' phytoplasma"
"'Echinacea purpurea' witches'-broom phytoplasma"
"'Osedax' symbiont bacterium Rs2_46_30_T18"
"'Sphingomonas ginsengisoli' Hoang et al. 2012"
"Abaca bunchy top virus"
"Abalone herpesvirus Victoria/AUS/2009"
"Abalone shriveling syndrome-associated virus"
"Abditibacterium utsteinense"
"Abelson murine leukemia virus"
"Abeoforma whisleri"
"Abiotrophia defectiva"
"Abisko virus"
"Absidia glauca"
"Absidia repens"
"Absiella dolichum"
"Abutilon Brazil virus"
"Abutilon golden mosaic virus"
"Abutilon mosaic Bolivia virus"
"Abutilon mosaic Brazil virus"
"Abutilon mosaic virus"


IFS=$'\n'; for next in $(cat species.txt); do awk -v SPECIES=^"$next" 'BEGIN{FS="\t"}{if($8 ~ SPECIES && $12=="Complete Genome"){print $20}}' assembly_summary.txt \
    | awk 'BEGIN{OFS=FS="/"}{print "wget "$0,$NF"_genomic.fna.gz"}'; done \
    | sh

Hello Shelle,

None of your previous posts have gotten to closure. Please provide feedback and accept answers where appropriate.

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one answer if they all work.


written 7 months ago by RamRS
7 months ago by
United States
genomax wrote:

You have to modify @5heikki's answer in the linked thread so it fits your use case. Try the following:

IFS=$'\n'; awk 'BEGIN{FS="\t"}{if($12=="Complete Genome"){print $20}}' assembly_summary.txt | awk 'BEGIN{OFS=FS="/"}{print "wget "$0,$NF"_genomic.fna.gz"}' | sh

No species.txt file is needed in your case.
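
If a name filter is ever wanted, one likely reason the original loop downloads nothing is that every line of species.txt is wrapped in double quotes, so the awk pattern has to match a literal `"` at the start of column 8. A minimal sketch of that effect, assuming organism names in assembly_summary.txt are unquoted; the one-row table and the example.org path are made-up stand-ins, not real NCBI data:

```shell
# Hypothetical mini inputs in a scratch directory (not real NCBI data).
workdir=$(mktemp -d)
printf '"Abiotrophia defectiva"\n' > "$workdir/species.txt"
# One 20-column tab-separated row: col 8 = organism, col 12 = level, col 20 = path.
printf 'GCF_1\t.\t.\t.\t.\t.\t.\tAbiotrophia defectiva\t.\t.\t.\tComplete Genome\t.\t.\t.\t.\t.\t.\t.\tftp://example.org/GCF_1\n' > "$workdir/assembly_summary.txt"

# Quoted name used as-is: the pattern begins with a literal double quote, so nothing matches.
raw=$(head -n 1 "$workdir/species.txt")
with_quotes=$(awk -v SPECIES="^$raw" 'BEGIN{FS="\t"} $8 ~ SPECIES && $12=="Complete Genome" {print $20}' "$workdir/assembly_summary.txt")

# Strip the surrounding double quotes first; the same test then matches.
clean=$(printf '%s\n' "$raw" | sed 's/^"//; s/"$//')
without_quotes=$(awk -v SPECIES="^$clean" 'BEGIN{FS="\t"} $8 ~ SPECIES && $12=="Complete Genome" {print $20}' "$workdir/assembly_summary.txt")

echo "with quotes:    [$with_quotes]"
echo "without quotes: [$without_quotes]"
```

Names are still treated as regular expressions here, so entries containing characters such as `.` or `(` may match more loosely than intended.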

There is no "complete genome" format. We are downloading only those genomes whose assembly level (column 12 of the assembly_summary.txt file) is marked "Complete Genome".
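
To check which genomes would be fetched before downloading anything, the same filter can print the wget commands instead of piping them to sh. A minimal sketch on a made-up three-row table; the accessions, organism name, and example.org paths are hypothetical, and only the column positions (12 and 20) are taken from the command above:

```shell
# Hypothetical three-row stand-in for assembly_summary.txt (tab-separated, 20 columns).
summary=$(mktemp)
row() {  # args: accession, assembly level, path -> prints one 20-column row
  printf '%s\t.\t.\t.\t.\t.\t.\tSome organism\t.\t.\t.\t%s\t.\t.\t.\t.\t.\t.\t.\t%s\n' "$1" "$2" "$3"
}
{
  row GCF_A "Complete Genome" "ftp://example.org/all/GCF_A"
  row GCF_B "Chromosome"      "ftp://example.org/all/GCF_B"
  row GCF_C "Scaffold"        "ftp://example.org/all/GCF_C"
} > "$summary"

# Same two awk steps as in the answer, but the commands are only printed, not run.
commands=$(awk 'BEGIN{FS="\t"} $12=="Complete Genome" {print $20}' "$summary" \
  | awk 'BEGIN{OFS=FS="/"} {print "wget "$0, $NF"_genomic.fna.gz"}')
echo "$commands"
```

Only the row whose column 12 reads "Complete Genome" produces a command; rows with any other assembly level are skipped. Once the printed list looks right, appending `| sh` runs the downloads.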
