How to download RefSeq bacterial genomes in COMPLETE genome format
5.6 years ago
Shelle ▴ 30

Can anyone tell me how I can download RefSeq bacterial genomes from NCBI in complete genome format? I specify "complete genome" because I don't want the word "chromosome" to appear anywhere in the FASTA files. I have seen some posts on this, and it seems that a text file with the organism names will do the job. I can get a CSV file of the organisms from https://www.ncbi.nlm.nih.gov/genome/browse/#!/overview/ and extract only the first column. However, when I use the script from the thread "How to download COMPLETE bacterial genomes from NCBI based on list of names?", it does not seem to work and nothing is downloaded. Is this code compatible with any format of species.txt, or should I reformat the file somehow?

cat species.txt
"'Brassica napus' phytoplasma"
"'Candidatus Kapabacteria' thiocyanatum"
"'Chrysanthemum coronarium' phytoplasma"
"'Echinacea purpurea' witches'-broom phytoplasma"
"'Osedax' symbiont bacterium Rs2_46_30_T18"
"'Sphingomonas ginsengisoli' Hoang et al. 2012"
"Abaca bunchy top virus"
"Abalone herpesvirus Victoria/AUS/2009"
"Abalone shriveling syndrome-associated virus"
"Abditibacterium utsteinense"
"Abelson murine leukemia virus"
"Abeoforma whisleri"
"Abiotrophia"
"Abiotrophia defectiva"
"Abisko virus"
"Absidia glauca"
"Absidia repens"
"Absiella dolichum"
"Abutilon Brazil virus"
"Abutilon golden mosaic virus"
"Abutilon mosaic Bolivia virus"
"Abutilon mosaic Brazil virus"
"Abutilon mosaic virus"



wget ftp://ftp.ncbi.nlm.nih.gov/genomes/genbank/bacteria/assembly_summary.txt

IFS=$'\n'; for next in $(cat species.txt); do awk -v SPECIES=^"$next" 'BEGIN{FS="\t"}{if($8 ~ SPECIES && $12=="Complete Genome"){print $20}}' assembly_summary.txt \
    | awk 'BEGIN{OFS=FS="/"}{print "wget "$0,$NF"_genomic.fna.gz"}'; done \
    | sh
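One likely reason nothing is downloaded: every line of species.txt is wrapped in double quotes, probably left over from the CSV export. The loop passes each line verbatim into the awk pattern ^"$next", so the quote characters become part of the regex and no organism_name in column 8 can start with them. The list also contains viruses and fungi (e.g. "Abaca bunchy top virus", "Absidia glauca"), which will never appear in the bacteria assembly_summary.txt. Below is a minimal sketch for cleaning the file, assuming one name per line; strip_quotes is a made-up helper name, and it also removes carriage returns in case the CSV was saved with Windows line endings.

```bash
# Hypothetical helper: remove the double quotes (and any Windows carriage
# returns) around each name, then drop blank lines, so every line is a
# bare organism name that can be used as an anchored pattern.
strip_quotes() {
    tr -d '"\r' < "$1" | awk 'NF'
}

# Usage: strip_quotes species.txt > species_clean.txt
```

After cleaning, point the loop at species_clean.txt. Keep in mind the loop still treats each name as a regex, so names containing metacharacters such as `.` can match more than intended.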

Hello Shelle,

None of your previous posts have gotten to closure. Please provide feedback and accept answers where appropriate.

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one answer if they all work.


5.6 years ago
GenoMax 141k

You have to modify @5heikki's answer in the linked thread so it fits your use case. Try the following:

# column 12 = assembly_level, column 20 = ftp_path in assembly_summary.txt
awk 'BEGIN{FS="\t"}{if($12=="Complete Genome"){print $20}}' assembly_summary.txt | awk 'BEGIN{OFS=FS="/"}{print "wget "$0,$NF"_genomic.fna.gz"}' | sh
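As a variation on the command above, you could write the download URLs to a file instead of piping generated wget commands into sh. That makes it easy to check how many genomes you are about to fetch (the full set of complete bacterial genomes is large) and to resume an interrupted run. complete_genome_url_list and urls.txt are illustrative names only; wget -c (continue partially downloaded files) and -i (read URLs from a file) are standard wget options.

```bash
# Hypothetical helper: print one download URL per line for every
# "Complete Genome" assembly (column 12), built from ftp_path (column 20).
complete_genome_url_list() {
    awk 'BEGIN { FS = "\t" }
         /^#/ { next }                    # skip the header comment lines
         $12 == "Complete Genome" && $20 != "na" {
             n = split($20, part, "/")    # last path component = assembly name
             print $20 "/" part[n] "_genomic.fna.gz"
         }' "$1"
}

# Usage sketch:
#   complete_genome_url_list assembly_summary.txt > urls.txt
#   wc -l urls.txt           # how many files will be fetched
#   wget -c -i urls.txt      # -c resumes partial files, -i reads the URL list
```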

No species.txt file is needed in your case.
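If you do want to restrict the download to the names in species.txt, the per-name loop can be replaced by a single awk pass over both files. This is a sketch under stated assumptions, not the code from the linked thread: complete_genome_urls is a hypothetical function name; it strips the double quotes itself, compares names literally with awk's index() (a prefix match, like the ^ anchor) instead of building a regex, and skips rows whose ftp_path is na. It only prints the wget commands, so you can inspect them before piping to sh. Since the question asks for RefSeq, the usage comment fetches the RefSeq bacteria summary rather than the GenBank one.

```bash
# Hypothetical helper: print wget commands for "Complete Genome" assemblies
# whose organism_name (column 8) starts with one of the names in a list.
#   $1 = species list (one name per line, surrounding double quotes allowed)
#   $2 = assembly_summary.txt (tab-separated, comment lines start with #)
complete_genome_urls() {
    awk 'BEGIN { FS = "\t" }
         # first file: the species list; strip quotes and carriage returns
         NR == FNR { gsub(/["\r]/, ""); if ($0 != "") names[++n] = $0; next }
         # second file: skip the header comment lines
         /^#/ { next }
         # column 12 = assembly_level, column 20 = ftp_path
         $12 == "Complete Genome" && $20 != "na" {
             for (i = 1; i <= n; i++)
                 if (index($8, names[i]) == 1) {   # literal prefix match
                     print $20
                     break
                 }
         }' "$1" "$2" |
    awk 'BEGIN { OFS = FS = "/" } { print "wget " $0, $NF "_genomic.fna.gz" }'
}

# Usage sketch (RefSeq instead of GenBank):
#   wget https://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/assembly_summary.txt
#   complete_genome_urls species.txt assembly_summary.txt | sh
```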

There is no such thing as a "complete genome" format. We are downloading only those genomes that are marked as "Complete Genome" in the relevant column (column 12, assembly_level) of the assembly_summary.txt file.
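To see what "marked as complete" means in practice, you can tabulate column 12 (assembly_level). In NCBI assembly summaries the levels are Complete Genome, Chromosome, Scaffold and Contig, so the "chromosome" you want to avoid is probably the Chromosome assembly level, which the "Complete Genome" filter already excludes. Note that the FASTA headers of complete genomes may still contain the word (e.g. "... chromosome, complete genome"), and a file can also include plasmid sequences. A small sketch; assembly_levels is a hypothetical helper name:

```bash
# Hypothetical helper: count how many assemblies fall into each
# assembly_level (column 12) of an assembly_summary.txt file,
# printed as "level<TAB>count" and sorted by level name.
assembly_levels() {
    awk 'BEGIN { FS = "\t" }
         /^#/ { next }                   # skip the header comment lines
         { count[$12]++ }
         END { for (level in count) printf "%s\t%d\n", level, count[level] }' "$1" |
        LC_ALL=C sort
}

# Usage: assembly_levels assembly_summary.txt
```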
