Question: How to download RefSeq of bacteria as a COMPLETE genome format
12 months ago, Shelle wrote:

Can anyone tell me how I can download bacterial RefSeq sequences from NCBI in complete genome format? I say complete genome because I don't want the word "chromosome" to appear anywhere in the FASTA files. I saw some posts on this, and it seems that a text file listing the organism names will do the job. I can get a CSV file of organisms from https://www.ncbi.nlm.nih.gov/genome/browse/#!/overview/ and extract only the first column. However, when I use the script from the thread "How to download COMPLETE bacterial genomes from NCBI based on list of names?" (shown below, together with my species.txt), it does not work and nothing is downloaded. Can anyone tell me whether this code works with any format of species.txt, or should I reformat the file somehow?

cat species.txt
"'Brassica napus' phytoplasma"
"'Candidatus Kapabacteria' thiocyanatum"
"'Chrysanthemum coronarium' phytoplasma"
"'Echinacea purpurea' witches'-broom phytoplasma"
"'Osedax' symbiont bacterium Rs2_46_30_T18"
"'Sphingomonas ginsengisoli' Hoang et al. 2012"
"Abaca bunchy top virus"
"Abalone herpesvirus Victoria/AUS/2009"
"Abalone shriveling syndrome-associated virus"
"Abditibacterium utsteinense"
"Abelson murine leukemia virus"
"Abeoforma whisleri"
"Abiotrophia"
"Abiotrophia defectiva"
"Abisko virus"
"Absidia glauca"
"Absidia repens"
"Absiella dolichum"
"Abutilon Brazil virus"
"Abutilon golden mosaic virus"
"Abutilon mosaic Bolivia virus"
"Abutilon mosaic Brazil virus"
"Abutilon mosaic virus"



wget ftp://ftp.ncbi.nlm.nih.gov/genomes/genbank/bacteria/assembly_summary.txt

IFS=$'\n'; for next in $(cat species.txt); do awk -v SPECIES=^"$next" 'BEGIN{FS="\t"}{if($8 ~ SPECIES && $12=="Complete Genome"){print $20}}' assembly_summary.txt \
    | awk 'BEGIN{OFS=FS="/"}{print "wget "$0,$NF"_genomic.fna.gz"}'; done \
    | sh

Hello Shelle,

None of your previous posts have gotten to closure. Please provide feedback and accept answers where appropriate.

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one answer if they all work.

(comment written 12 months ago by RamRS)
12 months ago, genomax (United States) wrote:

You have to modify @5heikki's answer in the linked thread so it fits your use case. Try the following:

IFS=$'\n'; awk 'BEGIN{FS="\t"}{if($12=="Complete Genome"){print $20}}' assembly_summary.txt | awk 'BEGIN{OFS=FS="/"}{print "wget "$0,$NF"_genomic.fna.gz"}' | sh

No species.txt file is needed in your case.
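
If a name filter is wanted after all, note that every entry in the posted species.txt is wrapped in double quotes, so a pattern such as ^"Abiotrophia defectiva" cannot match the unquoted organism names in column 8, which would explain why nothing is downloaded. The following is only a sketch under that assumption: `filter_by_species` is a hypothetical helper name, and it uses a literal prefix match instead of a regex so that characters such as `.`, `'` or `/` in the names are not interpreted as pattern syntax.

```shell
# Hypothetical helper (not from the linked thread): print the ftp_path
# (column 20) of every "Complete Genome" row whose organism_name (column 8)
# starts with one of the names listed in species.txt.
# Usage: filter_by_species species.txt assembly_summary.txt
filter_by_species() {
    awk -F '\t' '
        NR == FNR {                      # first file: the species list
            gsub(/^"|"$/, "")            # strip the surrounding double quotes
            if ($0 != "") wanted[$0] = 1
            next
        }
        /^#/ { next }                    # skip the header lines of the summary
        $12 == "Complete Genome" {
            for (name in wanted)
                if (index($8, name) == 1) { print $20; break }   # literal prefix match
        }
    ' "$1" "$2"
}

# Preview the generated commands first; append "| sh" once they look right:
#   filter_by_species species.txt assembly_summary.txt \
#       | awk 'BEGIN{OFS=FS="/"}{print "wget "$0,$NF"_genomic.fna.gz"}'
```

Names in species.txt that are not bacteria (the viruses, for example) simply never match anything in the bacterial summary file.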

There is no "complete genome" format as such. The command downloads only those assemblies that are marked "Complete Genome" in the assembly_level column (column 12) of the assembly_summary.txt file.
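
To make the column logic easier to inspect, here is a minimal sketch that only prints the download URLs instead of running wget straight away. It assumes the standard tab-separated layout of assembly_summary.txt (column 12 = assembly_level, column 20 = ftp_path); `complete_genome_urls` is a made-up function name, and skipping rows whose path is "na" is an extra precaution that the one-liner above does not take. Since the question is about RefSeq, the same function should work on the summary file under genomes/refseq/bacteria/ rather than genomes/genbank/bacteria/.

```shell
# Print one download URL per assembly marked "Complete Genome".
# Usage: complete_genome_urls assembly_summary.txt
complete_genome_urls() {
    awk -F '\t' '
        /^#/ { next }                    # skip header/comment lines
        $12 == "Complete Genome" && $20 != "na" {
            n = split($20, parts, "/")   # last path component is the assembly directory
            print $20 "/" parts[n] "_genomic.fna.gz"
        }
    ' "$1"
}

# Inspect the list, then download once it looks right:
#   complete_genome_urls assembly_summary.txt | head
#   complete_genome_urls assembly_summary.txt | xargs -n 1 wget
```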
