Where can I find the genome of a single bacterium, e.g. E. coli? I downloaded RNA-seq reads of E. coli from SRA and would now like to align them to the E. coli genome using BWA.
I know that this question is already almost 3 years old, but I hope that my answer might be useful to others anyway.
I implemented a standardized way to automate the genome retrieval process in R (see biomartr package).
To retrieve a bacterial reference genome from one of several database sources using only the scientific name of the bacterium of interest, one can simply type:
# download Escherichia coli reference genome from NCBI RefSeq
biomartr::getGenome(db = "refseq", organism = "Escherichia coli")
# download Escherichia coli reference genome from NCBI GenBank
biomartr::getGenome(db = "genbank", organism = "Escherichia coli")
In case you wish to download all available bacterial genomes at once, simply type:
# download all bacterial reference genomes from NCBI RefSeq
biomartr::meta.retrieval(kingdom = "bacteria", db = "refseq", type = "genome")
For more details about downloading specific genomes from specific kingdoms or subkingdoms of life, please consult the Genomic Sequence Retrieval vignette of the biomartr package. For metagenome downloads, please consult the Meta-Genome Retrieval vignette, and for entire database retrieval, the Database Retrieval vignette.
Please note that to promote computational reproducibility in genomics and metagenomics studies, biomartr stores log files for each downloaded genome, proteome, or CDS file.
An example log file looks as follows:
File Name: Escherichia_coli_genomic_refseq.fna.gz
Organism Name: Escherichia_coli
Database: NCBI refseq
Download_Date: Wed Feb 15 15:17:50 2017
refseq_category: reference genome
infraspecific_name: strain=K-12 substr. MG1655
submitter: Univ. Wisconsin
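If you want to record this provenance information in your own analysis scripts, a log like the one above can be turned into a named list with base R alone. The following is only a minimal sketch: `parse_log` is a hypothetical helper written for this illustration (it is not part of biomartr), and it assumes every log line has the form `key: value`, as in the example shown here.

```r
# Example log lines, copied from the log file shown above
log_lines <- c(
  "File Name: Escherichia_coli_genomic_refseq.fna.gz",
  "Organism Name: Escherichia_coli",
  "Database: NCBI refseq",
  "Download_Date: Wed Feb 15 15:17:50 2017",
  "refseq_category: reference genome",
  "infraspecific_name: strain=K-12 substr. MG1655",
  "submitter: Univ. Wisconsin"
)

# Hypothetical helper (not a biomartr function):
# split each line at the FIRST ": " so that values which themselves
# contain colons (e.g. the time in Download_Date) stay intact
parse_log <- function(lines) {
  lines  <- lines[grepl(": ", lines, fixed = TRUE)]   # keep only key: value lines
  pos    <- regexpr(": ", lines, fixed = TRUE)        # position of first separator
  keys   <- substr(lines, 1, pos - 1)
  values <- substr(lines, pos + 2, nchar(lines))
  stats::setNames(as.list(values), keys)
}

log_info <- parse_log(log_lines)
log_info[["Database"]]
log_info[["infraspecific_name"]]
```

In practice you would replace the hard-coded `log_lines` vector with `readLines()` on the log file that biomartr wrote next to the downloaded genome; its exact name and location are not stated here, so check your download folder.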