Question: Where can I find genome of a single bacteria?
5.0 years ago by
matija.sosic80
Croatia/Zagreb/Faculty of Electronical Engineering and Computing
matija.sosic80 wrote:

Where can I find the genome of a single bacteria, e.g. E. coli? I downloaded RNA-seq reads of E. coli from SRA and would now like to align them to the E. coli genome using BWA.

bacteria genome • 1.9k views
modified 2.2 years ago by Hajk-Georg Drost130 • written 5.0 years ago by matija.sosic80

Single bacterium; single bacterial species; single bacteria.

written 5.0 years ago by Neilfws48k
5.0 years ago by
Devon Ryan89k
Freiburg, Germany
Devon Ryan89k wrote:

Ensembl would be my first bet. Odds are good you mean this one in particular, though there are many other substrains that have been sequenced.

modified 5.0 years ago • written 5.0 years ago by Devon Ryan89k

I have one fasta file from that strain, but also this one: http://www.ncbi.nlm.nih.gov/sra/?term=SRR1187101

Is there some way to know whether this strain's genome has been sequenced, besides manually checking all sources? Would it be OK if I just aligned to the strain you proposed?

Thanks!

modified 5.0 years ago • written 5.0 years ago by matija.sosic80

(N.B., I don't work on E. coli, so take this with an appropriately sized grain of salt!) Yeah, I'd go ahead and align to the aforementioned reference. If you really want, you could then take the sequence of a gene that shows a lot of differences vs. the reference and BLAST it to see whether there's a closer strain. At the end of the day, it really depends on what your goals are. The original study that you just linked to was looking at the association of strain sequence with a clinical phenotype, so in many ways the exact reference strain used may not have been that important.

BTW, you might also consider de novo or reference based assembly.
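
For the align-to-the-reference step suggested above, a minimal Python sketch might look like this. It assumes BWA is installed and on the PATH. `bwa index` and `bwa mem` are standard BWA subcommands, but the helper names `bwa_commands` and `align` and all file names are hypothetical placeholders, not anything from this thread.

```python
import subprocess
from pathlib import Path


def bwa_commands(reference, reads, threads=4):
    """Return the two BWA invocations as argument lists: index, then align."""
    return [
        ["bwa", "index", str(reference)],
        ["bwa", "mem", "-t", str(threads), str(reference), str(reads)],
    ]


def align(reference, reads, out_sam, threads=4):
    """Build the BWA index, then write the bwa mem alignments to out_sam."""
    index_cmd, mem_cmd = bwa_commands(reference, reads, threads)
    subprocess.run(index_cmd, check=True)
    # bwa mem prints SAM records to standard output, so redirect it to a file.
    with Path(out_sam).open("w") as sam:
        subprocess.run(mem_cmd, check=True, stdout=sam)


# Show the commands without running them (placeholder file names).
for cmd in bwa_commands("ecoli_reference.fna", "reads.fastq"):
    print(" ".join(cmd))
```

Calling `align("ecoli_reference.fna", "reads.fastq", "aligned.sam")` would run both steps. Paired-end reads would need the second FASTQ file passed to `bwa mem` as an extra argument.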

written 5.0 years ago by Devon Ryan89k
5.0 years ago by
Neilfws48k
Sydney, Australia
Neilfws48k wrote:

Lots of places. EBI Bacterial Genomes; NCBI; Genomes Online Database; Sanger bacterial genomes; IMG; Ensembl Bacteria...

All easily found via a web search for "bacterial genomes database".

modified 5.0 years ago • written 5.0 years ago by Neilfws48k
5.0 years ago by
Martin A Hansen3.0k
Denmark
Martin A Hansen3.0k wrote:

From memory U00096 :o)
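
U00096 is the GenBank accession of the E. coli K-12 MG1655 genome. One way to fetch a record by accession is NCBI's E-utilities `efetch` endpoint; the sketch below only assumes that endpoint and its `db`, `id`, `rettype` and `retmode` parameters. The function names are made up for illustration, and the download helper needs network access.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession):
    """Build an efetch URL that returns the nucleotide record as FASTA text."""
    query = urlencode(
        {"db": "nuccore", "id": accession, "rettype": "fasta", "retmode": "text"}
    )
    return f"{EFETCH}?{query}"


def download_fasta(accession, out_path):
    """Fetch the FASTA record and save it to out_path (requires network)."""
    with urlopen(efetch_fasta_url(accession)) as response:
        data = response.read()
    with open(out_path, "wb") as out:
        out.write(data)


# Only prints the URL; nothing is downloaded here.
print(efetch_fasta_url("U00096"))
```

`download_fasta("U00096", "U00096.fasta")` would then write the sequence to a local file that can be used as the alignment reference.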

written 5.0 years ago by Martin A Hansen3.0k
2.2 years ago by
Cambridge
Hajk-Georg Drost130 wrote:

I know that this question is already almost 3 years old, but I hope that my answer might be useful to others anyway.

I implemented a standardized way to automate the genome retrieval process in R (see the biomartr package).

To retrieve a bacterial reference genome from one of several databases using only the scientific name of the organism of interest, one can simply type:

# download Escherichia coli reference genome from NCBI RefSeq
biomartr::getGenome(db = "refseq", organism = "Escherichia coli")

or

# download Escherichia coli reference genome from NCBI GenBank
biomartr::getGenome(db = "genbank", organism = "Escherichia coli")

In case you wish to download all available bacterial genomes at once, simply type:

# download all bacterial reference genomes from NCBI RefSeq
biomartr::meta.retrieval(kingdom = "bacteria", db = "refseq", type = "genome")

For more details about downloading specific genomes from specific kingdoms or subkingdoms of life please consult the Genomic Sequence Retrieval vignette of the biomartr package. For metagenome downloads, please consult the Meta-Genome Retrieval vignette and for entire database retrieval the Database Retrieval vignette.

Please note that to promote computational reproducibility in genomics and metagenomics studies, biomartr stores log files for each downloaded genome, proteome, or CDS file.

An example log file looks as follows:

File Name: Escherichia_coli_genomic_refseq.fna.gz

Organism Name: Escherichia_coli

Database: NCBI refseq

URL: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/005/845/GCF_000005845.2_ASM584v2/GCF_000005845.2_ASM584v2_genomic.fna.gz

Download_Date: Wed Feb 15 15:17:50 2017

refseq_category: reference genome

assembly_accession: GCF_000005845.2

bioproject: PRJNA57779

biosample: SAMN02604091

taxid: 511145

infraspecific_name: strain=K-12 substr. MG1655

version_status: latest

release_type: Major

genome_rep: Full

seq_rel_date: 2013-09-26

submitter: Univ. Wisconsin
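
Because the log shown above is a series of `key: value` lines, it is easy to read back programmatically, for example to record which assembly an analysis used. The following is a minimal Python sketch under that assumption. It is not part of biomartr, the function name `parse_log` is hypothetical, and the sample text is copied from a subset of the fields above.

```python
def parse_log(text):
    """Turn 'key: value' lines into a dict, skipping blank lines."""
    fields = {}
    for line in text.splitlines():
        # Split at the first ": " only, so URLs and timestamps stay intact.
        key, sep, value = line.strip().partition(": ")
        if sep:
            fields[key] = value
    return fields


example = """\
File Name: Escherichia_coli_genomic_refseq.fna.gz

Database: NCBI refseq

URL: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/005/845/GCF_000005845.2_ASM584v2/GCF_000005845.2_ASM584v2_genomic.fna.gz

Download_Date: Wed Feb 15 15:17:50 2017

assembly_accession: GCF_000005845.2

infraspecific_name: strain=K-12 substr. MG1655
"""

log = parse_log(example)
print(log["assembly_accession"], "|", log["infraspecific_name"])
```

A quick sanity check such as `log["assembly_accession"] in log["URL"]` confirms that the downloaded file belongs to the assembly named in the log.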

written 2.2 years ago by Hajk-Georg Drost130