Question: All completed genomes for a phylum
4.1 years ago, frenchmytoast112 wrote:

Hello everyone,

Is it possible to find all the completed genomes for a given phylum and obtain their accession numbers?
I found some questions and answers here about it, but they all seem outdated or unclear. For example, I would like to access all the Acidobacteria genomes from NCBI that are finished and complete.

genome • 1.3k views

I just used the NCBI Taxonomy database to view all sequence data for Acidobacteria. To get the list of all genomes for this phylum, click the Assembly link.

written 4.1 years ago by a.zielezinski

That's Assembly. Are those completed genomes? I'm not sure that's what I was asking for.

written 4.1 years ago by frenchmytoast112
4.1 years ago, Erik Wright wrote:

NCBI has a list of genomes by organism. Enter "Acidobacteria" then click "Search by organism". Next, select the prokaryotes tab. Sort by the "Level" column. Completed genomes have a filled black circle.

If the list is too long to click each genome individually, there is a link near the top that says "Download selected records". This produces a file with multiple columns, including the RefSeq FTP directory address. You can then write a script to download the file(s) that you want from each of those directories.
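
As a rough sketch of the "one file per directory" step, the snippet below builds a FASTA URL from a single directory address. It assumes NCBI's usual layout, in which the genomic FASTA is named after its directory with `_genomic.fna.gz` appended. The helper names `fasta_url` and `local_path` and the example directory are made up for illustration and are not part of any NCBI tool.

```r
# Hypothetical helper: build the genomic FASTA URL for one assembly directory.
# Assumes the file is named <directory name>_genomic.fna.gz.
fasta_url <- function(ftp_dir, suffix = "_genomic.fna.gz") {
    ftp_dir <- sub("/+$", "", ftp_dir)  # drop any trailing slash
    paste0(ftp_dir, "/", basename(ftp_dir), suffix)
}

# Hypothetical helper: choose where to save the file locally.
local_path <- function(url, dest_dir = "~/Downloads") {
    file.path(dest_dir, basename(url))
}

# Placeholder directory, not a real assembly
example_dir <- "ftp://ftp.example.org/genomes/GCA_000000000.0_EXAMPLE"
url <- fasta_url(example_dir)
print(url)
print(local_path(url))
```

Both helpers are vectorized, so they can be applied directly to a whole column of directory addresses before any download is attempted.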

For example, the R script I use for downloading the FASTA files looks like this:

# read in the file downloaded from the NCBI genome browser
x <- read.csv("<<PATH TO genomes_proks.csv>>", stringsAsFactors=FALSE)
ftps <- x$GenBank.FTP

# select a subset of FTPs if desired
ftps <- ftps[which(x$Level=="Complete Genome")]

# set the input and output file locations
# (assumes the FASTA in each directory is named <directory name>_genomic.fna.gz)
ftps <- paste(ftps,
    paste0(sapply(strsplit(ftps, "/", fixed=TRUE), tail, n=1),
        "_genomic.fna.gz"),
    sep="/")
saveto <- paste0("~/Downloads/",
    sapply(strsplit(ftps, "/", fixed=TRUE), tail, n=1))

# download each of the genomes to ~/Downloads/
pBar <- txtProgressBar(style=3)
for (i in seq_along(ftps)) {
    # mode="wb" keeps the gzipped files intact on Windows
    download.file(ftps[i], saveto[i], mode="wb")
    setTxtProgressBar(pBar, i/length(ftps))
}
close(pBar)

Hope that helps!


That's it! Thanks!
But how did you learn about this? I am a bit confused about NCBI and its ins and outs.

written 4.1 years ago by frenchmytoast112