All completed genomes for a phylum
8.2 years ago

Hello everyone,

Is it possible to find all the completed genomes for a given phylum and obtain their accession numbers?
I found some questions and answers here about it, but they all seem outdated or unclear. For example, I would like to access all the finished, complete genomes for Acidobacteria from NCBI.


I just used the NCBI Taxonomy Database to see all sequence data for Acidobacteria (see: here). To access the list of all genomes for this phylum, click on the Assembly link.


That's Assembly. Are they completed genomes? I am not sure that's what I was asking for.

8.2 years ago
Erik Wright ▴ 420

NCBI has a list of genomes by organism. Enter "Acidobacteria" then click "Search by organism". Next, select the prokaryotes tab. Sort by the "Level" column. Completed genomes have a filled black circle.

If the list is too long to click each genome individually, there is a link near the top that says "Download selected records". This produces a file with multiple columns, including the GenBank and RefSeq FTP directory addresses. Simply write a script to download the file(s) that you want from each of those directories.

For example, the R script I use for downloading the FASTA files looks like this:

# read in the file downloaded from the NCBI genome browser
x <- read.csv("<<PATH TO genomes_proks.csv>>", stringsAsFactors=FALSE)
ftps <- x$GenBank.FTP

# select a subset of FTPs if desired
ftps <- ftps[which(x$Level=="Complete Genome")]

# set the input and output file locations
ftps <- paste(ftps,
    paste0(sapply(strsplit(ftps, "/", fixed=TRUE), tail, n=1),
        "_genomic.fna.gz"),
    sep="/")
saveto <- paste0("~/Downloads/",
    sapply(strsplit(ftps, "/", fixed=TRUE), tail, n=1))

# download each of the genomes to ~/Downloads/
pBar <- txtProgressBar(style=3)
for (i in seq_along(ftps)) {
    download.file(ftps[i], saveto[i])
    setTxtProgressBar(pBar, i/length(ftps))
}
close(pBar)
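
If some rows of the downloaded table have an empty FTP field, the URL-building step above can be wrapped in a small helper that filters first. This is only a sketch: `build_genome_urls` is a hypothetical name, the assumption that missing FTP entries appear as empty strings or `NA` is mine, and the example paths are made-up placeholders rather than real NCBI records.

```r
# Hypothetical helper (not part of any package): build download URLs and
# local file names from the FTP directory column of the exported table.
build_genome_urls <- function(ftp_dirs, level,
                              keep_level = "Complete Genome",
                              suffix = "_genomic.fna.gz") {
    # keep rows at the requested assembly level that have a usable FTP path
    # (assumption: missing paths are NA or empty strings)
    ok <- !is.na(ftp_dirs) & nzchar(ftp_dirs) &
        !is.na(level) & level == keep_level
    if (!any(ok)) {
        return(data.frame(url = character(0), file = character(0),
                          stringsAsFactors = FALSE))
    }
    # drop any trailing slash so the last path component is the directory name
    dirs <- sub("/+$", "", ftp_dirs[ok])
    # the FASTA file is named after its directory, plus the suffix
    files <- paste0(sapply(strsplit(dirs, "/", fixed = TRUE), tail, n = 1),
                    suffix)
    data.frame(url = paste(dirs, files, sep = "/"),
               file = files,
               stringsAsFactors = FALSE)
}

# made-up placeholder paths, for illustration only
demo <- build_genome_urls(
    c("ftp://example.org/a/b/GCA_000000001.1_demo",
      "",
      "ftp://example.org/a/b/GCA_000000002.1_demo/"),
    c("Complete Genome", "Complete Genome", "Scaffold"))
print(demo)
```

With the real table, the call would be `build_genome_urls(x$GenBank.FTP, x$Level)`, and the resulting `url` and `file` columns can be passed to `download.file` in the same kind of loop as above.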

Hope that helps!


That's it! Thanks!
But how did you learn about this? I am a bit confused about NCBI and its ins and outs.
