How to find an NCBI nucleotide entry on the FTP site
8.9 years ago
leontp587 • 0

Hello,

I have a big list of NCBI accession numbers for plasmids whose sequences I want to download. Examples:

gi|6090378|emb|A78782.1|
gi|154735|gb|K00546.1
gi|1208491|dbj|D45834.1

These are all plasmids, but none of them seem to show up under the ftp://ftp.ncbi.nih.gov/genomes/Plasmids/ folder. Anyone know why?

How can I find these records in the FTP site? I tried to use an FTP program to search for a file with A78782 in its name but this would take days given how big the FTP site is.

Sorry for this very newbie question. I'm completely new to this.

I know it's possible to use e-utils, but I have several tens of thousands of these accession numbers, so FTP would seem faster.

genome • 3.5k views
8.9 years ago
Neilfws 49k

To answer "anyone know why" first, take a look at the README in FTP genomes which includes this line:

Plasmids: sequence and annotation of RefSeq plasmids

So only plasmids with RefSeq accessions are found there; you have GenBank accessions.

I'm pretty sure that you will not find a separate GenBank division for plasmid sequences in the FTP site. If you don't want to mirror GenBank locally or use eutils, you could try Batch Entrez, then Send to -> File -> Fasta.
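
Batch Entrez takes a plain file with one identifier per line, so the pipe-delimited strings from the question would need trimming first. A minimal sketch, assuming every identifier follows the gi|number|db|accession layout shown in the question (the function name extract_accessions is made up for illustration):

```bash
#!/bin/bash
# Hypothetical helper: print the accession.version, i.e. the 4th
# pipe-delimited field of strings like "gi|6090378|emb|A78782.1|".
# Reads the files given as arguments, or stdin if there are none.
extract_accessions() {
    awk -F'|' 'NF >= 4 && $4 != "" { print $4 }' "$@"
}

# Example with the three identifiers from the question
printf '%s\n' \
    'gi|6090378|emb|A78782.1|' \
    'gi|154735|gb|K00546.1' \
    'gi|1208491|dbj|D45834.1' | extract_accessions
```

Redirecting the output to a file gives something that can be uploaded to Batch Entrez. Lines that do not have at least four fields are silently skipped, so it is worth comparing line counts before and after.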

8.9 years ago
5heikki 11k

Your best bet is probably eutils. You can retrieve 500 sequences with every call, so even several tens of thousands isn't such a big deal. Here $1 is a file listing the GIs, one per line, and Entrez Direct is assumed to be in $PATH:

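#!/bin/bash
# Split the ID list into chunks of 500 lines: input.aaaaa, input.aaaab, ...
split -l 500 -a 5 "$1" input.
# Match only the 5-character chunk suffixes, not leftover *.output files
for f in input.?????
do
    # Join the IDs of this chunk into one comma-separated string
    IDs=$(paste -sd, "$f")
    # Post the IDs to the Entrez history server, then fetch them as FASTA
    epost -db nuccore -id "$IDs" | efetch -format fasta > "$f.output"
done
# Merge the per-chunk results into a single FASTA file
cat input.*.output > plasmids.fna
# Everything worked out? Then clean up with: rm input.*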
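
Before deleting the intermediate files it may be worth checking that every requested ID actually came back. A rough sketch, assuming one ID per line in the input list and one FASTA record per ID (check_retrieval is a made-up name, not part of Entrez Direct):

```bash
#!/bin/bash
# Hypothetical helper: compare the number of requested IDs with the
# number of FASTA records retrieved. Returns non-zero on a mismatch.
check_retrieval() {
    local ids_file=$1
    local fasta_file=$2
    local wanted got
    wanted=$(grep -c . "$ids_file")      # non-empty lines = requested IDs
    got=$(grep -c '^>' "$fasta_file")    # FASTA header lines = retrieved records
    echo "requested=$wanted retrieved=$got"
    [ "$wanted" -eq "$got" ]
}

# Usage, with the GI list that was passed to the script as $1:
#   check_retrieval gi_list.txt plasmids.fna || echo "some records are missing"
```

A mismatch only says that something is missing, not which chunk failed. Looking for empty input.*.output files is a quick way to narrow that down.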