Question: How to find an NCBI nucleotide entry on the FTP site
leontp587 (United States) wrote, 3.7 years ago:

Hello,

I have a big list of NCBI accession numbers for plasmids whose sequences I want to download. Examples:

gi|6090378|emb|A78782.1|
gi|154735|gb|K00546.1
gi|1208491|dbj|D45834.1

These are all plasmids, but none of them seem to show up under the ftp://ftp.ncbi.nih.gov/genomes/Plasmids/ folder. Anyone know why?

How can I find these records on the FTP site? I tried using an FTP program to search for a file with A78782 in its name, but that would take days given how big the FTP site is.

Sorry for this very newbie question. I'm completely new to this.

I know it's possible to use e-utils, but I have several tens of thousands of these accession numbers, so FTP would seem faster.

Neilfws (Sydney, Australia) wrote, 3.7 years ago:

To answer "anyone know why" first, take a look at the README in the FTP genomes directory, which includes this line:

     Plasmids: sequence and annotation of RefSeq plasmids

So only plasmids with RefSeq accessions are found there; you have GenBank accessions.
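
As a rough illustration of that distinction: RefSeq accessions carry a two-letter prefix followed by an underscore, while GenBank/EMBL/DDBJ accessions contain no underscore. The helper below is a minimal sketch built on that convention only; the function name `classify_accession` and the `NC_000000.1` accession are made up for the example.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original answer): guess whether an
# accession is RefSeq based on its prefix. RefSeq accessions look like
# two characters followed by an underscore; GenBank/EMBL/DDBJ accessions
# have no underscore.
classify_accession() {
    case "$1" in
        ??_*) echo "RefSeq" ;;
        *)    echo "GenBank/EMBL/DDBJ" ;;
    esac
}

classify_accession "A78782.1"     # accession from the question
classify_accession "NC_000000.1"  # made-up RefSeq-style accession
```

Run against the list in the question, this should flag every entry as non-RefSeq, which is consistent with none of them appearing under genomes/Plasmids.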

I'm pretty sure that you will not find a separate GenBank division for plasmid sequences on the FTP site. If you don't want to mirror GenBank locally or use eutils, you could try Batch Entrez: upload the list of identifiers, then Send to -> File -> Fasta.

 

ADD COMMENTlink written 3.7 years ago by Neilfws48k
5heikki (Finland) wrote, 3.7 years ago:

Your best bet is probably eutils. You can retrieve 500 seqs with every call, so even several tens of thousands isn't such a big deal. Here $1 is a file listing GIs, one per line, and Entrez Direct is assumed to be in $PATH:

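#!/bin/bash
# $1: file with one GI per line
# Split the list into chunks of 500 IDs: input.aaaaa, input.aaaab, ...
split -l 500 -a 5 "$1" input.
for f in input.?????
do
    # Join the IDs of this chunk with commas (no trailing comma)
    IDs=$(paste -s -d, "$f")
    epost -db nuccore -id "$IDs" | efetch -format fasta > "$f.output"
done
cat input.*.output > plasmids.fna
# Everything worked out? rm input.*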
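
The script above expects bare GIs, whereas the identifiers in the question are pipe-delimited (`gi|<GI>|<db>|<accession>|`). A minimal sketch for pulling out the second field (the GI) or the fourth (the accession.version) with awk follows; the function names and the `ids.txt` file name are hypothetical, and it assumes every line follows the `gi|...` layout shown in the question.

```shell
#!/bin/bash
# Hypothetical helpers: extract fields from identifiers of the form
#   gi|6090378|emb|A78782.1|
# Input is read from the named files, or from stdin if none are given.

# Print the GI (2nd pipe-delimited field) of each gi|... line.
extract_gis() {
    awk -F'|' '$1 == "gi" && $2 != "" { print $2 }' "$@"
}

# Print the accession.version (4th pipe-delimited field) of each gi|... line.
extract_accessions() {
    awk -F'|' '$1 == "gi" && $4 != "" { print $4 }' "$@"
}

# Example usage (ids.txt is an assumed file name):
#   extract_gis ids.txt > gis.txt
```

The resulting `gis.txt` could then be passed to the script as $1. Lines that do not start with `gi` are silently skipped, so it may be worth comparing line counts before and after.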