Downloading plasmid sequences using RefSeq IDs
9.8 years ago
bioinfo ▴ 830

Is there an easy way to download plasmid sequences from the NCBI plasmids site using a list of RefSeq IDs stored in a file?

The RefSeq IDs in the .txt file look like this:

NC_017775
NC_017810
NC_017776
NC_017777
.........
NC_017811
NC_017778
NC_017779
Tags: perl, sequence, fasta, NCBI, plasmids
9.6 years ago
elucify ▴ 10

Easier solution

curl -s 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=NC_017775,NC_017810&rettype=fasta&retmode=text'

To fetch more sequences, just add more RefSeq accession numbers to the comma-separated id list. This works until the URL reaches roughly 2,000 characters. If you have more sequences than that, split your list of IDs into comma-separated blocks that keep each URL just under that length, and iterate over the blocks.
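A minimal sketch of that batching, assuming the IDs sit one per line in a file and that 100 accession-style IDs (about 10 characters each, plus a comma) per request keep the URL well under the ~2,000-character limit mentioned above. The function names `batch_ids` and `fetch_batches` and the file names are hypothetical, not part of any tool. The short pause between requests is there because NCBI asks clients without an API key to send at most 3 requests per second.

```shell
#!/usr/bin/env bash
# Sketch: fetch FASTA from NCBI EFetch in comma-separated batches of IDs.
# batch_ids / fetch_batches are hypothetical helper names used for illustration.
# Assumption: one ID per line; ~100 IDs of ~10 chars keep the URL under ~2k chars.

EFETCH='https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi'

# batch_ids FILE [SIZE]: print the IDs in FILE as comma-separated lines of SIZE IDs each.
batch_ids() {
    local file="$1" size="${2:-100}"
    # xargs -n groups SIZE words per output line (default command is echo);
    # tr then turns the separating spaces into commas.
    xargs -n "$size" < "$file" | tr ' ' ','
}

# fetch_batches FILE [SIZE]: one EFetch request per batch, FASTA written to stdout.
fetch_batches() {
    local ids
    batch_ids "$1" "${2:-100}" | while read -r ids; do
        curl -s "${EFETCH}?db=nucleotide&rettype=fasta&retmode=text&id=${ids}"
        sleep 0.4   # stay under 3 requests/second (no API key)
    done
}

# Usage (hypothetical file names):
#   fetch_batches ids.txt > plasmids.fasta
#   fetch_batches ids.txt 50 > plasmids.fasta   # smaller batches for longer IDs
```

Batching by a fixed count rather than by measured URL length keeps the sketch short; if your IDs are much longer than RefSeq accessions, lower the batch size.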

9.8 years ago

Search this site for EFetch:

printf 'NC_017775\nNC_017810\n' | while read -r ACN; do curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=${ACN}&retmode=text&rettype=fasta"; done
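The same one-request-per-ID loop can read the accessions straight from the ID file instead of a hard-coded list. A minimal sketch, assuming one accession per line; the function name `fetch_each` and the file names are hypothetical. It also tolerates Windows line endings, blank lines and a missing final newline, and pauses between requests because NCBI limits clients without an API key to 3 requests per second.

```shell
#!/usr/bin/env bash
# Sketch: one EFetch request per accession read from a file (one ID per line).
# fetch_each is a hypothetical helper name used for illustration.

fetch_each() {
    local acn
    # "|| [ -n "$acn" ]" also processes a last line that lacks a trailing newline.
    while IFS= read -r acn || [ -n "$acn" ]; do
        acn="${acn%$'\r'}"          # drop a trailing CR (file saved with Windows line endings)
        [ -z "$acn" ] && continue   # skip blank lines
        curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=${acn}&retmode=text&rettype=fasta"
        sleep 0.4                   # at most ~3 requests/second without an API key
    done < "$1"
}

# Usage (hypothetical file names):
#   fetch_each ids.txt > plasmids.fasta
```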

I was thinking it should also work to download all the sequences from the FTP site first and then extract the required FASTA sequences using the RefSeq ID list in the .txt file. I tried this but got stuck:

cut -c 1- test.id.txt | xargs -n 1 samtools faidx large.dowloaded.plasmid.fasta

This only works if test.id.txt contains the full sequence name from each FASTA header (samtools faidx takes everything after '>' up to the first whitespace as the name). A header line looks like this:

gi|386858858|ref|NC_017775.1| Borrelia crocidurae str. Achema plasmid unnamed, complete sequence

So I need to match just the RefSeq ID (e.g. NC_017775.1) in each FASTA header rather than the whole header. Any suggestions?


To use tabix, the files would have to be indexed with tabix on the NCBI side, and they are not. Search Biostars for how to 'grep' a FASTA file by sequence name.
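A minimal sketch of that "grep by name" approach with awk, assuming headers in the `gi|386858858|ref|NC_017775.1|` style shown in the question and an ID list without version suffixes (e.g. NC_017775). The function name `extract_by_refseq` and the output file name are hypothetical. It reads the ID list first, then prints every FASTA record whose `ref|...|` field, minus its version, is in that list.

```shell
#!/usr/bin/env bash
# Sketch: print FASTA records whose header carries a RefSeq accession listed in IDS_FILE.
# extract_by_refseq is a hypothetical helper name used for illustration.
# Assumes headers like ">gi|386858858|ref|NC_017775.1| Borrelia crocidurae ..." and
# IDs without a version suffix (NC_017775). Sequence lines of matching records are kept as-is.

extract_by_refseq() {
    local ids_file="$1" fasta="$2"
    awk '
        # First file: remember every wanted accession (trailing CR stripped).
        NR == FNR { sub(/\r$/, ""); if ($0 != "") want[$0] = 1; next }
        # Header line: pull out the field after "ref|" and drop the ".version" part.
        /^>/ {
            keep = 0
            if (match($0, /ref[|][^|]+[|]/)) {
                acc = substr($0, RSTART + 4, RLENGTH - 5)
                sub(/\.[0-9]+$/, "", acc)
                keep = (acc in want)
            }
        }
        # Print the header and sequence lines while inside a wanted record.
        keep
    ' "$ids_file" "$fasta"
}

# Usage (hypothetical output name):
#   extract_by_refseq test.id.txt large.dowloaded.plasmid.fasta > wanted.fasta
```

Matching on the accession without its version means NC_017775 in the list picks up NC_017775.1 (or any later version) in the file; remove the `sub(/\.[0-9]+$/, ...)` line if you list versioned IDs and want exact matches.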

9.6 years ago
A few weeks later, to complete what @elucify said: you can group your input with xargs:
seq 25 100 |\
xargs -n5 -r echo | tr " " "," |\
while read -r F; do curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&rettype=fasta&id=${F}" ; done
