Question: Download targeted sequences by GI number, start position and end position
horsedog wrote (18 months ago):

Hi all, I need a lot of bacterial sequences from NCBI, and I have the GI number, start position and end position of each sequence I want. Is it possible to download only the targeted regions instead of the whole genome? I used Batch Entrez before, but it gives me the whole genome, which I don't need. Thank you.

genomax wrote (18 months ago):

NCBI eUtils would be the way to go. Can you post an example GI and the region you need? BTW: NCBI stopped using GIs externally a while ago.


I'm sorry, could you please be a bit more specific? For example, how do I pass the start position and end position?

written 18 months ago by horsedog

For example:

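$ efetch -db nuccore -format fasta -id CP005986 -chr_start 1600000 -chr_stop 1600020

This brings back a 21 bp chunk from this genome (-chr_start and -chr_stop are 0-based and inclusive, which is why the header below shows the 1-based range 1600001-1600021):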

>CP005986.1:1600001-1600021 Acidithiobacillus caldus ATCC 51756, complete genome
ACGAGCGGCGCATTACTCCGA

BTW: CP005986 can be replaced by the gi number 640840007 to get the same result.

written 18 months ago by genomax

Oh, thank you very much, that's really helpful! But what if I have a batch of sequences to extract? I saved all the CP accession numbers, start positions and end positions in three separate text files and ran: efetch -db nuccore -format fasta -id name.txt -chr_start start.txt -chr_stop end.txt, but it doesn't work.

written 18 months ago by horsedog
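
One possible way to handle the batch case, offered as a sketch rather than a definitive answer: -chr_start and -chr_stop each take a single number, so passing file names there cannot work. An option is to join the three files line by line with paste and call efetch once per record. The helper name fetch_regions is hypothetical, the file names are the ones from the post above, and the one-second pause is an assumption meant to keep the request rate low for NCBI. It also assumes the three files have the same number of lines, in the same order.

```shell
# Sketch: fetch one region per record from three parallel files.
# Usage: fetch_regions name.txt start.txt end.txt > regions.fasta
# fetch_regions is a hypothetical helper name, not part of EDirect.
fetch_regions() {
    tab=$(printf '\t')
    # paste joins line i of each file into one tab-separated record
    paste "$1" "$2" "$3" |
    while IFS="$tab" read -r acc start stop; do
        # skip empty records (e.g. trailing blank lines)
        [ -n "$acc" ] || continue
        # -chr_start/-chr_stop are 0-based, as in the single-region example.
        # stdin is redirected so efetch does not consume the loop's input.
        efetch -db nuccore -format fasta -id "$acc" \
            -chr_start "$start" -chr_stop "$stop" < /dev/null
        # assumed pause between requests; adjust as needed
        sleep 1
    done
}
```

Calling fetch_regions name.txt start.txt end.txt > regions.fasta would then write all regions to one FASTA file. If your coordinates are 1-based, subtract 1 from each before passing them to -chr_start and -chr_stop.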