Download targeted sequences given a GI number, start position and end position
6.6 years ago
horsedog ▴ 60

Hi all, I need a large number of bacterial sequences from NCBI, and I have the GI number, start position and end position for each sequence I want. Is it possible to download only the targeted regions instead of the whole genome? I used Batch Entrez before, but it returns the whole genome, which I don't need. Thank you.

sequence • 1.5k views
6.6 years ago
GenoMax 142k

NCBI eUtils would be the way to go. Can you post an example GI and the region you need? BTW: NCBI stopped using GIs externally a while ago.


I'm sorry, could you please be a bit more specific? For example, how do I specify the start position and end position?


For example:

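$ efetch -db nuccore -format fasta -id CP005986 -chr_start 1600000 -chr_stop 1600020

brings back a 21 bp chunk from this genome. The -chr_start and -chr_stop values are 0-based, which is why the header below reports the 1-based range 1600001-1600021.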

>CP005986.1:1600001-1600021 Acidithiobacillus caldus ATCC 51756, complete genome
ACGAGCGGCGCATTACTCCGA

BTW: CP005986 can be replaced by the GI number 640840007 to get the same result.
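
If EDirect is not installed, the same region can probably be requested from the E-utilities efetch endpoint directly. The sketch below only builds the URL; it assumes the endpoint's 1-based seq_start and seq_stop parameters, so the values match the range shown in the FASTA header above (1600001-1600021). The function name efetch_url is made up for this example and is not part of EDirect.

```shell
#!/bin/sh
# Build an E-utilities efetch URL for a sub-range of a nuccore record.
# efetch_url is a hypothetical helper name used only in this sketch.
# Arguments: accession (or GI), start, stop (assumed 1-based, inclusive).
efetch_url() {
    acc=$1
    start=$2
    stop=$3
    printf 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=%s&rettype=fasta&retmode=text&seq_start=%s&seq_stop=%s\n' \
        "$acc" "$start" "$stop"
}

# Usage (needs network access and curl):
#   curl -s "$(efetch_url CP005986 1600001 1600021)"
```

Keeping the URL construction in its own function makes it easy to check the coordinates before sending any request.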


Oh, thank you very much, that's really amazing! But what if I have a batch of sequences to extract? I saved all the CP accession numbers, start positions and end positions in three separate txt files and ran: efetch -db nuccore -format fasta -id name.txt -chr_start start.txt -chr_stop end.txt, but it doesn't work.
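
One possible approach, as a sketch only: as far as I know, -chr_start and -chr_stop each take a single number rather than a file, so a batch needs one efetch call per region. The helper below (make_efetch_cmds, a made-up name) reads a tab-separated table of accession, start and stop and prints one command per row, so the commands can be inspected before anything is run. The three-column layout and the file name regions.tsv are assumptions. Each generated command redirects standard input from /dev/null as a precaution, since efetch may otherwise read from the stream that feeds the loop.

```shell
#!/bin/sh
# Print one efetch command per row of a tab-separated regions table.
# Assumed columns: accession <TAB> start <TAB> stop (0-based, as with -chr_start).
# make_efetch_cmds is a hypothetical helper name used only in this sketch.
make_efetch_cmds() {
    tab=$(printf '\t')
    # The "|| [ -n ... ]" part keeps a final row that lacks a trailing newline.
    while IFS=$tab read -r acc start stop || [ -n "$acc" ]; do
        # Skip blank lines and lines starting with '#'.
        case $acc in
            ''|\#*) continue ;;
        esac
        printf 'efetch -db nuccore -format fasta -id %s -chr_start %s -chr_stop %s < /dev/null\n' \
            "$acc" "$start" "$stop"
    done < "$1"
}

# Usage:
#   paste name.txt start.txt end.txt > regions.tsv   # merge the three files
#   make_efetch_cmds regions.tsv                      # inspect the commands
#   make_efetch_cmds regions.tsv | sh > regions.fasta # run them (needs EDirect)
```

For a large batch it may be worth adding a short pause between calls or using an NCBI API key, since NCBI limits the request rate.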
