Getting DNA sequence from a genome knowing the start and end position
5.1 years ago
PaSua • 0

Python newbie here.

I was wondering whether there is a way to get a genome sequence from NCBI given start and end positions. For instance, I'm working with the genome NC_011375.1 and would like to obtain the sequence between bases 259882 and 259896. So far, I have this:

from Bio import Entrez
from Bio import SeqIO

Entrez.email = "my@email.org"

handle = Entrez.efetch(db="nuccore",
                       id="NC_011375.1",
                       rettype="gb",
                       retmode="text")

whole_sequence = SeqIO.read(handle, "genbank")

print(whole_sequence[259882:259896])

And this is the output I get:

ID: NC_011375.1
Name: NC_011375
Description: Streptococcus pyogenes NZ131, complete genome.
Number of features: 0
UnknownSeq(14, alphabet = IUPACAmbiguousDNA(), character = 'N')

As you can see, it's not working: the slice comes back as an UnknownSeq of 14 Ns instead of the actual bases. Since I don't know how to proceed, any help would be appreciated.

Thank you in advance.


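I don't know the syntax for this command, but keep in mind that Python uses 0-based indexing, so the first base is actually at position 0, not 1, and you must adjust accordingly. Slices are also end-exclusive, so bases 259882 to 259896 (1-based, inclusive) correspond to the slice [259881:259896].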
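To make the conversion concrete, here is a minimal sketch. The helper name `one_based_to_slice` is hypothetical, not part of Biopython; it simply turns 1-based, inclusive coordinates (the convention NCBI and GenBank use) into a Python slice. Since Biopython `SeqRecord` and `Seq` objects accept ordinary slices, the result can presumably be used directly, e.g. `whole_sequence[one_based_to_slice(259882, 259896)]`.

```python
def one_based_to_slice(start, end):
    """Convert 1-based, inclusive coordinates (NCBI/GenBank style)
    into a Python slice (0-based, end-exclusive)."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    # Shift the start down by one; the end stays the same because
    # Python slices exclude their stop index.
    return slice(start - 1, end)


region = one_based_to_slice(259882, 259896)
print(region)                      # slice(259881, 259896, None)
print(region.stop - region.start)  # 15 bases

# Toy check: bases 2..4 of "ACGTACGT" (1-based) are C, G, T.
print("ACGTACGT"[one_based_to_slice(2, 4)])  # CGT
```

Note that the original `[259882:259896]` slice returns only 14 bases, which matches the length of the UnknownSeq in the question's output.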


Solved. I wasn't using the correct ID (it needs to be a CP accession, not an NC_ one). Anyway, thank you: I also needed to adjust the positions for Python's indexing, as you said.

I'm posting the solution here in case someone finds it useful:

from Bio import Entrez
from Bio import SeqIO

Entrez.email = "my@email.org"

handle = Entrez.efetch(db="nuccore",
                       id="CP000829",
                       rettype="gb",
                       retmode="text")

whole_sequence = SeqIO.read(handle, "genbank")
handle.close()

# Bases 259882..259896 (1-based, inclusive) -> Python slice [259881:259896]
print(whole_sequence[259881:259896])

output:

ID: CP000829.1
Name: CP000829
Description: Streptococcus pyogenes NZ131, complete genome.
Number of features: 0
Seq('AATATTCAGATAATT', IUPACAmbiguousDNA())
5.1 years ago
$ wget -q -O -  "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=NC_011375.1&rettype=fasta&seq_start=259882&seq_stop=259896" 

>NC_011375.1:259882-259896 Streptococcus pyogenes NZ131, complete genome
AATATTCAGATAATT
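The wget call above can be mirrored in Python using only the standard library, and it avoids downloading the whole GenBank record: efetch returns just the requested region when `seq_start` and `seq_stop` are given. These are 1-based and inclusive, as the header `NC_011375.1:259882-259896` shows, so no index shift is needed. The fact that NC_011375.1 works here in FASTA format also suggests the accession itself may not have been the real problem in the question. This is a sketch only; `build_efetch_url`, `parse_fasta` and `fetch_region` are hypothetical helper names, not part of any library.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession, start, stop):
    """Build the same efetch URL as the wget example.

    start/stop are 1-based and inclusive (NCBI convention).
    """
    params = {
        "db": "nuccore",
        "id": accession,
        "rettype": "fasta",
        "seq_start": start,
        "seq_stop": stop,
    }
    return EFETCH + "?" + urlencode(params)


def parse_fasta(text):
    """Return (header, sequence) for a single-record FASTA string."""
    lines = [line.strip() for line in text.strip().splitlines()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("not a FASTA record")
    header = lines[0][1:]
    # Join wrapped sequence lines into one string.
    sequence = "".join(lines[1:])
    return header, sequence


def fetch_region(accession, start, stop):
    """Download one region from NCBI (requires network access)."""
    with urlopen(build_efetch_url(accession, start, stop)) as response:
        return parse_fasta(response.read().decode("utf-8"))
```

With network access, `fetch_region("NC_011375.1", 259882, 259896)` should return the same header and `AATATTCAGATAATT` as the wget output above.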