Question: Getting DNA sequence from a genome knowing the start and end position
PaSua wrote:

Python newbie here.

Is there a way to fetch part of a genome sequence from NCBI given a start and an end position? For instance, I'm working with the genome NC_011375.1 and I would like to obtain the sequence between bases 259882 and 259896. So far, I have this:

from Bio import Entrez
from Bio import SeqIO

Entrez.email = "my@email.org"

# Download the full GenBank record for the accession
handle = Entrez.efetch(db="nuccore",
                       id="NC_011375.1",
                       rettype="gb",
                       retmode="text")

whole_sequence = SeqIO.read(handle, "genbank")

# Slice out the region of interest
print(whole_sequence[259882:259896])

And this is the output I get:

ID: NC_011375.1
Name: NC_011375
Description: Streptococcus pyogenes NZ131, complete genome.
Number of features: 0
UnknownSeq(14, alphabet = IUPACAmbiguousDNA(), character = 'N')

As you can see, it's not working: the slice contains only 'N' characters instead of the real bases. Since I don't know how to proceed, any help would be appreciated.

Thank you in advance.

written 12 months ago by PaSua

I don't know the syntax for this command, but keep in mind that Python uses 0-based indexing, so the first base is at position 0, not 1. You must adjust the coordinates accordingly.
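To make the adjustment concrete, here is a minimal sketch of converting a 1-based, inclusive interval (the way the positions in the question are written) into a Python slice. The helper name `one_based_to_slice` and the toy sequence are hypothetical and only for illustration.

```python
def one_based_to_slice(start, end):
    """Convert a 1-based, inclusive interval to a 0-based, half-open slice."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    # Shift the start down by one; the end stays the same because
    # Python slices exclude the end index.
    return slice(start - 1, end)


# Toy sequence standing in for a genome (hypothetical data).
toy = "ACGTACGTAC"

# Bases 3 to 6 of the toy sequence, counted from 1.
region = toy[one_based_to_slice(3, 6)]
print(region)  # GTAC
```

With the coordinates from the question, `one_based_to_slice(259882, 259896)` gives `slice(259881, 259896)`, which covers 15 bases.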

written 12 months ago by jean.elbers

Solved. I wasn't using the correct ID (it needs to be a CP reference, not an NC_ one). Anyway, thank you, because I did need to adjust the positions to Python indexing, as you said.

I'm putting the solution here, hoping someone will find it useful:

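from Bio import Entrez
from Bio import SeqIO

Entrez.email = "my@email.org"

# Download the full GenBank record for the CP accession
handle = Entrez.efetch(db="nuccore",
                       id="CP000829",
                       rettype="gb",
                       retmode="text")

whole_sequence = SeqIO.read(handle, "genbank")

# Bases 259882 to 259896 (1-based, inclusive) as a 0-based Python slice
print(whole_sequence[259881:259896])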

Output:

ID: CP000829.1
Name: CP000829
Description: Streptococcus pyogenes NZ131, complete genome.
Number of features: 0
Seq('AATATTCAGATAATT', IUPACAmbiguousDNA())
written 12 months ago by PaSua
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:
$ wget -q -O -  "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=NC_011375.1&rettype=fasta&seq_start=259882&seq_stop=259896" 

>NC_011375.1:259882-259896 Streptococcus pyogenes NZ131, complete genome
AATATTCAGATAATT
written 12 months ago by Pierre Lindenbaum
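For anyone who prefers to stay in Python, the same request can be sketched with the standard library alone. This is a rough sketch, not a tested client: the function names `build_efetch_url`, `parse_single_fasta` and `fetch_region` are made up for illustration, and only the URL and its parameters come from the wget command above. As that command's output shows, `seq_start` and `seq_stop` are 1-based and inclusive, so no index shift is needed.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the wget command above.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession, start, stop):
    """Build an efetch URL for a sub-region in FASTA format."""
    params = {
        "db": "nuccore",
        "id": accession,
        "rettype": "fasta",
        "seq_start": start,  # 1-based, inclusive
        "seq_stop": stop,    # 1-based, inclusive
    }
    return EFETCH_URL + "?" + urlencode(params)


def parse_single_fasta(text):
    """Split one FASTA record into (header without '>', sequence)."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("not a FASTA record")
    return lines[0][1:], "".join(lines[1:])


def fetch_region(accession, start, stop):
    """Download the region from NCBI (needs network access)."""
    with urlopen(build_efetch_url(accession, start, stop)) as response:
        return parse_single_fasta(response.read().decode("utf-8"))
```

Calling `fetch_region("NC_011375.1", 259882, 259896)` would send the same request as the wget command. Only the requested bases are transferred, so the whole genome record never has to be downloaded and parsed.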