Question: Getting DNA sequence from a genome knowing the start and end position
6 months ago, PaSua wrote:

Python newbie here.

I was wondering if there is a way to retrieve part of a genome sequence from NCBI, given a start and an end position. For instance, I'm working with genome ID NC_011375.1 and I would like to obtain the sequence between bases 259882 and 259896. So far, I have this:

from Bio import Entrez
from Bio import SeqIO

# NCBI asks for a contact address with every E-utilities request
Entrez.email = "my@email.org"

# Download the whole record in GenBank format
handle = Entrez.efetch(db="nuccore",
                       id="NC_011375.1",
                       rettype="gb",
                       retmode="text")

whole_sequence = SeqIO.read(handle, "genbank")

print(whole_sequence[259882:259896])

And this is the output I get:

ID: NC_011375.1
Name: NC_011375
Description: Streptococcus pyogenes NZ131, complete genome.
Number of features: 0
UnknownSeq(14, alphabet = IUPACAmbiguousDNA(), character = 'N')

As you can see, it's not working. Since I don't know how to proceed, any help would be appreciated.

Thank you in advance.


I don't know the syntax for this command, but keep in mind that Python uses 0-based indexing, so the first base is at position 0, not 1. You must adjust your coordinates accordingly.

(comment written 6 months ago by jean.elbers)
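The adjustment described in this comment can be sketched as a small helper. This is only an illustration: `one_based_to_slice` and `toy_genome` are hypothetical names, not part of Biopython, and the sketch assumes the positions are 1-based and inclusive at both ends, as NCBI coordinates usually are.

```python
def one_based_to_slice(start, end):
    """Convert a 1-based, inclusive interval into a Python slice.

    Python slices are 0-based and exclude the end index, so only the
    start has to be shifted down by one.
    """
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return slice(start - 1, end)


# Toy sequence standing in for a genome (hypothetical data)
toy_genome = "ACGTACGTAC"

# Bases 3 to 6 (1-based, inclusive) are G, T, A, C
region = toy_genome[one_based_to_slice(3, 6)]
print(region)  # GTAC
```

With the positions from the question, `one_based_to_slice(259882, 259896)` gives `slice(259881, 259896)`, which covers 15 bases.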

Solved. I wasn't using the correct ID (it needs to be a CP reference, not an NC_ one). Anyway, thank you, because I did need to adjust the positions to Python indexing, as you said.

I'm posting the solution here, hoping someone will find it useful:

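from Bio import Entrez
from Bio import SeqIO

# NCBI asks for a contact address with every E-utilities request
Entrez.email = "my@email.org"

# Download the whole record in GenBank format
handle = Entrez.efetch(db="nuccore",
                       id="CP000829",
                       rettype="gb",
                       retmode="text")

whole_sequence = SeqIO.read(handle, "genbank")

# Bases 259882 to 259896 (1-based, inclusive) as a 0-based Python slice
print(whole_sequence[259881:259896])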

output:

ID: CP000829.1
Name: CP000829
Description: Streptococcus pyogenes NZ131, complete genome.
Number of features: 0
Seq('AATATTCAGATAATT', IUPACAmbiguousDNA())
(reply written 6 months ago by PaSua)

6 months ago, Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:
$ wget -q -O -  "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=NC_011375.1&rettype=fasta&seq_start=259882&seq_stop=259896" 

>NC_011375.1:259882-259896 Streptococcus pyogenes NZ131, complete genome
AATATTCAGATAATT
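The same request can be made from Python with only the standard library. This is a minimal sketch, not a definitive implementation: `build_region_url` and `fetch_region` are hypothetical helper names, and the query parameters are copied from the wget command above. The positions passed as `seq_start` and `seq_stop` are the 1-based values, so no index adjustment is needed.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_region_url(accession, start, end):
    """Build an efetch URL asking for one region of a record as FASTA."""
    params = {
        "db": "nuccore",
        "id": accession,
        "rettype": "fasta",
        "seq_start": start,  # 1-based, as in the wget command
        "seq_stop": end,     # inclusive
    }
    return EFETCH_URL + "?" + urlencode(params)


def fetch_region(accession, start, end):
    """Download the region and return the bases without the FASTA header."""
    with urlopen(build_region_url(accession, start, end)) as response:
        fasta = response.read().decode("ascii")
    return "".join(
        line.strip() for line in fasta.splitlines() if not line.startswith(">")
    )


url = build_region_url("NC_011375.1", 259882, 259896)
print(url)

# Needs network access, so it is left commented out here:
# print(fetch_region("NC_011375.1", 259882, 259896))
```

Requesting only the region avoids downloading and parsing the whole genome record when a few bases are all that is needed.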