How To Fetch Genomic Sequence Using Coordinates In Biopython
3
9
10.2 years ago
dustar1986 ▴ 350

Hi everyone,

I'm new to Biopython. My question may be basic, but I would appreciate your help.

I want to use a chromosome number, start position, end position, and strand to fetch the corresponding sequence from the mouse genome.

How can this be done with Biopython connecting to the NCBI database? Could anyone help me, please?

Thanks a lot.

biopython sequence retrieval entrez database • 15k views
0

Thanks a lot for your editing and rephrasing, Eric.

21
10.2 years ago
Alex ★ 1.5k

It is very simple, but you have to use the sequence's GI number instead of the chromosome number. You can find the GI in NCBI's Nucleotide database.

For example, mouse chromosome 6 has GI = 307603377, and you want to get the plus-strand sequence from 4000100 to 4000200:

from Bio import Entrez, SeqIO

Entrez.email = "A.N.Other@example.com"     # Always tell NCBI who you are
handle = Entrez.efetch(db="nucleotide",
                       id="307603377",
                       rettype="fasta",
                       retmode="text",
                       strand=1,
                       seq_start=4000100,
                       seq_stop=4000200)
record = SeqIO.read(handle, "fasta")       # Parse the single FASTA record before closing
handle.close()
print(record.seq)


Parameter descriptions from NCBI's efetch help:

strand - what strand of DNA to show (1 = plus or 2 = minus)
seq_start - show sequence starting from this base number
seq_stop - show sequence ending on this base number
complexity - gi is often a part of a biological blob, containing other gis
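
As a minimal sketch of how these parameters fit together, the helper below builds the efetch arguments from an identifier, a 1-based inclusive start and end, and a "+" or "-" strand. The names `build_efetch_args` and `fetch_region` are hypothetical and not part of Biopython; only `Entrez.efetch` and `SeqIO.read` are real Biopython calls. The Biopython import sits inside `fetch_region` so that the argument builder can be checked without Biopython or network access.

```python
def build_efetch_args(seq_id, start, end, strand="+"):
    """Return keyword arguments for Entrez.efetch (hypothetical helper).

    start and end are 1-based and inclusive; strand is "+" or "-".
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return {
        "db": "nucleotide",
        "id": str(seq_id),
        "rettype": "fasta",
        "retmode": "text",
        "strand": 1 if strand == "+" else 2,  # efetch: 1 = plus, 2 = minus
        "seq_start": start,
        "seq_stop": end,
    }


def fetch_region(seq_id, start, end, strand="+", email="A.N.Other@example.com"):
    """Fetch one region from NCBI as a SeqRecord (needs Biopython and network)."""
    # Imported here so build_efetch_args has no third-party dependency.
    from Bio import Entrez, SeqIO

    Entrez.email = email  # always tell NCBI who you are
    handle = Entrez.efetch(**build_efetch_args(seq_id, start, end, strand))
    try:
        return SeqIO.read(handle, "fasta")
    finally:
        handle.close()
```

Under these assumptions, `fetch_region("307603377", 4000100, 4000200, "-")` would request the same region as the example above on the minus strand, and `record.seq` on the result holds the sequence.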

0

This is great.

I'm looking at some miRNA sequences for TFBS and was going to ask a similar question, being a Python newbie myself (although the Biopython cookbook was helping). Anyway, great timing!

0

Very helpful. I'm also working on promoter analysis of TFBS. Thanks!

0

The RefSeq accessions for the human chromosomes follow this pattern: "NC_000001", "NC_000002", ..., "NC_000023" (X), "NC_000024" (Y)
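
The pattern above can be sketched as a small lookup. The function name `human_chromosome_accession` is hypothetical, and the mapping is assumed to hold for human chromosomes 1-22, X and Y only; it does not apply to mouse or to the mitochondrial genome. The result is an unversioned accession, which NCBI should resolve to the current version of the record, so pinning a particular genome build would need the versioned accession looked up separately.

```python
def human_chromosome_accession(chromosome):
    """Map a human chromosome name (1-22, X, Y) to its unversioned RefSeq accession."""
    name = str(chromosome).upper()
    if name.startswith("CHR"):
        name = name[3:]  # accept UCSC-style names such as "chr6"
    special = {"X": 23, "Y": 24}
    if name in special:
        number = special[name]
    elif name.isdigit() and 1 <= int(name) <= 22:
        number = int(name)
    else:
        raise ValueError(f"unsupported chromosome: {chromosome!r}")
    return f"NC_{number:06d}"
```

The returned string can be passed as the `id` argument of `Entrez.efetch` in place of a GI number.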

0

How can we get sequences for a specific genome build and group label? For example, Homo sapiens, hg19 (GRCh37.p10)? Thanks

2
10.2 years ago
Leszek 4.1k

Another homework?
Please use a combination of googling and reading. Here you are: the Biopython cookbook.

4

@Leszek: This should have been a comment, not an answer.

3

No, it's not homework. Thanks for your suggestion. I'm currently doing some research on 3' UTR regions. I got the 3' UTR coordinates from UCSC and need to get the corresponding sequences. I know this can be done using Galaxy. As Galaxy is written in Python, I just wonder whether there is a module within Biopython that can do the same work.
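
One detail worth checking when coordinates come from UCSC: tables and BED files from UCSC use 0-based, half-open intervals, whereas efetch's seq_start and seq_stop are 1-based and inclusive. The sketch below shows the conversion, assuming the coordinates were exported from a UCSC table or BED file (positions copied from the Genome Browser position box are already 1-based). The function name `ucsc_to_efetch_coords` is hypothetical.

```python
def ucsc_to_efetch_coords(chrom_start, chrom_end):
    """Convert a 0-based, half-open interval to 1-based, inclusive coordinates."""
    if chrom_start < 0 or chrom_end <= chrom_start:
        raise ValueError("expected 0 <= chrom_start < chrom_end")
    # Only the start shifts by one; the half-open end equals the inclusive end.
    return chrom_start + 1, chrom_end
```

The converted pair can then be used as seq_start and seq_stop, provided the UCSC assembly and the NCBI sequence record belong to the same genome build.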

1
10.2 years ago

I think you can also use EnsEMBL (and NCBI I believe) via the PyCogent toolkit to do this using Python.

Check out http://pycogent.sourceforge.net/ - the examples and cookbook contain some decent code that may be helpful :-)
