Get Gene Coding Sequence Using Gene Name/Id In Biopython
11.7 years ago
Ash ▴ 40

Maybe this is something really obvious, but what's the best way to get the coding sequence of a gene (main/reference isoform, if that makes a difference) with Biopython when you have just the gene name or gene ID?

You can, obviously, get the coding region's locations, parse that information, and pull the coding sequence from the genome, but there's got to be a better way? Is the full coding sequence not stored somewhere, or accessible through a single call rather than building from scratch based on positional information?

biopython ncbi • 11k views
11.7 years ago
Peter 6.0k

If you have the gene name or gene ID as used by the NCBI, you could use Bio.Entrez to connect to the NCBI Entrez web API and download the sequence (see the EFetch call).
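As a rough sketch of the Entrez route: search the nucleotide database for RefSeq mRNA records of the gene, then ask EFetch for the coding regions only. The helper names (`build_query`, `fetch_cds_records`), the search-term filters, the placeholder e-mail address, and the use of `rettype="fasta_cds_na"` (which, as far as I know, makes EFetch return CDS-only nucleotide FASTA for nucleotide records) are my assumptions for illustration, so check them against the current E-utilities documentation before relying on them.

```python
from Bio import Entrez, SeqIO

# Placeholder address; NCBI asks for a real contact e-mail with each request.
Entrez.email = "you@example.org"


def build_query(gene, organism):
    """Build an Entrez search term for RefSeq mRNA records of one gene (hypothetical helper)."""
    return (
        f'{gene}[Gene Name] AND "{organism}"[Organism] '
        "AND refseq[filter] AND biomol_mrna[PROP]"
    )


def fetch_cds_records(gene, organism, retmax=5):
    """Search the nucleotide database, then download CDS-only FASTA for the hits."""
    handle = Entrez.esearch(db="nucleotide", term=build_query(gene, organism), retmax=retmax)
    ids = Entrez.read(handle)["IdList"]
    handle.close()
    if not ids:
        return []
    # Assumption: rettype="fasta_cds_na" returns the coding regions as nucleotide FASTA.
    handle = Entrez.efetch(db="nucleotide", id=ids, rettype="fasta_cds_na", retmode="text")
    records = list(SeqIO.parse(handle, "fasta"))
    handle.close()
    return records


# Example call (needs network access; gene and organism are illustrative):
# for rec in fetch_cds_records("TP53", "Homo sapiens"):
#     print(rec.id, len(rec.seq))
```

A search like this can return several transcript variants, so picking the main/reference isoform is still up to you (for example by accession).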

If you have the gene name or gene ID and a matching GenBank/EMBL format file (e.g. for the genome or chromosome), you should be able to parse that (with Bio.SeqIO), find the feature of interest (a SeqFeature object), and use the feature object's extract method to pull out the sequence (it takes care of the co-ordinates and strand for you).
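A minimal sketch of the feature route follows. To keep it runnable without a data file, it builds a toy record in memory; with real data you would load the record with something like `SeqIO.read("genome.gbk", "genbank")` instead (that filename is hypothetical). The helper `find_cds` and the gene name `toyA` are made up for illustration, and matching on the `gene` and `locus_tag` qualifiers is an assumption about how your file is annotated.

```python
from Bio.Seq import Seq
from Bio.SeqFeature import FeatureLocation, SeqFeature
from Bio.SeqRecord import SeqRecord


def find_cds(record, gene_name):
    """Return the first CDS feature whose gene or locus_tag qualifier matches, else None."""
    for feature in record.features:
        if feature.type != "CDS":
            continue
        names = feature.qualifiers.get("gene", []) + feature.qualifiers.get("locus_tag", [])
        if gene_name in names:
            return feature
    return None


# Toy record standing in for a parsed GenBank/EMBL file.
record = SeqRecord(Seq("AAATTACATGGG"), id="toy")
record.features.append(
    SeqFeature(
        FeatureLocation(3, 9, strand=-1),  # reverse-strand CDS covering positions 3..8
        type="CDS",
        qualifiers={"gene": ["toyA"]},
    )
)

feature = find_cds(record, "toyA")
# extract() slices the parent sequence and reverse-complements minus-strand features.
cds = feature.extract(record.seq)
print(cds)  # ATGTAA
```

The same `extract` call also handles spliced (join) locations, so the exons do not need to be stitched together by hand.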

For both those operations, I refer you to the Tutorial - http://biopython.org/DIST/docs/tutorial/Tutorial.html

If neither of those apply, then what kind of gene name/ID do you have?
