Python Framework For Converting Genomic To Protein Coordinates
9.1 years ago
user ▴ 870

Is there a framework for converting between genomic coordinates and protein coordinates, given a transcript (i.e. a list of exon coordinates)? In other words, I want to go from CDS coordinates to amino-acid coordinates.

The trick is to do this correctly even when the transcript is on the minus strand, in which case the highest genomic coordinate (not the lowest) corresponds to the first amino acid.

I saw the BioPython-related frameworks (like this and this), but I'd prefer not to rely on all of BioPython just for the coordinate transform. I'm also not sure how BioPython handles strandedness.

Apparently PyGr can do this for ORF-containing transcripts, but I've never seen an example and cannot work out from the documentation how it is done.

Any pointers to frameworks that do this correctly or examples would be helpful.

biopython protein coordinates python • 4.3k views
9.1 years ago

For the protein->genomic algorithm, using the UCSC knownGene database, see this Java code:

For the genomic->protein algorithm, see this previous post: How to calculate the protein change and codon position within a nucleotide sequence of a single nucleotide substitution?
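
If you want to avoid a large dependency, both directions can be sketched in plain Python. This is only a minimal sketch, not the code from the linked posts. It assumes the exons are given as 0-based, half-open `(start, end)` intervals that cover only the coding part of the transcript (UTRs already trimmed), and that the CDS length is a multiple of 3. The function names `genomic_to_cds`, `genomic_to_protein` and `protein_to_genomic` are made up for illustration.

```python
def _ordered(exons, strand):
    # Transcript order: ascending on '+', descending on '-'.
    return sorted(exons, reverse=(strand == "-"))


def genomic_to_cds(pos, exons, strand):
    """Return the 0-based CDS offset of genomic position `pos`,
    or None if `pos` is not inside a coding exon.
    Assumes 0-based, half-open (start, end) coding intervals."""
    offset = 0  # number of CDS bases in the exons already walked
    for start, end in _ordered(exons, strand):
        if start <= pos < end:
            if strand == "+":
                return offset + (pos - start)
            # Minus strand: the exon is read from its highest base down.
            return offset + (end - 1 - pos)
        offset += end - start
    return None


def genomic_to_protein(pos, exons, strand):
    """Return (1-based amino-acid index, 0-based position in codon),
    or None if `pos` is not coding."""
    cds = genomic_to_cds(pos, exons, strand)
    if cds is None:
        return None
    return cds // 3 + 1, cds % 3


def protein_to_genomic(aa, exons, strand):
    """Return the three genomic positions of the codon for the 1-based
    amino-acid index `aa`, in transcript order. A codon may span two exons."""
    wanted = range(3 * (aa - 1), 3 * aa)  # CDS offsets of the codon
    positions = []
    offset = 0
    for start, end in _ordered(exons, strand):
        length = end - start
        for c in wanted:
            if offset <= c < offset + length:
                k = c - offset  # offset within this exon, transcript order
                positions.append(start + k if strand == "+" else end - 1 - k)
        offset += length
    if len(positions) != 3:
        raise ValueError("amino-acid index outside the CDS")
    return positions


# Toy transcript: two coding exons, 9 bases in total (3 codons).
exons = [(100, 104), (200, 205)]
print(genomic_to_protein(200, exons, "+"))  # (2, 1)
print(genomic_to_protein(204, exons, "-"))  # (1, 0)
print(protein_to_genomic(2, exons, "-"))    # [201, 200, 103]
```

On the minus strand the only changes are the order in which the exons are walked and the direction of the offset within each exon, so genomic position 204 (the highest coding base) lands in the first codon. If your exon coordinates are 1-based or include UTRs, convert them before calling these functions.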

