Question: How to find transmembrane domain topological direction (extracellular/intracellular) from a uniprot text file?
5.4 years ago, Good Gravy (United Kingdom) wrote:

I am trying to order the residues of each transmembrane domain from extracellular to intracellular, rather than from N-terminus to C-terminus. Currently I isolate the domains using a Biopython script similar to the one in "A: How To Retrive A Batch Of Transmembrane Domains From Uniprot?"

Originally I tried to write a script that distinguishes extracellular regions from cytoplasmic ones. However, in the text files both appear to carry the same ID (ECO:0000255; sample below).



FT   CHAIN         1    964       Ankyrin repeat and LEM domain-containing
FT                                protein 2.
FT                                /FTId=PRO_0000280243.
FT   TOPO_DOM      1      7       Extracellular. {ECO:0000255}.
FT   TRANSMEM      8     28       Helical; Signal-anchor for type III
FT                                membrane protein. {ECO:0000255}.
FT   TOPO_DOM     29    964       Cytoplasmic. {ECO:0000255}.
FT   DOMAIN       71    115       LEM. {ECO:0000255|PROSITE-
FT                                ProRule:PRU00313}.



Is there any software or programmatic way (preferably a Biopython module; perhaps there is a method in SeqIO that I have missed) to pull the in/out direction from this type of UniProt text file? What would the annotation look like in the text file?

uniprot biopython sequence • 2.5k views
modified 5.4 years ago by Peter • written 5.4 years ago by Good Gravy

The text between {} is an ECO (Evidence and Conclusion Ontology) code; in this case, ECO:0000255 means "match to sequence model evidence used in manual assertion".

comment written 5.4 years ago by me

5.4 years ago, Peter (Scotland, UK) wrote:

I think you would need to look at the ``TOPO_DOM`` features on either side of each ``TRANSMEM`` feature (they label residues as extracellular or cytoplasmic) to infer which direction it runs (into the cell, or out of the cell).

i.e. an extracellular ``TOPO_DOM``, then a ``TRANSMEM``, then a cytoplasmic ``TOPO_DOM``, as in the quoted example, means the transmembrane domain runs (N-terminal to C-terminal) from outside the cell to inside the cell.

As noted by user "@me", the identifiers in the curly brackets are evidence codes, not identifiers for each feature.
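
The inference described above can be sketched in plain Python. This is only a rough illustration, not a Biopython API: it parses the ``FT`` lines directly, assuming the column layout shown in the question, and the helper names (``parse_features``, ``side``, ``transmem_directions``, ``outside_first``) are made up for this example. It only recognises "Extracellular" and "Cytoplasmic" labels, ignores fuzzy positions, and reports "unknown" for anything else.

```python
import re

# First line of a feature in the older flat-file layout shown in the question:
# "FT   KEY   START   END   Description". Fuzzy positions such as "<1" or "?"
# are deliberately not handled in this sketch.
FT_FIRST = re.compile(r"^FT   (\S+)\s+(\d+)\s+(\d+)\s*(.*)$")


def parse_features(text):
    """Return a list of (key, start, end, description) tuples."""
    features = []
    for line in text.splitlines():
        match = FT_FIRST.match(line)
        if match:
            key, start, end, desc = match.groups()
            features.append([key, int(start), int(end), desc.strip()])
        elif line.startswith("FT    ") and features:
            # Continuation line: join it onto the previous description.
            features[-1][3] += " " + line[2:].strip()
    return [tuple(f) for f in features]


def side(description):
    """Map a TOPO_DOM description to 'out', 'in', or None if unrecognised."""
    label = description.split(".")[0].strip().lower()
    return {"extracellular": "out", "cytoplasmic": "in"}.get(label)


def transmem_directions(features):
    """Infer the N->C direction of each TRANSMEM from adjacent TOPO_DOMs."""
    topo = [(s, e, side(d)) for k, s, e, d in features if k == "TOPO_DOM"]
    results = []
    for key, start, end, _ in features:
        if key != "TRANSMEM":
            continue
        # Side of the TOPO_DOM ending just before / starting just after.
        before = next((sd for s, e, sd in topo if e == start - 1), None)
        after = next((sd for s, e, sd in topo if s == end + 1), None)
        if before == "out" or (before is None and after == "in"):
            direction = "out->in"
        elif before == "in" or (before is None and after == "out"):
            direction = "in->out"
        else:
            direction = "unknown"
        results.append((start, end, direction))
    return results


def outside_first(sequence, start, end, direction):
    """Slice a 1-based inclusive range and put the extracellular end first."""
    segment = sequence[start - 1:end]
    return segment[::-1] if direction == "in->out" else segment


SAMPLE = """\
FT   TOPO_DOM      1      7       Extracellular. {ECO:0000255}.
FT   TRANSMEM      8     28       Helical; Signal-anchor for type III
FT                                membrane protein. {ECO:0000255}.
FT   TOPO_DOM     29    964       Cytoplasmic. {ECO:0000255}.
"""

print(transmem_directions(parse_features(SAMPLE)))
```

For the quoted entry this reports the 8-28 helix as running outside to inside, so ``outside_first`` would leave that segment in N-to-C order; a helix reported as ``in->out`` would be reversed instead.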

written 5.4 years ago by Peter