Question: How to find the topological direction (extracellular/intracellular) of transmembrane domains from a UniProt text file?
4.2 years ago
Good Gravy20
United Kingdom
Good Gravy20 wrote:

I am trying to order the FASTA sequence of the residues in each transmembrane domain so that it reads left to right from extracellular to intracellular, rather than from N-terminal to C-terminal. Currently I am isolating the domains with a Biopython script similar to the one in A: How To Retrive A Batch Of Transmembrane Domains From Uniprot?

My original plan was to write a script that distinguishes extracellular from cytoplasmic locations. However, in the text files both appear to use the same ID (see the sample below: ECO:0000255).



FT   CHAIN         1    964       Ankyrin repeat and LEM domain-containing
FT                                protein 2.
FT                                /FTId=PRO_0000280243.
FT   TOPO_DOM      1      7       Extracellular. {ECO:0000255}.
FT   TRANSMEM      8     28       Helical; Signal-anchor for type III
FT                                membrane protein. {ECO:0000255}.
FT   TOPO_DOM     29    964       Cytoplasmic. {ECO:0000255}.
FT   DOMAIN       71    115       LEM. {ECO:0000255|PROSITE-
FT                                ProRule:PRU00313}.



Is there any software, or some programmatic way (preferably a Biopython module; perhaps there is a method in SeqIO that I have missed), that can pull the in/out direction from this type of UniProt text file? What would the relevant annotation be in the text file?

modified 4.2 years ago by Peter5.8k • written 4.2 years ago by Good Gravy20

The text between {} is an ECO (Evidence and Conclusion Ontology) code; in this case ECO:0000255 means "match to sequence model evidence used in manual assertion".
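To illustrate that point, here is a minimal sketch in plain Python (standard library only) that separates the free-text part of an FT description from its evidence block, so that the location word ("Extracellular", "Cytoplasmic") is what gets compared rather than the ECO code. The function name `split_evidence` is hypothetical. It assumes the description has already been joined across continuation lines and that multiple evidence items appear comma-separated inside a single pair of braces.

```python
import re

# Matches a trailing evidence block such as "{ECO:0000255}." or
# "{ECO:0000255|PROSITE-ProRule:PRU00313}." at the end of a description.
EVIDENCE_RE = re.compile(r"\s*\{([^}]*)\}\.?\s*$")


def split_evidence(description):
    """Split an FT description into (text, [(eco_code, source_or_None), ...])."""
    match = EVIDENCE_RE.search(description)
    if match is None:
        # No evidence block: return the trimmed text and an empty list.
        return description.strip().rstrip("."), []
    text = description[: match.start()].strip().rstrip(".")
    evidence = []
    for item in match.group(1).split(","):
        # "ECO:0000255|PROSITE-ProRule:PRU00313" -> code plus optional source.
        eco, _, source = item.strip().partition("|")
        evidence.append((eco, source or None))
    return text, evidence


print(split_evidence("Extracellular. {ECO:0000255}."))
print(split_evidence("Cytoplasmic. {ECO:0000255}."))
print(split_evidence("LEM. {ECO:0000255|PROSITE-ProRule:PRU00313}."))
```

The two TOPO_DOM lines come back with the same evidence tuple but different text, which is why the text, not the ECO code, has to drive the in/out logic.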

written 4.2 years ago by me690
4.2 years ago
Scotland, UK
Peter5.8k wrote:

I think you would need to look at the ``TOPO_DOM`` features on either side of each ``TRANSMEM`` feature (they label residues as extracellular or cytoplasmic) to infer which direction it runs (into the cell or out of the cell).

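i.e. an extracellular ``TOPO_DOM``, then a ``TRANSMEM``, then a cytoplasmic ``TOPO_DOM``, as in the quoted example, means the transmembrane domain runs (N-terminal to C-terminal) from outside the cell to inside the cell. The reverse order (cytoplasmic, ``TRANSMEM``, extracellular) means it runs from inside to outside.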
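The flanking-domain rule can be sketched in plain Python as below. This is a minimal sketch, not a Biopython feature: the names `parse_ft` and `transmem_directions` are hypothetical. It assumes the old-style fixed-layout FT lines shown in the question, plain integer positions (no "<", ">" or "?"), and only recognises the labels "Extracellular" and "Cytoplasmic"; anything else (for example "Lumenal") gives `None`. Newer UniProt flat files use a different FT layout (locations such as `1..7` with `/note` qualifiers), which this sketch does not handle.

```python
# The FT lines quoted in the question (old-style fixed layout).
SAMPLE = """\
FT   CHAIN         1    964       Ankyrin repeat and LEM domain-containing
FT                                protein 2.
FT                                /FTId=PRO_0000280243.
FT   TOPO_DOM      1      7       Extracellular. {ECO:0000255}.
FT   TRANSMEM      8     28       Helical; Signal-anchor for type III
FT                                membrane protein. {ECO:0000255}.
FT   TOPO_DOM     29    964       Cytoplasmic. {ECO:0000255}.
FT   DOMAIN       71    115       LEM. {ECO:0000255|PROSITE-
FT                                ProRule:PRU00313}.
"""


def parse_ft(lines):
    """Parse old-style FT lines into [key, start, end, description] lists."""
    features = []
    for line in lines:
        if not line.startswith("FT "):
            continue
        if line[5:13].strip():
            # New feature: key, start, end, then free-text description.
            key, start, end, *rest = line[5:].split(None, 3)
            features.append([key, int(start), int(end), rest[0] if rest else ""])
        elif features:
            # Continuation line: append to the previous feature's description.
            features[-1][3] += " " + line[5:].strip()
    return features


def transmem_directions(features):
    """Yield (start, end, direction) for each TRANSMEM feature.

    direction is "out->in" (extracellular TOPO_DOM just before, cytoplasmic
    just after), "in->out" for the reverse, or None otherwise.
    """
    ends, starts = {}, {}
    for key, start, end, desc in features:
        if key == "TOPO_DOM":
            # "Extracellular. {ECO:0000255}." -> "extracellular"
            label = desc.split(".")[0].strip().lower()
            ends[end] = label
            starts[start] = label
    for key, start, end, _desc in features:
        if key != "TRANSMEM":
            continue
        before = ends.get(start - 1)  # TOPO_DOM ending right before the TM
        after = starts.get(end + 1)   # TOPO_DOM starting right after the TM
        if before == "extracellular" and after == "cytoplasmic":
            direction = "out->in"
        elif before == "cytoplasmic" and after == "extracellular":
            direction = "in->out"
        else:
            direction = None
        yield start, end, direction


for start, end, direction in transmem_directions(parse_ft(SAMPLE.splitlines())):
    print(start, end, direction)
```

For the quoted entry this reports the single TRANSMEM (8 to 28) as `out->in`. Keying the lookup on the residue immediately before and after each TM segment also copes with multi-pass proteins, where TOPO_DOM and TRANSMEM features alternate.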

As noted by user "@me", the identifiers in the curly brackets are evidence codes, not identifiers for each feature.
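Once the direction of a TRANSMEM segment is known, getting the residues "left to right from extracellular to intracellular" as the question asks is just a matter of reversing the `in->out` segments. A minimal sketch, assuming 1-based inclusive UniProt positions and the direction strings `"out->in"` / `"in->out"`; the function name `oriented_segment` and the toy sequence are hypothetical, not taken from a real entry.

```python
def oriented_segment(sequence, start, end, direction):
    """Return TM residues ordered extracellular -> intracellular, or None."""
    segment = sequence[start - 1:end]  # UniProt positions are 1-based, inclusive
    if direction == "out->in":
        return segment        # N-terminus is already on the extracellular side
    if direction == "in->out":
        return segment[::-1]  # reverse so the extracellular end comes first
    return None               # direction unknown: do not guess


toy = "MKLVWAGFSE"  # hypothetical toy sequence, not a real UniProt entry
for direction in ("out->in", "in->out"):
    residues = oriented_segment(toy, 3, 6, direction)
    print(f">toy_TM_3_6 {direction}, extracellular to intracellular\n{residues}")
```

Here residues 3 to 6 are `LVWA` when the segment runs out->in and `AWVL` when it runs in->out. Note that a reversed string is no longer in N-to-C order, so it should not be passed to tools that expect a normal protein sequence.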

written 4.2 years ago by Peter5.8k
