Question: Identify genes with extracellular matrix domains
Posted 4.6 years ago by komal.rathi (Children's Hospital of Philadelphia, Philadelphia, PA):


My goal is to identify the transmembrane genes that have at least one domain sticking out into the extracellular matrix. My approach was to use the COMPARTMENTS database for this. I downloaded the knowledge base from COMPARTMENTS. It has the following format:

ensembl_peptide_id hgnc_symbol         GO              GO_Type    Source Evidence_Code
   ENSP00000000233        ARF5 GO:0005576 Extracellular region UniProtKB           IDA
   ENSP00000000442       ESRRA GO:0005576 Extracellular region       HPA           IDA
   ENSP00000001008       FKBP4 GO:0005576 Extracellular region UniProtKB           IDA
   ENSP00000002125     NDUFAF7 GO:0005576 Extracellular region UniProtKB           IDA
   ENSP00000002165       FUCA2 GO:0005576 Extracellular region UniProtKB           IDA
   ENSP00000002829      SEMA3F GO:0005576 Extracellular region   ProtInc           TAS

My approach is a pretty simple one: filter the list for GO_Type values such as Plasma membrane, Cell surface, Extracellular region or Extracellular matrix (these are just a few of many possibilities), then filter by score >= 3, or by score >= 4 if I want to be stringent. A score greater than 4 means the annotation is curated; the lower the score, the lower the confidence. However, this approach seems too simplistic to me. I was also thinking of passing the resulting list of genes to a domain finder, but I tried the SMART web API and its output is not very data-mining friendly.
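The filtering step described above can be sketched in Python as follows. This is a minimal illustration, not part of COMPARTMENTS: it assumes the download is tab-separated with a header row using the column names shown above, plus a hypothetical numeric `Score` column holding the confidence value. The function name `filter_compartments` and the example rows are made up.

```python
import csv
import io

# GO_Type names of interest (a few of the many possible localizations),
# compared case-insensitively.
WANTED_TYPES = {
    "plasma membrane",
    "cell surface",
    "extracellular region",
    "extracellular matrix",
}


def filter_compartments(handle, min_score=3):
    """Return sorted HGNC symbols whose GO_Type is wanted and score >= min_score.

    Assumes a tab-separated file with a header row containing 'hgnc_symbol',
    'GO_Type' and a hypothetical numeric 'Score' column.
    """
    genes = set()
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["GO_Type"].strip().lower() not in WANTED_TYPES:
            continue
        if float(row["Score"]) >= min_score:
            genes.add(row["hgnc_symbol"])
    return sorted(genes)


# Tiny hypothetical input (placeholder IDs and scores) to show the call.
example = io.StringIO(
    "ensembl_peptide_id\thgnc_symbol\tGO\tGO_Type\tSource\tEvidence_Code\tScore\n"
    "ENSP_A\tGENE_A\tGO:0005576\tExtracellular region\tUniProtKB\tIDA\t2\n"
    "ENSP_B\tGENE_B\tGO:0005576\tExtracellular region\tProtInc\tTAS\t4\n"
    "ENSP_C\tGENE_C\tGO:0005634\tNucleus\tUniProtKB\tIDA\t5\n"
)
print(filter_compartments(example, min_score=3))  # ['GENE_B']
```

Raising `min_score` to 4 gives the stricter variant.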

Is there a better tool or approach that can identify genes with domains in the extracellular matrix, with some confidence value?

Any thoughts would be much appreciated.

(Question written 4.6 years ago by komal.rathi; modified 4.6 years ago by Khader Shameer)
Answer posted 4.6 years ago by Michael Dondrup (Bergen, Norway):

There are several web servers for membrane topology prediction, such as TOPCONS and CCTOP.

These are meta-tools that run several prediction tools and generate a consensus.

TOPDB is a database of transmembrane proteins and their topologies.

(Answer written 4.6 years ago by Michael Dondrup; modified 4.6 years ago)

Thanks for the suggestions. I looked them over and am running TOPCONS and CCTOP. It will probably take a long time because I have about 15,196 genes :|

(Reply written 4.6 years ago by komal.rathi)

TOPCONS and CCTOP can be installed locally (see Download), and TOPCONS also has a batch API. InterProScan also runs a membrane topology predictor, so you can check for that annotation. You could also pre-filter your proteins by requiring at least one or two TMHMM-predicted transmembrane domains; these annotations should already be available in Ensembl BioMart (Query here). There are about 6,000 genes with transmembrane domains, so 15,000 seems a bit high.
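The TMHMM pre-filter suggested above could look like this once the annotations are exported from BioMart. This is a sketch under assumptions: the export is taken to be tab-separated with one row per predicted segment, and the column names `gene_id`, `protein_id`, `tm_start` and `tm_end` are hypothetical (rename them to match your actual export). Segments are counted per protein, because a gene's isoforms are separate rows, and a gene is kept if any of its proteins reaches `min_tm`.

```python
import csv
import io
from collections import defaultdict


def genes_with_tm(handle, min_tm=1):
    """Return sorted gene IDs with >= min_tm TMHMM segments on at least one protein.

    Assumes hypothetical columns 'gene_id', 'protein_id', 'tm_start', 'tm_end';
    rows with an empty 'tm_start' (no prediction) are skipped.
    """
    per_protein = defaultdict(set)
    for row in csv.DictReader(handle, delimiter="\t"):
        if not row["tm_start"]:
            continue
        key = (row["gene_id"], row["protein_id"])
        # A set removes duplicate rows describing the same segment.
        per_protein[key].add((int(row["tm_start"]), int(row["tm_end"])))
    keep = {gene for (gene, _), segs in per_protein.items() if len(segs) >= min_tm}
    return sorted(keep)


# Hypothetical export rows for illustration only.
export = io.StringIO(
    "gene_id\tprotein_id\ttm_start\ttm_end\n"
    "G1\tP1\t10\t30\n"
    "G1\tP1\t50\t70\n"
    "G2\tP2\t5\t25\n"
    "G3\tP3\t\t\n"
)
print(genes_with_tm(export, min_tm=2))  # ['G1']
```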

(Reply written 4.6 years ago by Michael Dondrup; modified 4.6 years ago)

My main goal was to identify whether they have any domain in the extracellular matrix. TOPCONS gives me results like the ones below; however, I don't know how to make sense of them, except that o means outer membrane.


TOPCONS predicted topology: oooooooooooooooooooooooooooMMMMMMMMMMMMMMMMMMMMMiiiiiiiiiiiiiiiiiMMMMMMMMMMMMMMMMMMMMMoooooooooooMMMMMMMMMMMMMMMMMMMMMiiiiiiiiiiiiMMMMMMMMMMMMMMMMMMMMMoooooooooooooooooooooooooMMMMMMMMMMMMMMMMMMMMMiiiiiiiiiiiiiMMMMMMMMMMMMMMMMMMMMMooooooooooooMMMMMMMMMMMMMMMMMMMMMiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii

OCTOPUS predicted topology: oooooooooooooooooooooooooooMMMMMMMMMMMMMMMMMMMMMiiiiiiiiiiiiiiiiiMMMMMMMMMMMMMMMMMMMMMoooooooooooMMMMMMMMMMMMMMMMMMMMMiiiiiiiiiiiiMMMMMMMMMMMMMMMMMMMMMoooooooooooooooooooooooooMMMMMMMMMMMMMMMMMMMMMiiiiiiiiiiiiiMMMMMMMMMMMMMMMMMMMMMooooooooooooooMMMMMMMMMMMMMMMMMMMMMiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii

Philius predicted topology: oooooooooooooooooooooooooooooooooMMMMMMMMMMMMMMMMMMMMMiiiiiiiiiiiiiiMMMMMMMMMMMMMMMMMMMMMooooooooMMMMMMMMMMMMMMMMMMMMMMiiiiiiiiiMMMMMMMMMMMMMMMMMMMMMMooooooooooooooooooooooooooooMMMMMMMMMMMMMMMMMMMMMMMiiiiiiiiiMMMMMMMMMMMMMMMMMMMMMMMooooooooooMMMMMMMMMMMMMMMMMMMMMMMMiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii

PolyPhobius predicted topology: ooooooooooooooooooooooooooMMMMMMMMMMMMMMMMMMMMMMMMMMMiiiiiiiiiiiMMMMMMMMMMMMMMMMMMMMMMMoooooooooooMMMMMMMMMMMMMMMMMMMMMiiiiiiiiiiiMMMMMMMMMMMMMMMMMMMMMMMMMoooooooooooooooooooooMMMMMMMMMMMMMMMMMMMMMMMiiiiiiiiiiiMMMMMMMMMMMMMMMMMMMMMMoooooooooooMMMMMMMMMMMMMMMMMMMiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii

SCAMPI predicted topology: ooooooooooooooooooooooooooooooMMMMMMMMMMMMMMMMMMMMMiiiiiiiiiiiiiiiMMMMMMMMMMMMMMMMMMMMMooooooMMMMMMMMMMMMMMMMMMMMMiiiiiiiiiiiiiiiiiiiMMMMMMMMMMMMMMMMMMMMMooooooooooooooooooooMMMMMMMMMMMMMMMMMMMMMiiiiiiiiiiiiiiiiiMMMMMMMMMMMMMMMMMMMMMoooooooooMMMMMMMMMMMMMMMMMMMMMiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii

SPOCTOPUS predicted topology: oooooooooooooooooooooooooooMMMMMMMMMMMMMMMMMMMMMiiiiiiiiiiiiiiiiiMMMMMMMMMMMMMMMMMMMMMoooooooooooMMMMMMMMMMMMMMMMMMMMMiiiiiiiiiiiiMMMMMMMMMMMMMMMMMMMMMoooooooooooooooooooooooooMMMMMMMMMMMMMMMMMMMMMiiiiiiiiiiiiiMMMMMMMMMMMMMMMMMMMMMooooooooooooooMMMMMMMMMMMMMMMMMMMMMiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii
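Output in the layout quoted above (one `<method> predicted topology: <string>` line per predictor) can be parsed mechanically, and the predictors compared residue by residue. The sketch below is a hypothetical helper, not a TOPCONS feature: it maps each method name to its string and reports, for each position, the fraction of methods labelling that residue `o`. It assumes all strings describe the same protein; if their lengths differ, only the positions covered by every string are scored.

```python
def parse_predictions(text):
    """Map method name -> topology string for lines such as
    'TOPCONS predicted topology: ooooMMMMiiii'."""
    preds = {}
    for line in text.splitlines():
        name, sep, topo = line.partition(" predicted topology:")
        if sep:  # only lines that contain the marker
            preds[name.strip()] = topo.strip()
    return preds


def outside_agreement(preds):
    """Fraction of methods predicting 'o' at each position.

    The shortest string sets the number of scored positions.
    """
    n = min(len(t) for t in preds.values())
    return [sum(t[i] == "o" for t in preds.values()) / len(preds) for i in range(n)]


# Toy strings for illustration, not real predictions.
text = "A predicted topology: ooMMii\nB predicted topology: oMMMii\n"
preds = parse_predictions(text)
print(outside_agreement(preds))  # [1.0, 0.5, 0.0, 0.0, 0.0, 0.0]
```

Positions with values near 1.0 are those where the methods agree on an outside location; intermediate values mark loop boundaries where they disagree.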

And CCTOP just tells me whether a protein is TM or not, which I already know: I found the list of TM proteins from COMPARTMENTS, and I can get a similar list of genes using biomaRt as well. I might just go ahead with biomaRt to get genes matching GO terms like extracellular matrix and transmembrane. Thanks for the help though!
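Requiring both kinds of GO annotation is a per-gene set operation. The poster plans to use biomaRt (R); the sketch below shows the same logic in Python on `(gene, GO ID)` pairs, such as rows exported from BioMart. The GO IDs are examples only: GO:0016021 (integral component of membrane; check that it is still current in your GO release), GO:0031012 (extracellular matrix) and GO:0005576 (extracellular region). Note that co-annotation is a coarse filter: it does not show which part of the protein faces outside.

```python
from collections import defaultdict

# Example term sets; adjust to the GO terms you actually want.
MEMBRANE_TERMS = {"GO:0016021"}  # integral component of membrane
EXTRACELLULAR_TERMS = {"GO:0031012", "GO:0005576"}  # ECM, extracellular region


def genes_in_both(pairs):
    """Return sorted genes with >= 1 membrane term AND >= 1 extracellular term.

    pairs: iterable of (gene_symbol, go_id) tuples.
    """
    terms = defaultdict(set)
    for gene, go_id in pairs:
        terms[gene].add(go_id)
    return sorted(
        g for g, t in terms.items()
        if t & MEMBRANE_TERMS and t & EXTRACELLULAR_TERMS
    )


# Hypothetical gene/term pairs for illustration.
pairs = [("A", "GO:0016021"), ("A", "GO:0031012"),
         ("B", "GO:0016021"), ("C", "GO:0005576")]
print(genes_in_both(pairs))  # ['A']
```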

(Reply written 4.6 years ago by komal.rathi)

o := outside, i := inside, M := membrane

CCTOP gives you a similar prediction in the image it generates. These annotations are important for estimating the size and orientation of domains.
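Using that legend, the size and orientation of each region can be read off a topology string directly. The minimal sketch below (function names hypothetical) splits a string into runs with 1-based inclusive coordinates, then reports the number of membrane segments and which side the N- and C-termini lie on. It assumes a non-empty string; characters other than `o`, `i` and `M` are labelled `other`.

```python
from itertools import groupby

# Legend from the reply above: o = outside, i = inside, M = membrane.
LEGEND = {"o": "outside", "i": "inside", "M": "membrane"}


def segments(topology):
    """Split a topology string into (side, start, end, length) runs (1-based, inclusive)."""
    runs, pos = [], 1
    for char, group in groupby(topology):
        length = sum(1 for _ in group)
        runs.append((LEGEND.get(char, "other"), pos, pos + length - 1, length))
        pos += length
    return runs


def orientation(topology):
    """Return (number of membrane segments, N-terminal side, C-terminal side)."""
    runs = segments(topology)
    n_tm = sum(1 for side, *_ in runs if side == "membrane")
    return n_tm, runs[0][0], runs[-1][0]


print(segments("oooMMMii"))
# [('outside', 1, 3, 3), ('membrane', 4, 6, 3), ('inside', 7, 8, 2)]
print(orientation("oooMMMiiMMoo"))  # (2, 'outside', 'outside')
```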

(Reply written 4.6 years ago by Michael Dondrup)

Yes, I understand that, but my problem is interpreting the output. First, the output is not very easy to parse. Second, as far as I could check, all the query proteins have some i, o and M regions. I just wanted to find out which TM proteins have (or don't have) an extracellular domain. Also, I believe this tool is mainly for predicting whether there is a TM domain in the query sequence (which does not necessarily mean extracellular). The results can be accessed here:
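To turn many predictions into a yes/no answer in a data-mining-friendly table, one option is to call a protein "TM with an extracellular domain" when it has at least one `M` segment and an `o` run of at least `min_len` residues. This is a sketch under assumptions: the `min_len=20` cutoff is arbitrary (not a TOPCONS or CCTOP parameter), the input dict of protein ID to topology string is hypothetical and would be filled from parsed output, and the column names are made up.

```python
import re


def classify(predictions, min_len=20):
    """Return tab-separated lines: protein_id, n_tm, longest_outside, extracellular_domain.

    predictions: dict mapping protein ID -> topology string (o/i/M).
    min_len: arbitrary minimum length of an 'o' run to count as a domain.
    """
    lines = ["protein_id\tn_tm\tlongest_outside\textracellular_domain"]
    for pid, topo in sorted(predictions.items()):
        n_tm = len(re.findall(r"M+", topo))  # number of membrane segments
        longest = max((len(run) for run in re.findall(r"o+", topo)), default=0)
        flag = n_tm > 0 and longest >= min_len
        lines.append(f"{pid}\t{n_tm}\t{longest}\t{'yes' if flag else 'no'}")
    return lines


# Toy strings for illustration, not real predictions.
preds = {"P1": "o" * 25 + "M" * 21 + "i" * 10,
         "P2": "o" * 5 + "M" * 21 + "i" * 30}
print("\n".join(classify(preds)))
```

Writing the returned lines to a file gives a table that can be filtered or joined with other gene lists.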

(Reply written 4.6 years ago by komal.rathi)