Identify genes with extracellular matrix domains
5.9 years ago
komal.rathi ★ 4.0k

Hi,

My goal is to identify the list of transmembrane genes that have at least one domain sticking out into the extracellular matrix. My approach was to use the COMPARTMENTS database. I downloaded the knowledgebase from COMPARTMENTS; it has the following format:

ensembl_peptide_id  hgnc_symbol  GO          GO_Type               Source     Evidence_Code
ENSP00000000233     ARF5         GO:0005576  Extracellular region  UniProtKB  IDA
ENSP00000000442     ESRRA        GO:0005576  Extracellular region  HPA        IDA
ENSP00000001008     FKBP4        GO:0005576  Extracellular region  UniProtKB  IDA
ENSP00000002125     NDUFAF7      GO:0005576  Extracellular region  UniProtKB  IDA
ENSP00000002165     FUCA2        GO:0005576  Extracellular region  UniProtKB  IDA
ENSP00000002829     SEMA3F       GO:0005576  Extracellular region  ProtInc    TAS

Score
5 4 5 5 5


My approach is pretty simple: filter the list for entries whose GO_Type is Plasma membrane, Cell surface, Extracellular region or Extracellular matrix (these are just a few of many possibilities). Then filter by score >= 3 or, to be more stringent, score >= 4. A score above 4 means the annotation is curated; the lower the score, the lower the confidence. However, this approach seems too simplistic to me. I was also thinking of passing the resulting gene list to a domain finder. I tried the web API of SMART, but its output is not very data-mining friendly.
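
For reference, the filtering step described above could be sketched roughly as follows. This is only a minimal illustration, assuming the table has been exported as tab-separated text with the column names shown above (including a `Score` column). The function name `filter_compartments` and the toy rows are made up for the example and are not real COMPARTMENTS records.

```python
import csv
import io

# GO_Type labels taken from the post; extend the set as needed.
SURFACE_TERMS = {
    "plasma membrane",
    "cell surface",
    "extracellular region",
    "extracellular matrix",
}

def filter_compartments(rows, min_score=3):
    """Return gene symbols with a surface/extracellular GO_Type and Score >= min_score."""
    keep = set()
    for row in rows:
        go_type = row["GO_Type"].strip().lower()
        if go_type in SURFACE_TERMS and float(row["Score"]) >= min_score:
            keep.add(row["hgnc_symbol"])
    return keep

# Toy input in the assumed layout (hypothetical IDs and symbols).
demo = (
    "ensembl_peptide_id\thgnc_symbol\tGO\tGO_Type\tSource\tEvidence_Code\tScore\n"
    "ENSP_A\tGENE_A\tGO:0005576\tExtracellular region\tUniProtKB\tIDA\t5\n"
    "ENSP_B\tGENE_B\tGO:0005576\tExtracellular region\tHPA\tIDA\t2\n"
    "ENSP_C\tGENE_C\tGO:0005634\tNucleus\tUniProtKB\tIDA\t5\n"
)
rows = list(csv.DictReader(io.StringIO(demo), delimiter="\t"))
print(sorted(filter_compartments(rows, min_score=3)))
```

Raising `min_score` to 4 gives the more stringent variant. Note that this only selects on annotation; it says nothing about which part of the protein is extracellular.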

Is there a better tool/approach that can help identify genes with domains in extracellular matrix with some confidence value?

Any thoughts would be much appreciated.

protein domain compartments extracellular matrix • 1.7k views
5.9 years ago

There are several web servers for membrane topology prediction, such as TOPCONS and CCTOP.

These are meta-tools that run several prediction tools and generate a consensus.

TOPDB is a database of transmembrane proteins and their topologies.

Thanks for the suggestions. I looked them over and am running TOPCONS and CCTOP. It is probably going to take a long time because I have about 15196 genes :|

TOPCONS and CCTOP can be installed locally (see Download), and TOPCONS also has a batch API: http://topcons.cbr.su.se/pred/help-wsdl-api/. InterProScan also runs a membrane topology predictor, so you can check for such annotation. Further, you could pre-filter your proteins by requiring at least one or two TMHMM-predicted transmembrane domains; these annotations should already be available in Ensembl BioMart (Query here). There are about 6000 genes with transmembrane domains, so 15000 seems a bit high.

My main goal is to identify whether they have any domain in the extracellular matrix. TOPCONS gives me this kind of result; however, I don't know how to make sense of it, except that o means outer membrane.

Sequence number: 1
Sequence name: GPRC5A|ENSP00000014914
Sequence length: 358 aa.
Sequence: MATTVPDGCRNGLKSKYYRLCDKAEAWGIVLETVATAGVVTSVAFMLTLPILVCKVQDSNRRKMLPTQFLFLLGVLGIFGLTFAFIIGLDGSTGPTRFFLFGILFSICFSCLLAHAVSLTKLVRGRKPLSLLVILGLAVGFSLVQDVIAIEYIVLTMNRTNVNVFSELSAPRRNEDFVLLLTYVLFLMALTFLMSSFTFCGSFTGWKRHGAHIYLTMLLSIAIWVAWITLLMLPDFDRRWDDTILSSALAANGWVFLLAYVSPEFWLLTKQRNPMDYPVEDAFCKPQLVKKSYGVENRAYSQEEITQGFEETGDTLYAPYSTHFQLQNQPPQKEFSIPRAHAWPSPYKDYEVKKEGSX

TOPCONS predicted topology: oooooooooooooooooooooooooooMMMMMMMMMMMMMMMMMMMMMiiiiiiiiiiiiiiiiiMMMMMMMMMMMMMMMMMMMMMoooooooooooMMMMMMMMMMMMMMMMMMMMMiiiiiiiiiiiiMMMMMMMMMMMMMMMMMMMMMoooooooooooooooooooooooooMMMMMMMMMMMMMMMMMMMMMiiiiiiiiiiiiiMMMMMMMMMMMMMMMMMMMMMooooooooooooMMMMMMMMMMMMMMMMMMMMMiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii

OCTOPUS predicted topology: oooooooooooooooooooooooooooMMMMMMMMMMMMMMMMMMMMMiiiiiiiiiiiiiiiiiMMMMMMMMMMMMMMMMMMMMMoooooooooooMMMMMMMMMMMMMMMMMMMMMiiiiiiiiiiiiMMMMMMMMMMMMMMMMMMMMMoooooooooooooooooooooooooMMMMMMMMMMMMMMMMMMMMMiiiiiiiiiiiiiMMMMMMMMMMMMMMMMMMMMMooooooooooooooMMMMMMMMMMMMMMMMMMMMMiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii

Philius predicted topology: oooooooooooooooooooooooooooooooooMMMMMMMMMMMMMMMMMMMMMiiiiiiiiiiiiiiMMMMMMMMMMMMMMMMMMMMMooooooooMMMMMMMMMMMMMMMMMMMMMMiiiiiiiiiMMMMMMMMMMMMMMMMMMMMMMooooooooooooooooooooooooooooMMMMMMMMMMMMMMMMMMMMMMMiiiiiiiiiMMMMMMMMMMMMMMMMMMMMMMMooooooooooMMMMMMMMMMMMMMMMMMMMMMMMiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii

PolyPhobius predicted topology: ooooooooooooooooooooooooooMMMMMMMMMMMMMMMMMMMMMMMMMMMiiiiiiiiiiiMMMMMMMMMMMMMMMMMMMMMMMoooooooooooMMMMMMMMMMMMMMMMMMMMMiiiiiiiiiiiMMMMMMMMMMMMMMMMMMMMMMMMMoooooooooooooooooooooMMMMMMMMMMMMMMMMMMMMMMMiiiiiiiiiiiMMMMMMMMMMMMMMMMMMMMMMoooooooooooMMMMMMMMMMMMMMMMMMMiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii

SCAMPI predicted topology: ooooooooooooooooooooooooooooooMMMMMMMMMMMMMMMMMMMMMiiiiiiiiiiiiiiiMMMMMMMMMMMMMMMMMMMMMooooooMMMMMMMMMMMMMMMMMMMMMiiiiiiiiiiiiiiiiiiiMMMMMMMMMMMMMMMMMMMMMooooooooooooooooooooMMMMMMMMMMMMMMMMMMMMMiiiiiiiiiiiiiiiiiMMMMMMMMMMMMMMMMMMMMMoooooooooMMMMMMMMMMMMMMMMMMMMMiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii

SPOCTOPUS predicted topology: oooooooooooooooooooooooooooMMMMMMMMMMMMMMMMMMMMMiiiiiiiiiiiiiiiiiMMMMMMMMMMMMMMMMMMMMMoooooooooooMMMMMMMMMMMMMMMMMMMMMiiiiiiiiiiiiMMMMMMMMMMMMMMMMMMMMMoooooooooooooooooooooooooMMMMMMMMMMMMMMMMMMMMMiiiiiiiiiiiiiMMMMMMMMMMMMMMMMMMMMMooooooooooooooMMMMMMMMMMMMMMMMMMMMMiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii

And CCTOP just tells me whether it is TM or not, which I already know, given that I got the list of TM proteins from COMPARTMENTS. I can get a similar list of genes using biomaRt as well. I might just go ahead with biomaRt to get genes matching GO terms like extracellular matrix and transmembrane. Thanks for the help though!

o := outside, i := inside, M := membrane

CCTOP gives you a similar prediction in the image it makes. These annotations are important to estimate the size and orientation of domains.
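
To make the size and orientation point concrete, here is a small sketch that turns a per-residue topology string into labelled segments with 1-based coordinates. It assumes only the o/i/M alphabet explained above. The helper names `topology_segments` and `extracellular_segments` are invented for this illustration and are not part of TOPCONS or CCTOP.

```python
from itertools import groupby

def topology_segments(topology):
    """Split a topology string into (label, start, end) runs, 1-based inclusive."""
    segments = []
    pos = 1
    for label, run in groupby(topology):
        length = len(list(run))
        segments.append((label, pos, pos + length - 1))
        pos += length
    return segments

def extracellular_segments(topology, min_len=1):
    """Return (start, end) of every 'o' run that is at least min_len residues long."""
    return [
        (start, end)
        for label, start, end in topology_segments(topology)
        if label == "o" and end - start + 1 >= min_len
    ]

# Toy topology: 4 outside, 4 membrane, 3 inside, 4 membrane, 2 outside.
toy = "o" * 4 + "M" * 4 + "i" * 3 + "M" * 4 + "o" * 2
print(topology_segments(toy))
print(extracellular_segments(toy, min_len=3))
```

Each `o` segment is a stretch predicted to lie outside the membrane; its length gives a rough idea of whether it is a short loop or something large enough to hold a domain.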

Yes, I understand that, but my problem is interpreting the output. Firstly, the output is not very easy to parse. Secondly, as far as I could check, all the query proteins have some i, o and M regions. I just wanted to find out whether there are TM proteins that have (or don't have) an extracellular domain. Also, I believe this tool is mainly for predicting whether there is a TM domain in your query sequence (which does not necessarily mean extracellular). The results can be accessed here - http://topcons.cbr.su.se/pred/result/rst_xXfRMl/.
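
One possible way to tackle both points (parsing, and telling short outside loops from larger extracellular regions) is sketched below. It is a rough illustration under stated assumptions, not a tested parser for the real result files. It assumes each record contains `Sequence number:`, `Sequence name:` and `<method> predicted topology:` labels as in the excerpt pasted earlier in this thread. The function names and the `min_len` cutoff are hypothetical choices for the example, and the cutoff should be tuned to whatever counts as a domain for your purposes.

```python
import re

def parse_topcons_text(text, method="TOPCONS"):
    """Map sequence name -> topology string predicted by the given method."""
    # The lookbehind stops e.g. "OCTOPUS" from matching inside "SPOCTOPUS".
    topo_re = re.compile(
        r"(?<![A-Za-z])" + re.escape(method) + r" predicted topology:\s*([A-Za-z]+)"
    )
    results = {}
    # Assumption: every record starts with "Sequence number:".
    for record in re.split(r"(?=Sequence number:)", text):
        name = re.search(r"Sequence name:\s*(\S+)", record)
        topo = topo_re.search(record)
        if name and topo:
            results[name.group(1)] = topo.group(1)
    return results

def longest_outside_run(topology):
    """Length of the longest consecutive stretch of 'o' (outside) residues."""
    return max((len(run) for run in re.findall(r"o+", topology)), default=0)

def split_by_extracellular(topologies, min_len=30):
    """Split names into (with, without) an outside stretch of at least min_len residues."""
    with_region = {n for n, t in topologies.items() if longest_outside_run(t) >= min_len}
    return with_region, set(topologies) - with_region

# Toy records (hypothetical names) in the assumed layout.
demo = (
    "Sequence number: 1\nSequence name: protA\n"
    "TOPCONS predicted topology:\n" + "o" * 10 + "M" * 5 + "i" * 5 + "\n\n"
    "Sequence number: 2\nSequence name: protB\n"
    "TOPCONS predicted topology:\n" + "i" * 5 + "M" * 5 + "o" * 3 + "\n"
)
topos = parse_topcons_text(demo)
with_region, without_region = split_by_extracellular(topos, min_len=10)
print(sorted(with_region), sorted(without_region))
```

Since nearly every TM protein has some `o` residues, the length cutoff is what separates the two groups here. A predicted outside stretch is still not the same as an annotated domain, so the hits would need to be checked against domain annotation (for example from InterProScan) afterwards.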