Question

What could be a good strategy to describe GO enrichments for a publication?

3.2 years ago

Firingam ▴ 30

Hi guys, I am studying M. musculus proteins using an operational classification that divides the entire proteome into three categories (call them cat1, cat2 and cat3). Using g:Profiler, I observed that the GO terms for BP, MF and CC have almost no intersections across the three categories: cat1 has GO terms different from those of cat2 and cat3, and so on.

I figured that for a publication it is not enough to draw some barplots showing this, so I built GO vectors by merging, for each GO class (BP, MF, CC), the terms associated with each protein category (outer join). So now I have three n×3 matrices, with many zeros because the GO terms of one category are mostly not found in the others. Each entry is the number of proteins annotated with that GO term divided by the size of the proteome. I normalized these matrices by column, so each of them contains three unit vectors. I then applied PCA and used biplots to show which GO terms were the most influential.

My question is: is this approach biologically meaningful? I also applied a clustering approach with the seqlinkage function, bootstrapping the distance matrix a thousand times. However, given the lack of intersection between GO terms mentioned above, is it biologically meaningful to measure the distance between these vectors when, loosely speaking, they live in different worlds according to the GO data? I can understand the PCA and the biplots, because they show which variables carry more weight in the three groups (the arrows), but I am not sure about seqlinkage. Any advice is welcome!
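To make the construction concrete, here is a minimal sketch in plain Python of the outer join, the column normalization and a Euclidean distance between category vectors. Everything in it is an assumption for illustration: the GO IDs, the counts, the proteome size and the helper names (`outer_join`, `unit_columns`, `euclidean`) are made up, and it does not reproduce seqlinkage or the bootstrap.

```python
import math

# Hypothetical data: GO term -> number of annotated proteins, per category.
# cat1 and cat2 share no terms; cat1 and cat3 share one term.
cat_counts = {
    "cat1": {"GO:0000001": 12, "GO:0000002": 5},
    "cat2": {"GO:0000003": 7, "GO:0000004": 9},
    "cat3": {"GO:0000005": 4, "GO:0000002": 1},
}
PROTEOME_SIZE = 100  # hypothetical proteome size


def outer_join(counts_by_cat, proteome_size):
    """Build an n x 3 matrix (rows = union of GO terms, columns = categories).

    Terms missing from a category get 0, as in an outer join.
    """
    terms = sorted({t for counts in counts_by_cat.values() for t in counts})
    cats = list(counts_by_cat)
    matrix = [
        [counts_by_cat[c].get(t, 0) / proteome_size for c in cats]
        for t in terms
    ]
    return terms, cats, matrix


def unit_columns(matrix):
    """Scale every column to unit Euclidean length."""
    n_cols = len(matrix[0])
    norms = [
        math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_cols)
    ]
    return [[row[j] / norms[j] for j in range(n_cols)] for row in matrix]


def euclidean(matrix, i, j):
    """Euclidean distance between columns i and j."""
    return math.sqrt(sum((row[i] - row[j]) ** 2 for row in matrix))


terms, cats, raw = outer_join(cat_counts, PROTEOME_SIZE)
unit = unit_columns(raw)

for a in range(len(cats)):
    for b in range(a + 1, len(cats)):
        print(cats[a], cats[b], round(euclidean(unit, a, b), 4))
```

One thing this toy case shows: two unit vectors with no shared non-zero GO terms are orthogonal, so their Euclidean distance is always sqrt(2) regardless of the counts. Only pairs that share at least one term (cat1 and cat3 here) come out closer than that, which is worth keeping in mind when reading a tree built from such distances.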

GO • 456 views
