Question: Choose a good GO analysis
18 months ago, benoahb wrote:

I have a few sets of about 100 proteins each, identified by MS as up- or down-regulated under a few conditions. I would like to give a general overview of their functions. My go-to would be a very general pie chart (biological process and maybe molecular function) and a short, precise description (pathway/protein class) to add to a table.

From what I've seen, the best options are PANTHER or DAVID. How do I choose? My main interest is in BP/pathways, but I am also considering MF/protein class.

On PANTHER,

  • when I look at the list, some proteins are missing a "protein class" and others a "biological process"; sometimes PC gives the more accurate description, sometimes BP.

  • when I look at the BP pie chart, "immune response" and "response to stimulus" are separate slices although they amount to the "same thing" (for my set of proteins), and alongside them there is a "cellular process" slice, which is far too general.

  • And, to cite an example I stumbled on, OAS's PC is "defense/immunity protein" and BP is "response to stimulus" (http://pantherdb.org/genes/gene.do?acc=HUMAN|HGNC=8088|UniProtKB=Q9Y6K5). As far as I know, OAS should be part of the BP "immune response".

On DAVID, I get something somewhat similar to PANTHER's "statistical overrepresentation test".

  • if I choose GO > BP direct, I only get a list of functions that are somewhat redundant and based on the same core of proteins

  • if I choose GO > BP all, I end up with an endless list, too precise, again quite redundant... which is just not "showable" on a pie chart.

From here, where to go?

  • pick a handful of accurate functions/pathways that are statistically identified, allowing a mild overlap of functions so that they are representative of my sample, but without showing every variation of a few overlapping pathways

  • and lump the under-represented functions/pathways (below a threshold of p-value / number of proteins involved) into a general "diverse" category

But then, it ends up being all manipulated by the user. As objective as my choices can be... Where's the science in that...

modified 18 months ago • written 18 months ago by benoahb

Have you tried ClueGO from Cytoscape?

written 18 months ago by Lila M

Still waiting out the "couple of days" it takes to get a licence :)

But the preview seems quite interesting, thanks!

written 18 months ago by benoahb

But then, it ends up being all manipulated by the user. As objective as my choices can be... Where's the science in that...

Very true. Worse, there are many more tools that do much the same thing, just slightly differently, or that use a different database.
In the end, people will just fish out the result that best confirms their initial hypothesis.

But I cannot imagine that this GO analysis is an endpoint of your work. What would be the next step?

written 18 months ago by WouterDeCoster

Sadly, it doesn't go much further than that. Due to funding issues, we can't do a wet-lab follow-up.

The endpoint is a comparison of the strong fold-change hits, functional pie charts and network builds across our different conditions. Hence I'm not just picking the first functional pie chart I see, and am actually looking into it a bit more than usual...

written 18 months ago by benoahb

18 months ago, Jean-Karim Heriche (EMBL Heidelberg, Germany) wrote:

If you're interested in summarizing your gene list with GO terms, you could select a relevant set of terms of the appropriate specificity (e.g. cell cycle, protein secretion ... and add an "other processes" category) then find out which genes are annotated with these terms or their children in GO. If you want a pie chart, then you need non-overlapping counts. For this, you can order the terms by importance/relevance to your study and count a gene only in the first category it falls in.
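A minimal sketch of this priority-ordered counting, written in Python purely for illustration. It assumes each gene has already been mapped to the set of selected categories it is annotated with; the gene names, the category order and the function name `count_by_priority` are hypothetical.

```python
def count_by_priority(gene_terms, ordered_categories, other_label="other processes"):
    """Count each gene once, in the first category (by priority) that it matches."""
    counts = {category: 0 for category in ordered_categories}
    counts[other_label] = 0
    for terms in gene_terms.values():
        for category in ordered_categories:
            if category in terms:
                counts[category] += 1
                break  # stop at the first match so slices never overlap
        else:
            # no selected category matched this gene
            counts[other_label] += 1
    return counts


# Hypothetical input: gene -> selected categories it is annotated with
gene_terms = {
    "geneA": {"immune response", "response to stimulus"},
    "geneB": {"response to stimulus"},
    "geneC": {"cell cycle", "immune response"},
    "geneD": {"metabolic process"},
}

# Most relevant category first; this order decides where multi-category genes land
ordered_categories = ["immune response", "cell cycle", "response to stimulus"]

counts = count_by_priority(gene_terms, ordered_categories)
print(counts)
```

Because every gene lands in exactly one slice, the counts sum to the number of genes, which is what a pie chart needs. Changing the priority order changes the chart, so the order is worth reporting alongside the figure.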

written 18 months ago by Jean-Karim Heriche

Ok, that sounds like a good way to go, thank you for your input!

Are you suggesting the use of a tool I am not aware of? ... or to get my hands dirty and do it one by one?

Sorry, I'm still learning how this whole thing works....

written 18 months ago by benoahb

The term selection is manual insofar as you hand-select the appropriate terms. Then I would write a script to collect the annotations given a gene list as input. The main difficulty lies in navigating the ontology: e.g. if you picked cell cycle as a relevant category and you have a gene annotated as involved in the G2/M transition, you want to be able to tell that this is a child term of the cell cycle term. I would suggest using the R Bioconductor GO.db package for this.
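To make the ontology-navigation step concrete, here is a rough Python sketch of the "is this term a descendant of my category?" check. It does not use GO.db. The parent links below are a small hand-written toy hierarchy, not the real GO graph, and the names `ancestors`, `falls_under` and `toy_parents` are hypothetical.

```python
def ancestors(term, parents):
    """Return every ancestor of `term` by following parent links upward."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def falls_under(term, category, parents):
    """True if `term` is the category itself or one of its descendants."""
    return term == category or category in ancestors(term, parents)


# Toy child -> parents map (an assumption for illustration, NOT the actual GO structure)
toy_parents = {
    "G2/M transition": ["cell cycle phase transition"],
    "cell cycle phase transition": ["cell cycle process"],
    "cell cycle process": ["cell cycle"],
    "cell cycle": ["cellular process"],
    "protein secretion": ["secretion"],
}

print(falls_under("G2/M transition", "cell cycle", toy_parents))
print(falls_under("protein secretion", "cell cycle", toy_parents))
```

In practice the child-to-parent map would come from an ontology resource rather than being typed by hand. A term can have several parents, which is why the map stores lists and the walk tracks visited terms.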

written 18 months ago by Jean-Karim Heriche

OK. The issue then is that I haven't had the chance to learn R yet and I have limited time.

Is there any way around R, besides using the pie chart obtained with PANTHER?

written 18 months ago by benoahb

18 months ago, benoahb wrote:

OK, so I found what I was looking for in another thread: DAVID's clustering function!

written 18 months ago by benoahb