I have a few sets of roughly 100 proteins each, identified by MS as up- or down-regulated under a few conditions. I would like to give a general overview of their functions. My go-to would be a very general pie chart (biological process, and maybe molecular function) plus a short, precise description (pathway or protein class) to add in a table.
From what I've seen, the best options are PANTHER and DAVID. How do I choose between them? My main interest is in biological process (BP) and pathways, but I am also considering molecular function (MF) and protein class (PC).
In PANTHER, when I look at the list, some proteins are missing a "protein class" and others a "biological process"; sometimes PC gives the more accurate description, sometimes BP does.
When I look at the BP pie chart, "immune response" and "response to stimulus" are separate slices even though they amount to the "same thing" for my set of proteins, and next to them there is a "cellular process" slice which is far too general.
To cite an example I stumbled on, the PC of OAS is "defense/immunity protein" while its BP is "response to stimulus" (http://pantherdb.org/genes/gene.do?acc=HUMAN|HGNC=8088|UniProtKB=Q9Y6K5). As far as I know, OAS should be part of the BP "immune response".
On DAVID, I get something fairly similar to PANTHER's "statistical overrepresentation" test.
If I choose GO > BP direct, I only get a list of functions that are somewhat redundant, all based on the same core of proteins.
If I choose GO > BP all, I end up with an endless list that is too detailed and again quite redundant, which is just not "showable" on a pie chart.
From here, where do I go?
One option would be to pick a handful of accurate, statistically supported functions/pathways, allowing a mild overlap between them so that they stay representative of my sample, but without showing every variation of a few overlapping pathways,
and to lump the poorly supported functions/pathways (those that fail a threshold on p-value or on number of proteins involved) into a general "diverse" category.
But then the result is entirely shaped by the user. However objective I try to make my choices, where is the science in that?
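To make the selection described above less arbitrary, I could at least fix the rules in advance and apply them mechanically. Below is a minimal sketch of that idea, not a PANTHER or DAVID feature: it assumes the enrichment results have already been exported as (term, p-value, protein set) tuples. The function names, the thresholds (`max_p`, `min_proteins`, `max_overlap`), the use of Jaccard overlap to detect redundant terms, and the toy data are all my own assumptions for illustration.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard overlap between two protein sets (0 = disjoint, 1 = identical)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_terms(terms, max_p=0.05, min_proteins=3, max_overlap=0.5):
    """Keep significant terms, dropping those that mostly repeat a better one.

    terms: iterable of (name, p_value, proteins) tuples.
    The thresholds are arbitrary defaults; the point is to state them up front.
    """
    significant = [t for t in terms if t[1] <= max_p and len(t[2]) >= min_proteins]
    significant.sort(key=lambda t: t[1])  # most significant first
    kept = []
    for name, p_value, proteins in significant:
        # Keep a term only if it does not overlap too much with any kept term.
        if all(jaccard(proteins, k[2]) <= max_overlap for k in kept):
            kept.append((name, p_value, set(proteins)))
    return kept


def pie_counts(all_proteins, kept, other_label="diverse"):
    """Assign each protein to the most significant kept term that contains it.

    Proteins covered by no kept term go into `other_label`, so the slices
    always add up to the total number of proteins.
    """
    counts = Counter()
    for protein in all_proteins:
        label = next((name for name, _, members in kept if protein in members),
                     other_label)
        counts[label] += 1
    return dict(counts)


# Toy data with made-up protein IDs and p-values, for illustration only.
toy_terms = [
    ("immune response", 1e-6, {"P1", "P2", "P3", "P4"}),
    ("response to stimulus", 1e-4, {"P1", "P2", "P3", "P5"}),
    ("cellular process", 0.2, {"P1", "P2", "P3", "P4", "P5", "P6", "P7"}),
    ("lipid metabolic process", 0.01, {"P6", "P7", "P8"}),
    ("rare term", 0.001, {"P9"}),
]
toy_proteins = ["P%d" % i for i in range(1, 11)]

kept = select_terms(toy_terms)
counts = pie_counts(toy_proteins, kept)
print([name for name, _, _ in kept])
print(counts)
```

With the toy data, "response to stimulus" is dropped because it shares most of its proteins with the more significant "immune response", "cellular process" fails the p-value cutoff, and "rare term" fails the minimum size. The choices are still mine, but they are reduced to three numbers that can be reported alongside the figure and applied identically to every protein set.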