Question

Protein Expression Data, Clustering, and GO

2


10.4 years ago

syntonic.c ▴ 20

Hello everyone,

I am very new to GO analysis and would like to know the best way to display my data visually and the best tools to use for the job. I have already searched through some of the topics here; they helped me figure out a few things and gave me some context, but I am still quite confused, so your patience is appreciated.

My data: Gene expression data from mass spec and microarray (~400 unique UNIPROT_ID). There are three different experimental conditions, and each one has an experimental sample and a control sample. I have normalized the data in each of these three lists to their respective controls and log2-transformed the data. So my data is essentially three gene lists, one per condition, with a log2 expression value for each gene represented in each condition.

What I want: For each experimental condition, I would like to see which GO terms the different genes cluster in. So if there is a set of genes associated with a particular disease ontology (especially important!), biochemical pathway, or cellular function, they would cluster together in a dendrogram or some kind of tree, and the associated expression level for each gene in that cluster could be seen. I also want to compare how a given gene changes expression between the different experimental conditions, which brings me to the next point.

The problem: I had thought that a GO-clustered heat map would be what I wanted. However, not all genes appear in each experimental condition, so if I were to cluster the full gene list and display a heat map for all 3 conditions, I would get a lot of empty cells wherever a gene does not appear in one of the conditions. I have never seen heat maps presented this way, so surely this is not the best way to show all three conditions against each other. What would be the best way to present my data? The alternative seems to be to have three separate heat maps, one per condition, but I think it will be difficult to compare them visually when they are side by side.
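To make the "empty cells" situation concrete, here is a minimal sketch of how three per-condition lists could be combined into one gene-by-condition table in plain Python. The UniProt IDs, the values, and the names `cond1`, `cond2`, `cond3`, and `build_matrix` are all invented for illustration. Unmeasured cells are kept as `None` rather than 0, so a plotting tool could later draw them in a neutral color that is distinct from "no change".

```python
# Hypothetical per-condition log2 ratios keyed by UniProt ID (values invented).
cond1 = {"P04637": 1.2, "P38398": -0.8, "Q9Y6K9": 0.3}
cond2 = {"P04637": -0.5, "Q9Y6K9": 0.9}
cond3 = {"P38398": 0.4, "P00533": 2.1}

conditions = {"cond1": cond1, "cond2": cond2, "cond3": cond3}


def build_matrix(conditions):
    """Return (genes, rows); rows[i][j] is a log2 ratio, or None if unmeasured."""
    # Union of all gene IDs seen in any condition, in a stable sorted order.
    genes = sorted(set().union(*conditions.values()))
    # One row per gene, one column per condition; .get() yields None when absent.
    rows = [[conditions[name].get(gene) for name in conditions] for gene in genes]
    return genes, rows


genes, rows = build_matrix(conditions)

# Print a tab-delimited table, marking unmeasured cells as NA.
print("gene", *conditions, sep="\t")
for gene, row in zip(genes, rows):
    cells = ["NA" if value is None else f"{value:+.1f}" for value in row]
    print(gene, *cells, sep="\t")
```

A table like this is the usual starting point for either one combined heat map with masked cells or three separate per-condition views.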

GO Tools: My PI has expressed interest in using DAVID for clustering because of the large number of citations it has. But if there is a better choice for my purposes, I am open to suggestions. I suspect that I need to use the DAVID functional clustering tool and then prepare the output for some kind of software that can read the clustering as a dendrogram and integrate my expression values for each gene in the clusters. However, I don't even know the names of the things to search for, which is making this very difficult. Furthermore, when I search with my gene list in DAVID, nothing clusters in disease ontology, and this is what I am most interested in. I have attempted to prepare DAVID data for Java TreeView, but I am having difficulty getting it into a format that TreeView accepts.

Main questions: 1. Best way to present my data visually? 2. Best tools to use to accomplish this task?

Any sort of basic direction would be greatly appreciated, I feel very lost and I don't really know where to begin with all of this. Thank you!

go clustering • 4.7k views

updated 10.3 years ago by jackuser1979 ▴ 890 • written 10.4 years ago by syntonic.c ▴ 20

score 2 · Answer 1 · 2013-12-16


10.3 years ago

jackuser1979 ▴ 890

If a heat map does not work for your data, explore other ways to represent it visually, e.g. a volcano plot or a three-way Venn diagram. If genes are absent in all three samples, filter them out of your data and then compare. Try to produce your graphics in R. The workflow provided below will be helpful for you. PS: Please post your question as a comment, not as an answer.

[Image: workflow flowchart]
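As a rough illustration of the three-way Venn and filtering ideas, the region sizes can be computed with plain set operations before any plotting. This sketch uses Python rather than R, and the gene names and the helper `venn3_counts` are hypothetical.

```python
def venn3_counts(a, b, c):
    """Return the size of each of the seven regions of a three-way Venn diagram."""
    a, b, c = set(a), set(b), set(c)
    return {
        "only_a": len(a - b - c),
        "only_b": len(b - a - c),
        "only_c": len(c - a - b),
        "a_and_b_only": len((a & b) - c),
        "a_and_c_only": len((a & c) - b),
        "b_and_c_only": len((b & c) - a),
        "all_three": len(a & b & c),
    }


# Hypothetical gene lists for the three conditions.
cond1 = {"A", "B", "C", "D"}
cond2 = {"B", "C", "E"}
cond3 = {"C", "D", "F"}

counts = venn3_counts(cond1, cond2, cond3)

# Filtering step: keep only the genes measured in every condition.
shared = cond1 & cond2 & cond3

for region, size in counts.items():
    print(f"{region}: {size}")
print("genes in all three conditions:", sorted(shared))
```

The `shared` set is the subset for which a heat map across all three conditions would have no empty cells.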


0


Thank you! I'll return to this here in a little while once I get familiar with other ways to represent my data. I know a bit of R but not enough to do much with it so that page you linked (as well as the corresponding flowchart on that page) will be very helpful to me. And sorry about putting my last comment as an answer instead, my mistake.

Thank you all for your time.

10.3 years ago by syntonic.c ▴ 20

score 1 · Answer 2 · 2013-12-13


10.4 years ago

jackuser1979 ▴ 890

Try Cytoscape with the clusterMaker and VistaClara plugins.


score 0 · Answer 3 · 2013-12-14


10.4 years ago

Pappu ★ 2.1k

The EnrichmentMap plugin for Cytoscape also makes nice network plots: http://baderlab.org/Software/EnrichmentMap


score 0 · Answer 4 · 2013-12-16

Thank you both for your help. I spent the better part of today getting used to using Cytoscape and some plugins. I have used some of the sample datasets given in tutorials and have the software working for me.

I am still a bit confused about how to properly visualize my data when comparing genes shared between experimental conditions. I mentioned in my first post that I have 3 different gene lists with expression data. Some genes overlap between the lists, so I would like to compare the lists wherever a gene is shared. For example: in Condition 1 Gene A went UP, and in Condition 2 Gene A went DOWN. In Condition 3, we have nothing for Gene A since it did not appear in this condition.
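The Gene A example can be written down as a small sketch that labels each gene per condition as UP, DOWN, unchanged, or absent. The gene names, values, and the `threshold` parameter are assumptions made for illustration only.

```python
def direction(value, threshold=0.0):
    """Label a log2 ratio; None means the gene was not detected in that condition."""
    if value is None:
        return "absent"
    if value > threshold:
        return "UP"
    if value < -threshold:
        return "DOWN"
    return "unchanged"


def summarize(profiles, threshold=0.0):
    """Map each gene to its list of per-condition direction labels."""
    return {
        gene: [direction(value, threshold) for value in values]
        for gene, values in profiles.items()
    }


# Hypothetical log2 ratios in the order (Condition 1, Condition 2, Condition 3).
profiles = {
    "GeneA": [0.9, -1.1, None],
    "GeneB": [0.0, 0.4, 0.2],
}

labels = summarize(profiles)
for gene, row in labels.items():
    print(gene, *row, sep="\t")
```

Raising `threshold` (for example to 1.0, a two-fold change) would make the UP and DOWN calls stricter.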

With clusterMaker and VistaClara, the heat map has a bunch of empty white spots in it if I combine the genes from all conditions into one list, since some genes have no data in some columns.

For EnrichmentMap, the visualization is obviously different from a heat map. If I have a combined list of genes for all three conditions, there will be some empty cells in my TXT file. Will this affect the analysis in any way? An empty cell is not the same thing as a zero expression change on the log2 -/+ scale: the gene doesn't even exist in that condition at all.
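The distinction between an empty cell and a zero can be kept explicit when reading a tab-delimited file. The sketch below is only an illustration with an invented table and hypothetical helpers `parse_cell` and `mean_observed`; it says nothing about how EnrichmentMap itself treats empty cells.

```python
import csv
import io


def parse_cell(text):
    """Empty string -> None (not measured); anything else -> float."""
    text = text.strip()
    return None if text == "" else float(text)


def mean_observed(values):
    """Average only the measured values; return None if nothing was measured."""
    observed = [v for v in values if v is not None]
    return sum(observed) / len(observed) if observed else None


# Invented table: GeneA has an empty cell in cond3, GeneB has a true 0 in cond1.
table = (
    "gene\tcond1\tcond2\tcond3\n"
    "GeneA\t0.9\t-1.1\t\n"
    "GeneB\t0\t0.4\t0.2\n"
)

reader = csv.reader(io.StringIO(table), delimiter="\t")
header = next(reader)
parsed = {row[0]: [parse_cell(cell) for cell in row[1:]] for row in reader}

# Treating the missing cell as 0 would change summary statistics for GeneA.
values = parsed["GeneA"]
as_zero = [0.0 if v is None else v for v in values]
print("mean over measured cells:", mean_observed(values))
print("mean if missing is treated as 0:", sum(as_zero) / len(as_zero))
```

The two means differ for GeneA, which is one reason to check how any given tool interprets blank cells before feeding it a combined table.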

OR, would it be better to make three separate plot networks for each of my conditions and compare them separately by looking for genes in common?

Do my questions make sense? I might be overcomplicating this. Thank you all very much for your help, it is greatly appreciated!