Question

How To Merge Multiple Stress Conditions In A Gene Expression Microarray Into 1 Label?

13.3 years ago

Ron ▴ 40

I'm using the Gasch dataset (http://genome-www.stanford.edu/yeast_stress/data/rawdata/complete_dataset.txt), which has multiple stress condition labels (e.g. Heat Shock 10 minutes hs-1, Heat Shock 15 minutes hs-1, Heat Shock 20 minutes hs-1, ...). How can I merge all of those labels into a single Heat Shock label? Alternatively, can I use the dataset from GEO GSE18, and if so, how?

microarray gene dataset • 2.9k views

updated 10.8 years ago by Biostar 20 • written 13.3 years ago by Ron ▴ 40

You first need to define what you mean by "merging" the labels into one.

13.3 years ago by Istvan Albert 100k

As I mentioned before, the heat shock stress condition, for example, has a different label for 10, 15, 20 minutes, etc. How can I combine all those labels into one general heat shock label, so that I only have to analyze the general labels such as heat shock, DTT, menadione and so on?

13.3 years ago by Ron ▴ 40

Ram · Answer 1 · 2011-01-06

Well, let's assume this requirement makes sense (I wouldn't question it without knowing the background of what you are trying to do, though imho I would try to retain as much information as possible). And I don't think your question was all that unclear, btw. First, let me paraphrase your question, in case I got you wrong:

You have gene expression measurements under certain experimental variables in an n x m matrix, where the n rows correspond to genes and the m columns correspond to variables/samples. Now you wish to reduce the columns to a single representative measurement or, more generally, to a matrix with m' < m columns. This is clearly a dimension reduction problem. There are many approaches to this:

Simple approach: replace the measurements for each gene with a single point estimate, e.g. the mean or median, or even choose a single representative variable.

Better: apply principal component analysis (PCA) to identify the direction that explains the most variance in the data, and project all data onto the first (or first few) principal component(s).

There are many more advanced methods, but I would start with the simplest first and see how far I get.

Edit: All of this is implemented in R as basic functions (try the following):

For the simple functions get help with:

?mean
?median
?rowMeans # for easy application to a matrix of measurements
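As a minimal sketch of the simple approach, the snippet below averages all columns that share a general condition label. The toy matrix, the column names, and the regular expression used to strip the time point are assumptions for illustration; the real column headers in your file may need a different pattern.

```r
# Toy expression matrix: 4 genes x 5 samples (values are made up for illustration).
expr <- matrix(
  c(1, 2, 3, 10, 20,
    2, 4, 6, 20, 40,
    0, 0, 3,  5,  7,
    1, 1, 1,  2,  4),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    paste0("gene", 1:4),
    c("Heat Shock 10 minutes hs-1", "Heat Shock 15 minutes hs-1",
      "Heat Shock 20 minutes hs-1", "DTT 15 minutes", "DTT 30 minutes")
  )
)

# Assumed rule: the general label is everything before the time point.
condition <- sub(" [0-9]+ minutes.*$", "", colnames(expr))

# Average all columns that share a general label, one condition at a time.
# drop = FALSE keeps a matrix even if a condition has only one column.
merged <- sapply(unique(condition), function(cond) {
  rowMeans(expr[, condition == cond, drop = FALSE], na.rm = TRUE)
})

print(merged)
```

The result has one column per general label (here "Heat Shock" and "DTT"). Swapping rowMeans for a median, e.g. apply(..., 1, median), gives a more outlier-robust summary.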

For PCA use either:

?princomp # uses eigenvalue decomposition
?prcomp # uses singular value decomposition, generally more accurate numerically

To get out a matrix of projected values (repl. USArrests with your data):

prcomp(USArrests, scale. = TRUE)$x # choose the PC column that suits you best
princomp(USArrests, cor = TRUE)$scores # princomp has no scale argument; cor = TRUE gives comparable scores (up to sign and a small scaling factor)

Make sure to also use and understand the biplot and screeplot functions on your PCA data.
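A short sketch of how these pieces fit together, again using USArrests as a stand-in for your own matrix. The variable names are illustrative only; with genes in rows and samples in columns, the PC1 scores give one summary value per gene.

```r
# PCA on the built-in USArrests data (replace with your own numeric matrix).
pca <- prcomp(USArrests, scale. = TRUE)

# Proportion of variance explained by each principal component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))

# Scores on the first principal component: one value per row of the input.
pc1 <- pca$x[, "PC1"]

# Diagnostic plots: variance per component, and rows/variables in PC space.
screeplot(pca)
biplot(pca)
```

If the first component explains only a small share of the variance, a single column is a poor summary and keeping a few more components may be worth considering.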

It all depends a bit on how your data is formatted, so if you need more advice, post a specific question that includes your data, too.

score 0 · Answer 2 · 2011-01-06

13.3 years ago

Istvan Albert 100k

For the simple approach (that Michael describes above), you could load the data into Excel, then use AVERAGE (or another function) over the columns you need to create a new column.
