How To Merge Multiple Stress Conditions In A Gene Expression Microarray Into One Label?
13.3 years ago
Ron ▴ 40

I'm using the Gasch dataset (http://genome-www.stanford.edu/yeast_stress/data/rawdata/complete_dataset.txt), which has multiple stress-condition labels (e.g. Heat Shock 10 minutes hs-1, Heat Shock 15 minutes hs-1, Heat Shock 20 minutes hs-1, ...). How can I merge all of those labels into a single Heat Shock label? Alternatively, can I use the GEO dataset GSE18 instead, and if so, how?

microarray gene dataset • 2.9k views

You first need to define what you mean by "merging" the labels into one.


As I mentioned before, the heat shock stress condition, for example, has a different label for 10, 15, 20 minutes, etc. How can I combine all of those labels into one general heat shock label, so that I only have to analyze general labels such as heat shock, DTT, menadione and so on?

13.3 years ago
Michael 54k

Well, let's assume this requirement makes sense (I wouldn't question it without knowing the background of what you are trying to do, though I would personally try to retain as much information as possible). I also don't think your question was all that unclear, by the way. First, let me paraphrase your question, in case I misunderstood:

You have gene expression measurements under certain experimental variables in an n x m matrix, where the n rows correspond to genes and the m columns correspond to variables/samples. You now wish to reduce the columns to a single representative measurement or, more generally, to a matrix with m' < m columns. This is clearly a dimension reduction problem, and there are many approaches to it:

Simple approach: replace the measurements for each gene with a single point estimate, e.g. the mean or median, or even choose a single representative variable.

Better: apply principal component analysis (PCA) to identify the direction that explains the most variance in the data, and project all data onto the first (or first few) principal component(s).

There are many more advanced methods, but I would start with the simplest first and see how far I get.

Edit: It is all implemented in R, mostly as basic functions (try the following):

For the simple functions get help with:

?mean
?median
?rowMeans # for easy application to a matrix of measurements
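The simple approach can be sketched roughly as below. This is only a minimal illustration on a toy matrix: the expression values are random, the "DTT" column names are made up for the example, and the prefix-based mapping from column name to general label is an assumption that would need adapting to the real column headers.

```r
# Toy expression matrix: 4 genes x 5 samples (values are random placeholders)
set.seed(1)
expr <- matrix(rnorm(20), nrow = 4,
               dimnames = list(paste0("gene", 1:4),
                               c("Heat Shock 10 minutes hs-1",
                                 "Heat Shock 15 minutes hs-1",
                                 "Heat Shock 20 minutes hs-1",
                                 "DTT 15 min",    # hypothetical column name
                                 "DTT 30 min")))  # hypothetical column name

# Assumed mapping: every column starting with "Heat Shock" gets the general
# label "Heat Shock"; everything else in this toy example is "DTT"
condition <- ifelse(grepl("^Heat Shock", colnames(expr)), "Heat Shock", "DTT")

# For each general label, average its columns gene by gene
merged <- sapply(unique(condition), function(cond) {
  rowMeans(expr[, condition == cond, drop = FALSE], na.rm = TRUE)
})

print(merged)  # genes in rows, one column per general label
```

Swapping rowMeans for apply(..., 1, median, na.rm = TRUE) would give the median instead; na.rm = TRUE is used here on the assumption that the real data contain missing values.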

For PCA use either:

?princomp # uses eigenvalue decomposition
?prcomp # uses singular value decomposition, generally preferred for numerical accuracy

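To get a matrix of projected values (replace USArrests with your data):

prcomp(USArrests, scale. = TRUE)$x # choose the PC column that suits you best
princomp(USArrests, cor = TRUE)$scores # princomp has no scale argument; cor = TRUE is the equivalent. Scores match the above up to sign and a small constant factor (divisor n instead of n - 1)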

Make sure to also use and understand the biplot and screeplot functions on your PCA data.
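As a rough sketch of the PCA route for a single condition, the example below collapses three time-point columns into one score per gene. The data are simulated and the column names hs10, hs15 and hs20 are hypothetical; with real data you would pass the submatrix of columns belonging to one stress condition, after dealing with missing values, since prcomp does not accept NAs.

```r
set.seed(42)

# Simulated data: 50 genes whose response grows over three time points
base <- rnorm(50)
hs <- cbind(hs10 = base + rnorm(50, sd = 0.2),
            hs15 = 1.5 * base + rnorm(50, sd = 0.2),
            hs20 = 2 * base + rnorm(50, sd = 0.2))

# PCA with genes as observations and time points as variables
pca <- prcomp(hs, center = TRUE, scale. = FALSE)

# One summary value per gene: its score on the first principal component.
# The sign of a principal component is arbitrary, so check the direction.
heat_shock <- pca$x[, 1]

# Proportion of variance explained by each component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))

# In an interactive session, inspect the result with:
# screeplot(pca)
# biplot(pca)
```

If the first component explains only a modest share of the variance, a single merged label loses a lot of information, and keeping the first few components (or the original time points) may be the safer choice.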

It all depends a bit on how your data is formatted, so if you need more advice, post a specific question that includes your data, too.


Yes, that is exactly what I meant. Since this involves thousands of genes, is there any software, R package, or source code that I can use for the simple approach? Thanks a lot.


Yes, there is, in R of course: it's all in the base and stats packages that ship with R.

13.3 years ago

For the simple approach (that Michael describes above), you could load the data into Excel and then use AVERAGE (or another function) over the columns you need to create a new column.
