Dichotomizing Gene Expression Data Based On The Median
11.2 years ago
moranr ▴ 290

Hi,
I have an expression matrix, gse14814.gcrma, which I have normalised and filtered.

current_study <- gse14814.gcrma[gse14814_probes,]

These are the genes that I wish to use, along with their expression values.

I have created an empty matrix, di_matrix, with the dimensions of the above expression matrix (90 rows by 1567 genes/columns), which I want to fill with dichotomized values based on the expression. For each gene (column), I want to calculate the median expression. Then, for each array and each gene, if the expression is above that gene's median, assign 1 to the same position in di_matrix; if it is lower, assign 0 to the same position.

So I think I should create a for loop:

rownames(di_matrix) <- sampleNames(current_study)
colnames(di_matrix) <- featureNames(current_study)

for (i in 1:1567) {
    medianVal <- median(exprs(current_study[,i]))
    current_logical <- exprs(current_study[,i]) > medianVAL
    current_di_gene <- as.numeric(current_logical)
    di_matrix[,i] <- current_di_gene
}

This is wrong; it gives me back:

Error in gse14814dimatrix[, i] <- currentdigene : number of items to replace is not a multiple of replacement length

I'm sorry, I don't have a lot of experience in R; I'm very much a beginner.

Thanks for the help, R

bioconductor microarray r • 3.7k views

Try to use apply functions instead of loops. http://www.ats.ucla.edu/stat/r/library/advanced_function_r.htm


Just a comment here since I think the answers are going to get you where you need to go. If you find yourself using a "for" loop over rows or columns, you should look for an "apply" that fits your needs instead. Using an "apply" can sometimes be orders-of-magnitude faster for the same result.
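To make the loop-versus-apply suggestion concrete, here is a minimal sketch on a small dummy matrix. The names m, med_loop and med_apply are made up for illustration and are not from the original post; the speed difference will depend on the data and on what is done inside the loop.

```r
# Small dummy matrix: 5 rows, 4 columns (values are random, for illustration only).
set.seed(42)
m <- matrix(rnorm(20), nrow = 5)

# Loop version: preallocate the result, then fill in one column median at a time.
med_loop <- numeric(ncol(m))
for (j in seq_len(ncol(m))) {
  med_loop[j] <- median(m[, j])
}

# apply version: MARGIN = 2 means "apply the function over columns".
med_apply <- apply(m, 2, median)

# Both approaches should give the same per-column medians.
all.equal(med_loop, med_apply)
```

Using MARGIN = 1 instead would give one median per row.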


I'll test this in the morning. Thanks for the advice, I really appreciate it, guys.

11.2 years ago
zx8754 11k

Try this:

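# create dummy data: samples in rows, genes in columns
n_samples <- 90
n_genes <- 1567
di_matrix <- matrix(round(runif(n_samples * n_genes, 1, 100)), ncol = n_genes)

# get median per gene (per column)
genes_median <- apply(di_matrix, 2, median)

# convert to 0 and 1; sweep() lines each column up with its own median,
# whereas a bare `di_matrix > genes_median` would recycle the medians down the rows
di_matrix01 <- ifelse(sweep(di_matrix, 2, genes_median, ">"), 1, 0)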
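If the data are laid out the other way round, with genes in rows and arrays in columns (which is how the question subsets gse14814.gcrma by probe), the same idea might look like the sketch below. The matrix expr and its dimnames are dummy stand-ins, not the asker's data.

```r
# Dummy expression matrix: 6 genes in rows, 4 arrays in columns (illustrative only).
set.seed(7)
expr <- matrix(round(runif(6 * 4, 1, 100)), nrow = 6,
               dimnames = list(paste0("gene", 1:6), paste0("array", 1:4)))

# One median per gene, i.e. per row (MARGIN = 1).
gene_median <- apply(expr, 1, median)

# A vector with one entry per row is recycled down each column, so here it
# lines up with the rows and a plain comparison compares each gene with its
# own median. Multiplying by 1 turns TRUE/FALSE into 1/0 and keeps dimnames.
di <- (expr > gene_median) * 1

# Transpose to get arrays in rows and genes in columns, as di_matrix is described.
di_matrix <- t(di)
dim(di_matrix)
```

Values exactly equal to the median get 0 here, because the comparison is a strict greater-than.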
11.2 years ago
fo3c ▴ 450

If you need to call exprs to get the median, don't you also need it in the comparison that fails?

current_logical <- exprs(current_study[,i]) > medianVAL


Yes, thank you, I forgot to put this in. It still doesn't work, though. I have edited the question accordingly.
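One possible reason the loop still fails, sketched on dummy data: if exprs(current_study) has probes in rows and arrays in columns (the question subsets by probe in the row position, which suggests this), then indexing [,i] selects one array across all 1567 probes instead of one gene across the 90 arrays. The plain matrix expr below is an assumed stand-in for exprs(current_study), not the real data.

```r
# Stand-in for exprs(current_study): probes in rows, arrays in columns.
set.seed(1)
n_probes <- 1567
n_arrays <- 90
expr <- matrix(rnorm(n_probes * n_arrays), nrow = n_probes)

# Empty result matrix: arrays in rows, genes in columns, as in the question.
di_matrix <- matrix(NA_real_, nrow = n_arrays, ncol = n_probes)

# expr[, i] is one array across all probes, so it holds 1567 values,
# while di_matrix[, i] has only 90 slots: the lengths do not match.
length(expr[, 1])       # 1567
length(di_matrix[, 1])  # 90

# Indexing the row instead gives one probe across all arrays (90 values).
for (i in seq_len(n_probes)) {
  gene_values <- expr[i, ]
  di_matrix[, i] <- as.numeric(gene_values > median(gene_values))
}
```

Note also that R is case-sensitive, so medianVal and medianVAL in the original loop are two different objects.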
