This is my data, which contains negative values: ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE29nnn/GSE29745/matrix/GSE29745_series_matrix.txt.gz It comes from a one-color technology. What do negative values mean in one-color data?
The data in that file are already normalised and log2-transformed, so it is quite possible that many values will be negative: a log2 value is negative whenever the normalised value it was computed from is below 1. You can still use the data for, e.g., differential expression analysis. Negative values are actually common on various microarray platforms.
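To make the point about log2 concrete, here is a minimal sketch in base R. The numbers are made up for illustration and are not taken from GSE29745, and median-centring is only assumed here as one example of a normalisation step that produces negatives.

```r
# Hypothetical normalised intensities on the linear scale (not from GSE29745)
intensity <- c(0.25, 0.5, 1, 2, 4)

# log2 is negative below 1, zero at 1, and positive above 1
log2_intensity <- log2(intensity)
print(log2_intensity)

# Median-centring log2 values (assumed here as an example normalisation step)
# also pushes every value below the median to a negative number
raw_log2 <- c(6, 7, 8, 9, 10)
centred <- raw_log2 - median(raw_log2)
print(centred)
```

In both cases the negative numbers simply mean "below the reference level", not missing or invalid measurements.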
This code will download the same file and plot a boxplot:
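# Load Biobase (ExpressionSet accessors) and GEOquery (GEO download)
library(Biobase)
library(GEOquery)
# Download the series matrix for GSE29745 as a list of ExpressionSets
gset <- getGEO("GSE29745", GSEMatrix = TRUE, getGPL = FALSE)
# If the series spans several platforms, pick the GPL6480 entry
idx <- if (length(gset) > 1) grep("GPL6480", names(gset)) else 1
gset <- gset[[idx]]
# Size the plot window and the bottom margin to fit the sample names
dev.new(width = 4 + dim(gset)[[2]] / 5, height = 6)
par(mar = c(2 + round(max(nchar(sampleNames(gset))) / 2), 4, 2, 1))
# Boxplot of the expression values, one box per sample
plot_title <- paste0("GSE29745", "/", annotation(gset), " selected samples")
boxplot(exprs(gset), boxwex = 0.7, notch = TRUE, main = plot_title, outline = FALSE, las = 2)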
Kevin
Hi, you said negative values are normal. Can I use limma to analyse the data directly? Someone else said that negative values should be removed because they result from background subtraction and log transformation.
Also, do you think the picture you showed is comparable or not? Thanks a lot.
Is it possible to use it alongside other data when I use the ComBat function from the sva (Surrogate Variable Analysis) package?
I have moved your reply to become a comment. In the future, when you want to comment on an answer, click on the ADD COMMENT button beneath the answer.
Why do you need to use ComBat? Do you have evidence that surrogate variables exist in the data?
I want to use it for a meta-analysis. I have multiple datasets from multiple platforms with similar issues, and I don't know whether that is possible with these negative values.
I would only use ComBat as a last resort. ComBat was designed for microarray data, so I imagine that this simple issue of negative values (which are common in microarray data) is handled by ComBat.
If you are actually combining multiple datasets together, it would at least help that they are matched on condition type, lab preparatory method, and array platform. If you have different conditions, prep. methods, and different array platforms, then you are going down a long, dark route...
For differential expression, you can typically include experiment (or batch, etc.) as a covariate in the design formula. This will then adjust the statistical inferences for the cross-experiment / batch differences.
If you are aiming to use downstream tools, you may indeed need to directly remove batch effects.
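As a rough sketch of what "batch as a covariate" looks like in practice: the sample annotation, the column names condition and batch, and the simulated matrix below are all hypothetical. The limma step only runs if limma happens to be installed.

```r
# Hypothetical sample annotation: 8 samples, 2 conditions, 2 batches
pheno <- data.frame(
  condition = factor(rep(c("control", "treated"), times = 4)),
  batch     = factor(rep(c("exp1", "exp2"), each = 4))
)

# Simulated log2 expression matrix (genes x samples); negative values are kept
set.seed(1)
expr <- matrix(rnorm(100 * 8), nrow = 100,
               dimnames = list(paste0("gene", 1:100), paste0("s", 1:8)))

# Batch enters the design as a covariate next to the condition of interest
design <- model.matrix(~ condition + batch, data = pheno)
print(colnames(design))

# Fit with limma if it is installed; the condition effect is batch-adjusted
if (requireNamespace("limma", quietly = TRUE)) {
  fit <- limma::eBayes(limma::lmFit(expr, design))
  print(limma::topTable(fit, coef = "conditiontreated", number = 5))
}
```

With this design the expression values themselves are left untouched; only the model accounts for batch, which is why it suits differential expression but not tools that need a batch-free matrix.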
Thanks Kevin. I have made a matrix by combining 8 microarray datasets on their common gene symbols. My aim is to remove the batch effects and draw a correlation heatmap to see the similarities between the datasets. I don't need differential expression of genes. What methods other than ComBat can do this?
You should generate a PCA bi-plot to see if there is evidence of differences between the datasets. Please take a look at my PCAtools package: PCAtools: everything Principal Components Analysis (for now, you will have to install it with devtools::install_github('kevinblighe/PCAtools')).
You can also generate a bi-plot using base R functions; see the answer "PCA plot from read count matrix from RNA-Seq".
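For the base R route, a minimal sketch of a PCA scores plot coloured by dataset might look like this. The matrix is simulated, the two datasets "A" and "B" are hypothetical, and the shift added to dataset B is only there to mimic a batch effect; it is not the code from the linked answer.

```r
# Simulated combined log2 matrix (genes x samples) from two hypothetical datasets
set.seed(42)
n_genes <- 200
dataset <- factor(rep(c("A", "B"), each = 6))
expr <- matrix(rnorm(n_genes * 12), nrow = n_genes)
# Add a dataset-specific shift to a subset of genes to mimic a batch effect
expr[1:50, dataset == "B"] <- expr[1:50, dataset == "B"] + 3

# prcomp expects samples in rows, so transpose; each gene is centred
pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)
var_explained <- 100 * pca$sdev^2 / sum(pca$sdev^2)

# Plot PC1 against PC2, coloured by dataset of origin
plot(pca$x[, 1], pca$x[, 2], col = as.integer(dataset), pch = 19,
     xlab = sprintf("PC1 (%.1f%%)", var_explained[1]),
     ylab = sprintf("PC2 (%.1f%%)", var_explained[2]))
legend("topright", legend = levels(dataset),
       col = seq_along(levels(dataset)), pch = 19)
```

If the samples separate by dataset along the first components, as they do in this simulation, that is the kind of evidence that would justify a batch adjustment.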
Remember that you should not blindly use ComBat (or other batch-adjustment methods) without any justification.