I'm a bit baffled: I'm re-analyzing some published data from a collaborator, and no more than 10-20% of my genes overlap with theirs. I would like to make sure that I'm not screwing it up, so I hope this is the best place to ask.
The analysis covers 20 samples: 10 paired tumor-normal.
My R code (based on limma) looks like this:
    library(affy)   # ReadAffy, expresso
    library(limma)  # lmFit, eBayes, topTable

    # read all CEL files in the current directory
    fileList <- list.files(".", pattern = "\\.CEL$", full.names = TRUE, include.dirs = FALSE, recursive = FALSE)
    sample <- ReadAffy(filenames = fileList, verbose = TRUE)
    eset <- expresso(sample, bgcorrect.method = "mas", normalize.method = "quantiles", pmcorrect.method = "mas", summary.method = "medianpolish")
    as.matrix(colnames(exprs(eset)))

    # prepare design
    target <- cbind(c(rep(5,2), rep(6,2), rep(7,2), rep(8,2), rep(9,2), rep(10,2), rep(31,2), rep(36,2), rep(37,2), rep(38,2)), rep(c("D","N"), 10))
    colnames(target) <- c("ID", "status")
    rownames(target) <- colnames(exprs(eset))
    target <- as.data.frame(target)
    paired_samples <- factor(target$ID)
    Treat <- factor(target$status, levels = c("D","N"))
    design <- model.matrix(~paired_samples+Treat)

    # paired fit; TreatN is the N vs D effect
    fit <- lmFit(eset, design)
    fit <- eBayes(fit)
    topTable(fit, coef = "TreatN")
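The design above assumes that the CEL files come back from list.files() in strict D, N order within each patient. A minimal sketch of how that assumption could be checked, using base R only. The file names and the "_D" / "_N" naming scheme are hypothetical, made up for illustration; the real names would come from colnames(exprs(eset)).

```r
# Hypothetical CEL file names (assumption); the real ones are colnames(exprs(eset)).
sample_names <- c("P05_D.CEL", "P05_N.CEL", "P06_D.CEL", "P06_N.CEL")

# Hand-built annotation, constructed the same way as in the post (2 patients only).
target <- data.frame(
  ID        = rep(c("5", "6"), each = 2),
  status    = rep(c("D", "N"), times = 2),
  row.names = sample_names
)

# Each patient should contribute exactly one D and one N sample.
pair_table <- table(target$ID, target$status)
all_paired <- all(pair_table == 1)

# Assumption: the status is encoded in the file name as "_D" or "_N".
status_from_name <- sub(".*_([DN])\\.CEL$", "\\1", rownames(target))
labels_match <- all(status_from_name == target$status)
```

If either all_paired or labels_match is FALSE, the hand-typed annotation does not line up with the file order, and the paired contrast would be comparing the wrong samples.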
But the list of genes that I get not only has 0 genes with FDR < 5% (the threshold stated in the paper), it also has only 11 genes out of >400 that overlap if I use the raw p-value instead of the adjusted p-value.
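For reference, a small sketch of how the overlap itself can be computed, so the comparison is reproducible. The gene identifiers below are hypothetical placeholders, and clean() is a helper invented for this example. Note also that topTable() returns only the top 10 rows by default, so the full ranked table needs number = Inf before any such comparison.

```r
# Hypothetical gene lists (assumption); substitute the published list and your own IDs.
published <- c("GENE1", "GENE2", "GENE3", "GENE4")
mine      <- c("gene2 ", "GENE4", "GENE5")

# Hypothetical helper: normalise identifiers so case or stray whitespace
# does not hide genuine matches.
clean <- function(x) unique(toupper(trimws(x)))
published <- clean(published)
mine      <- clean(mine)

# Genes present in both lists, and the overlap as a fraction of each list.
shared            <- intersect(mine, published)
frac_of_published <- length(shared) / length(published)
frac_of_mine      <- length(shared) / length(mine)
```

Reporting both fractions makes it clear whether "10-20% overlap" is relative to the published list or to the re-analysis. It also matters that both lists use the same identifier type (probe set IDs versus gene symbols).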
I even tried a non-paired analysis with this design:
    design <- cbind(N=1, DvsN=rep(c(1,0),10))
    rownames(design) <- colnames(exprs(eset))
The heatmap separates normal from tumor more cleanly, but I still have the same problem: my significant genes barely overlap with the published ones.
-- edit --
An example of the normalized data I have is in the attached image.
Any suggestion would be just awesome.