Hello! I am analyzing a data set that was provided to me. It consists of more than 150 microarray samples from a single tumor type. The data are already normalized and background corrected, and in theory I should be able to find differences in gene expression between patients who survive longer under chemotherapy and those who survive less under the same treatment.
The problem is that I cannot find any grouping based on median survival (high vs low), or even on survival divided into terciles (again high vs low), when in theory there should be differences. Differential expression analysis with limma does not find any DEGs after FDR correction. After filtering for the most variable genes, I tried adjusting for hidden batch effects with SVA and correcting for relevant clinical variables (age, performance status, volume of disease), with no improvement.
I also fitted an adjusted Cox proportional hazards model for each gene, but no gene was associated with survival after adjusting for multiple comparisons.
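To make the multiple-testing step concrete, here is a minimal sketch of the Benjamini-Hochberg adjustment in plain Python. It is not the limma or Cox code used for the analysis; the function name `benjamini_hochberg` and the p-values are invented for illustration only.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented p-values for illustration only (not taken from the data set).
raw = [0.001, 0.008, 0.039, 0.041, 0.20]
adj = benjamini_hochberg(raw)
for p, q in zip(raw, adj):
    print(f"raw p = {p:.3f}  ->  BH-adjusted = {q:.5f}")
```

The adjustment scales each p-value by the number of tests divided by its rank, so with many thousands of genes a handful of modest raw p-values can all end up above the usual 0.05 cut-off.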
I have also tried k-means clustering with two groups. The resulting clusters are associated with survival in the adjusted Cox model, and comparing them gives me differentially expressed genes, but I am not sure this approach is correct.
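One reason to be cautious about testing for differential expression between clusters that were themselves derived from the expression values is that the comparison can be circular. The sketch below is a toy simulation under strong simplifying assumptions (a single hypothetical gene, pure Gaussian noise, a hand-written one-dimensional 2-means); it does not reproduce the actual analysis and all names are illustrative.

```python
import random
import statistics


def two_means_1d(values, n_iter=50):
    """Lloyd's algorithm with k = 2 on one-dimensional data."""
    c_low, c_high = min(values), max(values)
    labels = []
    for _ in range(n_iter):
        # Assign each value to the nearer of the two centres.
        labels = [0 if abs(v - c_low) <= abs(v - c_high) else 1 for v in values]
        low = [v for v, lab in zip(values, labels) if lab == 0]
        high = [v for v, lab in zip(values, labels) if lab == 1]
        new_low, new_high = statistics.fmean(low), statistics.fmean(high)
        if (new_low, new_high) == (c_low, c_high):
            break  # centres stopped moving
        c_low, c_high = new_low, new_high
    return labels


def welch_t(values, labels):
    """Welch t-statistic comparing label 0 against label 1."""
    a = [v for v, lab in zip(values, labels) if lab == 0]
    b = [v for v, lab in zip(values, labels) if lab == 1]
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return (statistics.fmean(a) - statistics.fmean(b)) / se


rng = random.Random(0)
# One hypothetical gene measured in 200 samples: pure noise, no real groups.
expression = [rng.gauss(0.0, 1.0) for _ in range(200)]

cluster_labels = two_means_1d(expression)
arbitrary_labels = [i % 2 for i in range(len(expression))]  # split unrelated to the values

t_cluster = welch_t(expression, cluster_labels)
t_arbitrary = welch_t(expression, arbitrary_labels)
print(f"|t| between k-means clusters: {abs(t_cluster):.1f}")
print(f"|t| between arbitrary halves: {abs(t_arbitrary):.1f}")
```

In this toy setting the clusters differ strongly by construction even though the data contain no real groups, so differentially expressed genes between expression-derived clusters are not, by themselves, evidence of a biological signal. Whether this applies to the real data depends on how the clusters and the tests were set up.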
I do not understand how, with such a large sample size, previous research backing my assumptions, and supposedly good-quality data, I get no results in the differential expression analysis.
I have also performed GSEA on the ranked genes to look for significantly enriched pathways, but nothing is significant after adjusting for multiple comparisons.
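For readers less familiar with what GSEA computes from a ranked list, here is a minimal sketch of the running-sum enrichment score in its simplest, unweighted form. This is a simplification: the standard GSEA tool by default weights hits by the ranking metric and assesses significance by permutation, neither of which is shown here. The gene names and gene sets are made up.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score for one gene set."""
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    running, best = 0.0, 0.0
    for is_hit in hits:
        # Step up at genes in the set, step down at genes outside it.
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        # Keep the largest deviation from zero, with its sign.
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranked list (most up-regulated first) and hypothetical gene sets.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"]
es_top = enrichment_score(ranked, {"g1", "g2", "g4"})
es_bottom = enrichment_score(ranked, {"g7", "g8"})
print(f"set concentrated at the top:    ES = {es_top:.2f}")
print(f"set concentrated at the bottom: ES = {es_bottom:.2f}")
```

A set whose members cluster at one end of the ranking gets a score far from zero, so the result depends heavily on how the genes were ranked in the first place.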
Any idea where I might be going wrong? Thank you all!