You can use several Bioconductor packages to detect gene signatures in transcriptomic data comparing treatment to control groups. First, perform differential expression analysis to identify differentially expressed genes (DEGs), which form the basis of signatures. Established options include DESeq2, edgeR, or limma. For DESeq2, supply raw (un-normalized) integer counts as a matrix with genes as rows and samples (here, cell lines) as columns, then create a DESeqDataSet:
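```r
library(DESeq2)
# your_matrix: raw integer counts (genes x samples), control columns first
coldata <- data.frame(condition = factor(c(rep("control", n_control), rep("treatment", n_treatment)), levels = c("control", "treatment")), row.names = colnames(your_matrix))
dds <- DESeqDataSetFromMatrix(countData = your_matrix, colData = coldata, design = ~ condition)
dds <- DESeq(dds)
res <- results(dds, contrast = c("condition", "treatment", "control"))
```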
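DESeq2 sets `padj` to NA for genes removed by independent filtering or flagged as count outliers, so a plain `res$padj < 0.05` subset can misbehave (NA rows or an error, depending on the object type). Below is a minimal base-R sketch of the DEG filter (padj < 0.05 and |log2FC| > 1) that drops NAs and keeps both up- and down-regulated genes. `select_degs` and the toy table are hypothetical illustrations; only the column names follow DESeq2's `results()` output.

```r
# Hypothetical helper: select DEGs from a DESeq2-style results table.
# Assumes the log2FoldChange and padj columns produced by DESeq2::results().
select_degs <- function(res, padj_cutoff = 0.05, lfc_cutoff = 1) {
  res <- as.data.frame(res)
  # padj is NA for genes removed by independent filtering or flagged as
  # count outliers, so exclude NAs before applying the thresholds.
  keep <- !is.na(res$padj) & res$padj < padj_cutoff &
    !is.na(res$log2FoldChange) & abs(res$log2FoldChange) > lfc_cutoff
  degs <- res[keep, , drop = FALSE]
  degs[order(degs$padj), , drop = FALSE]  # strongest evidence first
}

# Toy table standing in for results(dds); values are made up.
toy_res <- data.frame(
  log2FoldChange = c(2.5, -1.8, 0.4, 3.0, -2.2),
  padj           = c(0.001, 0.01, 0.002, NA, 0.20),
  row.names      = c("GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E")
)
degs <- select_degs(toy_res)
up   <- rownames(degs)[degs$log2FoldChange > 0]  # "GENE_A"
down <- rownames(degs)[degs$log2FoldChange < 0]  # "GENE_B"
```

Using the absolute fold change matters here: a one-sided `log2FoldChange > 1` filter would silently discard GENE_B and every other down-regulated gene.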
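Filter DEGs by adjusted p-value < 0.05 and absolute log2 fold change > 1, so that down-regulated genes are kept along with up-regulated ones. To test whether these DEGs are enriched for known gene sets, use clusterProfiler. Note that enrichGO performs over-representation analysis (ORA) on a gene list; for gene set enrichment analysis (GSEA) on a ranked list of all genes, use gseGO instead: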
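```r
library(clusterProfiler)
# assumes rownames(res) are HGNC gene symbols; use keyType = "ENSEMBL" for Ensembl IDs
deg_ids <- rownames(res)[which(res$padj < 0.05 & abs(res$log2FoldChange) > 1)]  # which() drops NA padj
ego <- enrichGO(gene = deg_ids, universe = rownames(res)[!is.na(res$padj)], OrgDb = "org.Hs.eg.db", keyType = "SYMBOL", ont = "BP")
```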
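At its core, enrichGO asks, for each GO term, whether the DEG list contains more term members than expected by chance, using a one-sided hypergeometric test, and then adjusts p-values across terms (Benjamini-Hochberg by default). The base-R sketch below reproduces that calculation for two made-up gene sets so the statistic is transparent. It omits clusterProfiler's term-size limits and GO annotation handling, and `ora_test` is a hypothetical name.

```r
# Hypothetical helper: one-sided hypergeometric (over-representation) test
# for each gene set, followed by Benjamini-Hochberg adjustment.
ora_test <- function(deg_genes, gene_sets, universe) {
  deg_genes <- intersect(deg_genes, universe)
  n <- length(deg_genes)  # number of DEGs (draws)
  N <- length(universe)   # background size
  rows <- lapply(names(gene_sets), function(set_name) {
    members <- intersect(gene_sets[[set_name]], universe)
    M <- length(members)                        # set size within the background
    k <- length(intersect(deg_genes, members))  # DEGs that fall in the set
    # P(X >= k) for X ~ Hypergeometric(M successes, N - M failures, n draws)
    p <- phyper(k - 1, M, N - M, n, lower.tail = FALSE)
    data.frame(set = set_name, overlap = k, set_size = M, pvalue = p)
  })
  out <- do.call(rbind, rows)
  out$padj <- p.adjust(out$pvalue, method = "BH")
  out[order(out$pvalue), ]
}

universe  <- paste0("G", 1:100)
gene_sets <- list(
  SET_ENRICHED = paste0("G", 1:10),  # contains 5 of the 10 DEGs
  SET_RANDOM   = paste0("G", 60:69)  # contains none of them
)
deg_genes <- c(paste0("G", 1:5), paste0("G", 40:44))
ora_res <- ora_test(deg_genes, gene_sets, universe)
ora_res
```

Only about one DEG would be expected in SET_ENRICHED by chance (10 DEGs x 10/100), so an overlap of five gives a small p-value. The choice of `universe` directly changes M and N, which is why passing the tested genes as the background to enrichGO matters.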
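Alternatively, GSVA (Gene Set Variation Analysis) computes an enrichment score for every gene set in every sample, which lets you compare signature activity between individual cell lines. Here it is run on variance-stabilized expression from the DESeq2 object: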
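```r
library(GSVA)
# msigdb_sets: named list of gene symbol vectors (e.g. built from MSigDB)
vst_mat <- assay(vst(dds))  # variance-stabilized expression, genes x samples
gsva_scores <- gsva(gsvaParam(vst_mat, msigdb_sets))  # GSVA >= 1.50; older releases: gsva(expr, gset.idx.list, method = "gsva")
```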
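GSVA estimates each gene's expression distribution with a kernel and then computes a Kolmogorov-Smirnov-like random-walk statistic per gene set and sample. For a quick sanity check of the same idea, a simpler stand-in (not GSVA itself) is to average the z-scores of the signature genes in each sample. The sketch below does this in base R; `zscore_signature` and the toy matrix are hypothetical, and the input is assumed to be normalized, log-scale expression (genes x samples).

```r
# Hypothetical helper: per-sample signature score as the mean z-score of the
# signature genes. A simplified stand-in, NOT the GSVA algorithm.
zscore_signature <- function(expr, gene_sets) {
  # Standardize each gene (row) across samples; zero-variance genes become NaN.
  z <- t(scale(t(expr)))
  scores <- vapply(gene_sets, function(genes) {
    genes <- intersect(genes, rownames(z))  # ignore genes absent from the data
    colMeans(z[genes, , drop = FALSE], na.rm = TRUE)
  }, numeric(ncol(z)))
  t(scores)  # gene sets x samples, the same orientation as gsva() output
}

expr <- matrix(c(1, 2, 8, 9,
                 2, 1, 9, 8,
                 5, 6, 5, 6,
                 3, 3, 4, 2),
               nrow = 4, byrow = TRUE,
               dimnames = list(paste0("G", 1:4), c("C1", "C2", "T1", "T2")))
gene_sets <- list(UP_IN_TREATMENT = c("G1", "G2"), OTHER = c("G3", "G4", "G99"))
scores <- zscore_signature(expr, gene_sets)
scores
```

Because every gene is centered across samples, these scores are relative to the samples in the matrix, which is also true of GSVA; adding or removing cell lines changes them.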
For robust signatures across cell lines, consider ICARus, which uses independent component analysis to extract stable patterns.
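To gather in silico evidence that the treatment is more effective when combined with a transcription factor (TF) inhibitor, integrate the expression data with TF activity inference. DoRothEA provides curated TF-target regulons, keyed by gene symbol, that can be scored with VIPER through run_viper. Use the Wald statistic of every tested gene as the input signature rather than only the filtered DEGs, because VIPER evaluates each TF's targets against the ranking of the whole signature: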
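```r
library(dorothea)
data(dorothea_hs, package = "dorothea")
regulons <- subset(dorothea_hs, confidence %in% c("A", "B", "C"))  # higher-confidence TF-target pairs
tf_activities <- run_viper(input = as.matrix(as.data.frame(res)[!is.na(res$stat), "stat", drop = FALSE]), regulons = regulons, options = list(method = "none", minsize = 4, eset.filter = FALSE))
```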
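VIPER's aREA method scores each TF by the rank-based enrichment of its targets in the signature, signed by the mode of regulation (`mor`, +1 for activation and -1 for repression in DoRothEA). The sketch below uses a much simpler weighted mean of target statistics to show the logic: a TF looks active when its activated targets go up and its repressed targets go down. It is not VIPER; `tf_activity_wmean`, its `minsize` default, and the toy regulon (which only mimics DoRothEA's `tf`/`target`/`mor` columns) are assumptions for illustration.

```r
# Hypothetical helper: TF activity as the mean of mor * target statistic.
# A simplified illustration of regulon scoring, NOT VIPER's aREA method.
tf_activity_wmean <- function(sig_stats, regulon, minsize = 2) {
  # Keep only TF-target edges whose target was measured.
  regulon <- regulon[regulon$target %in% names(sig_stats), , drop = FALSE]
  scores <- vapply(split(regulon, regulon$tf), function(edges) {
    if (nrow(edges) < minsize) return(NA_real_)  # too few targets to judge
    # mor = +1 (activation) or -1 (repression): a repressed target going down
    # counts as evidence that the TF is active.
    mean(edges$mor * sig_stats[edges$target])
  }, numeric(1))
  sort(scores, decreasing = TRUE, na.last = TRUE)
}

# Toy signature (standing in for DESeq2 Wald statistics) and toy regulon.
sig_stats <- c(G1 = 3, G2 = 2.5, G3 = -2, G4 = 0.1, G5 = -3, G6 = 1)
regulon <- data.frame(
  tf     = c("TF_A", "TF_A", "TF_A", "TF_B", "TF_B", "TF_C"),
  target = c("G1",   "G2",   "G3",   "G5",   "G6",   "G4"),
  mor    = c(1, 1, -1, 1, -1, 1)
)
act <- tf_activity_wmean(sig_stats, regulon)
act  # TF_A 2.5, TF_B -2, TF_C NA (only one target)
```

The `minsize` guard plays the same role as the `minsize` option passed to run_viper: TFs with very few measured targets are not scored.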
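DoRothEA cannot simulate a drug combination by itself, so compare TF activities between treatment alone and the combination where combination data exist; otherwise, check whether the TF targeted by the inhibitor is among those the treatment activates, which would make blocking it a plausible way to enhance the response. Query LINCS or the Connectivity Map to predict whether the inhibitor reverses treatment-induced signatures (for example, an adaptive program that limits the treatment's effect); note that cmapR does not query these databases itself but reads their GCT/GCTX data files (e.g. with parse_gctx), from which you can extract the inhibitor's signature. Network analysis with graphite, which retrieves pathway topologies from databases such as Reactome and KEGG, can model pathway perturbations and help assess whether the combination suppresses oncogenic TFs. Finally, if you have samples treated with the combination, validate by checking whether the combined perturbation moves expression back towards control, using principal component analysis (PCA) or correlation metrics.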
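Two of these checks can be prototyped in base R. `reversal_score` uses the Spearman correlation between the treatment signature and an inhibitor signature over shared genes as a rough connectivity measure, where negative values suggest reversal; the actual CMap connectivity score is a Kolmogorov-Smirnov-based enrichment statistic, so treat this as a simplified stand-in. `distance_to_control` projects samples onto principal components with `prcomp` and measures how far each group's centroid lies from the control centroid, so a combination that normalizes expression should sit closer to control than treatment alone. Function names, group labels, and all toy values are hypothetical.

```r
# Hypothetical helper: rank correlation between two log fold-change signatures.
# Negative values mean genes up in one are down in the other (possible reversal).
reversal_score <- function(sig_treatment, sig_inhibitor) {
  shared <- intersect(names(sig_treatment), names(sig_inhibitor))
  if (length(shared) < 3) stop("need at least 3 shared genes")
  cor(sig_treatment[shared], sig_inhibitor[shared], method = "spearman")
}

# Hypothetical helper: distance of each group's centroid from the control
# centroid in the space of the first n_pcs principal components.
distance_to_control <- function(expr, groups, control = "control", n_pcs = 2) {
  pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)  # samples as rows
  pcs <- pca$x[, seq_len(min(n_pcs, ncol(pca$x))), drop = FALSE]
  centroids <- rowsum(pcs, groups) / as.vector(table(groups))  # group means
  diffs <- sweep(centroids, 2, centroids[control, ])
  d <- sqrt(rowSums(diffs^2))
  d[names(d) != control]
}

# Toy signatures (log2 fold changes); G7 is only in the inhibitor signature.
sig_treatment <- c(G1 = 3, G2 = 2.5, G3 = 2, G4 = -2, G5 = -3, G6 = -1)
sig_inhibitor <- c(G1 = -2, G2 = -3, G3 = -1, G4 = 1.5, G5 = 2, G6 = 0.5, G7 = 4)
rho <- reversal_score(sig_treatment, sig_inhibitor)  # about -0.94

# Toy expression: treatment shifts six genes; the combination shifts them a third as far.
base  <- c(5, 6, 7, 8, 9, 10)
shift <- c(3, 3, 3, -3, -3, -3)
noise <- c(0.1, -0.2, 0.1, 0.2, -0.1, 0)
expr <- cbind(C1 = base + noise, C2 = base - noise,
              T1 = base + shift + noise, T2 = base + shift - noise,
              X1 = base + shift / 3 + noise, X2 = base + shift / 3 - noise)
rownames(expr) <- paste0("G", 1:6)
groups <- c("control", "control", "treatment", "treatment", "combination", "combination")
d <- distance_to_control(expr, groups)  # combination lies closer to control
d
```

Comparing centroids rather than individual samples keeps a single outlying replicate from driving the conclusion; with real data, also check how much variance the retained PCs explain before interpreting the distances.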
Kevin
You can contact me at biologsr[at]gmail.com