I'm investigating the potential effect of several mutations on gene expression. I've RNA-seq from ~30 tumors and 4 control samples. I screened for these 30 tumors about 20 genes for mutations. I translated these mutation data into a binary matrix (rows = samples, columns = genes ; 0 = not mutated, 1 = mutated).
For the RNA-Seq I aligned with STAR and counted the number of reads per gene with featurecounts. Then I merged everything into a big read count matrix (column = sample ; rows=gene).
Now I want to detect genes which expression is affected by these mutations. So I used limma as follow:
The design matrix harbors a column for each gene (~20), and an additional column indicating if the samples is a tumor or not. I also add an offset column. For example :
mut dataframe :
offset geneA geneB geneC isTumor sample1 1 0 1 1 1 sample2 1 1 1 0 1 sample3 1 0 0 1 0
design <- mut design <- design[,colSums(design)>2] # remove gene that is mutated in less than 3 samples design <- cbind(offset=1,design) # add offset # egdeR dge <- DGEList(counts=counts) dge <- calcNormFactors(dge) cpm <- cpm(dge,log=T) keep.exprs <- rowSums(cpm>1)>=3 # min 3 samples with CPM > 1 dge <- dge[keep.exprs,] # limma voom geneExpr <- voom(dge,design) glm = lmFit(geneExpr$E, design = design) glm = eBayes(glm) testResults <- decideTests(glm, method="global",adjust.method="BH", p.value=0.05)[,-1]
Now in testResults I've a matrix containing for each gene and each covariate (the 20 screened genes for mutation) if it's up (=1) or down regulated (=-1) ; (no significant change = 0). Is that correct for you ?
edit : I posted the question to bioconductor forum as adviced by Goutham Atla. https://support.bioconductor.org/p/86092/