Question

Heatmap after batch correction in DESeq2

0

Entering edit mode

3.2 years ago

mtsompana ▴ 10

I have removed batch effects in my RNA-seq data with the following script:

dds <- DESeqDataSetFromMatrix(countData = cts,
                              colData = coldata,
                              design = ~ Batch + Condition3)

dds <- dds[ rowSums(counts(dds)) > 10, ]

dds <- DESeq(dds)

rld<-rlog(dds,blind = FALSE)

assay(rld) <- limma::removeBatchEffect(assay(rld), rld$Batch)

I want to generate a heat map with the top 5000 highly expressed genes. I don't know how to select these genes from the new data with no batch effects.

Specifically in

heatmap.2(assay(rld)[select,], col = "heat.colors", Rowv = FALSE, Colv = FALSE, scale = "none", dendrogram = "none", trace = "none", margin = c (10, 10))

*I am not sure how to define the [select] *

thank you!

RNA-Seq • 1.6k views

ADD COMMENT • link updated 3.2 years ago by rpolicastro 13k • written 3.2 years ago by mtsompana ▴ 10

1

Entering edit mode

The code from rpolicastro is what you need. I would suggest to use rowVars rather than rowMeans though as you want probably genes that separate samples well, and these are the variable genes. Just taking the most expressed genes is most likely not going to be meaningful as the top-expressed genes usually differ very little (housekeeping genes etc. rowVars selects genes based on rowwise variance, the same as the plotPCA function from DESeq2 does internally. 5000 is a lot though, mabe a bit less would be better, 500 or 1000. 5000 is about 1/3 of what one detects in a normal RNA-seq experiment.

ADD REPLY • link 3.2 years ago by ATpoint 82k

0

Entering edit mode

Thank you so much! I am grateful for your help and time.

ADD REPLY • link 3.2 years ago by mtsompana ▴ 10

0

Entering edit mode

Please use the formatting bar (especially the code option) to present your post better. You can use backticks for inline code (`text` becomes text), or select a chunk of text and use the highlighted button to format it as a code block. If your code has long lines with a single command, break those lines into multiple lines with proper escape sequences so they're easier to read and still run when copy-pasted. I've done it for you this time.
code_formatting

ADD REPLY • link 3.2 years ago by Ram 43k

score 1 · Answer 1 · 2021-02-10

Example matrix.

set.seed(100)
mat <- matrix(rnorm(30, 10, 4), nrow=10)
rownames(mat) <- sprintf("ENSG%08d", sample(seq_len(10e6), 10))
colnames(mat) <- sprintf("A%s", seq_len(3))

> mat
                    A1        A2        A3
ENSG08934374  7.991231 10.359545  8.247640
ENSG09697166 10.526125 10.385098 13.056242
ENSG06987757  9.684332  9.193464 11.047845
ENSG05097048 13.547139 12.959362 13.093618
ENSG08525678 10.467885 10.493518  6.742484
ENSG07535515 11.274520  9.882733  8.246198
ENSG03028407  7.672837  8.444583  7.119114
ENSG00746419 12.858131 12.043425 10.923778
ENSG07872528  6.698962  6.344743  5.369082
ENSG08543365  8.560551 19.241187 10.988304

For this example I'll just get the top 5 genes by row mean.

top_5 <- head(mat[order(-rowMeans(mat)), ], n=5)

> top_5
                    A1        A2       A3
ENSG05097048 13.547139 12.959362 13.09362
ENSG08543365  8.560551 19.241187 10.98830
ENSG00746419 12.858131 12.043425 10.92378
ENSG09697166 10.526125 10.385098 13.05624
ENSG06987757  9.684332  9.193464 11.04785