Entering edit mode
3.2 years ago
mtsompana
▴
10
I have removed batch effects in my RNA-seq data with the following script:
dds <- DESeqDataSetFromMatrix(countData = cts,
colData = coldata,
design = ~ Batch + Condition3)
dds <- dds[ rowSums(counts(dds)) > 10, ]
dds <- DESeq(dds)
rld<-rlog(dds,blind = FALSE)
assay(rld) <- limma::removeBatchEffect(assay(rld), rld$Batch)
I want to generate a heat map with the top 5000 highly expressed genes. I don't know how to select these genes from the new data with no batch effects.
Specifically in
heatmap.2(assay(rld)[select,], col = "heat.colors", Rowv = FALSE, Colv = FALSE, scale = "none", dendrogram = "none", trace = "none", margin = c (10, 10))
*I am not sure how to define the [select] *
thank you!
The code from rpolicastro is what you need. I would suggest to use
rowVars
rather thanrowMeans
though as you want probably genes that separate samples well, and these are the variable genes. Just taking the most expressed genes is most likely not going to be meaningful as the top-expressed genes usually differ very little (housekeeping genes etc.rowVars
selects genes based on rowwise variance, the same as theplotPCA
function from DESeq2 does internally. 5000 is a lot though, mabe a bit less would be better, 500 or 1000. 5000 is about 1/3 of what one detects in a normal RNA-seq experiment.Thank you so much! I am grateful for your help and time.
Please use the formatting bar (especially the
code
option) to present your post better. You can use backticks for inline code (`text` becomestext
), or select a chunk of text and use the highlighted button to format it as a code block. If your code has long lines with a single command, break those lines into multiple lines with proper escape sequences so they're easier to read and still run when copy-pasted. I've done it for you this time.