I'm using DESEq2 package for obtaining normalized data from raw read counts of RNA-Seq data. I'm not interested to do any downstream analysis like DEGs identification and just want the normalized and transformed expression matrix. In the steps, I also incorporated the following for any batch effect.
Mnemiopsis_count_data = read.table(file = "COUNT_DATA.txt", header = T, sep = "\t") Mnemiopsis_col_data = read.table(file = "COL_DATA.txt", header = T, sep = "\t") boxplot(Mnemiopsis_count_data) hist(Mnemiopsis_count_data[,1]) # Plotting only the first sample (column 1) pseudoCount = log2(Mnemiopsis_count_data + 1) boxplot(pseudoCount) hist(pseudoCount[,1]) dds = DESeqDataSetFromMatrix(countData = Mnemiopsis_count_data, colData = Mnemiopsis_col_data, design = ~ condition) # we're testing for the different condidtions rld = rlogTransformation(dds) par( mfrow = c( 1, 2 ) ) plot(log2( 1 + counts(dds)[ , 1:2] ), pch=16, cex=0.3, main = "log2") plot(assay(rld)[ , 1:2],pch=16, cex=0.3, main = "rlog") dds = estimateSizeFactors(dds) sizeFactors(dds) plotPCA(rld, intgroup=c("condition")) plot(pseudoCount[,4], pseudoCount[,7], pch = 20, xlab = "Aboral 4", ylab = "Oral 3") + abline(0,1, col = "red") vsd <- vst(dds) assay(vsd) <- limma::removeBatchEffect(assay(vsd), vsd$batch) plotPCA(vsd) dds = DESeq(dds) res <- results(dds) mcols(res, use.names=TRUE) summary(res) norm_counts = counts(dds, normalized = TRUE) # Extract the normalized counts
I have some queries regarding batch correction:
1) Does the vsd object contains our normalized expression matrix and how can I save it in R? 2) If I'm using rlog instead of vst for transformation, then also do I need a batch correction? 3) Does batch correction changes our normalized and transformed expression matrix? If yes, what does it do? 4) Does the above script indicates the batch effects or does it corrects them also? Please share R codes if any modifications are required. 5) Also, do I have to increment the normalized data by +1 before saving?