Question: DESeq2 Batch correction enquiry?
0
gravatar for prithvi.mastermind
5 months ago by
prithvi.mastermind0 wrote:

I'm using DESEq2 package for obtaining normalized data from raw read counts of RNA-Seq data. I'm not interested to do any downstream analysis like DEGs identification and just want the normalized and transformed expression matrix. In the steps, I also incorporated the following for any batch effect.

Mnemiopsis_count_data = read.table(file = "COUNT_DATA.txt", header = T, sep = "\t")
Mnemiopsis_col_data = read.table(file = "COL_DATA.txt", header = T, sep = "\t")
boxplot(Mnemiopsis_count_data)
hist(Mnemiopsis_count_data[,1]) # Plotting only the first sample (column 1)
pseudoCount = log2(Mnemiopsis_count_data + 1)
boxplot(pseudoCount)
hist(pseudoCount[,1])

dds = DESeqDataSetFromMatrix(countData = Mnemiopsis_count_data, colData = Mnemiopsis_col_data, design = ~ condition) # we're testing for the different condidtions
rld = rlogTransformation(dds)

par( mfrow = c( 1, 2 ) )
plot(log2( 1 + counts(dds)[ , 1:2] ), pch=16, cex=0.3, main = "log2")
plot(assay(rld)[ , 1:2],pch=16, cex=0.3, main = "rlog")


dds = estimateSizeFactors(dds)
sizeFactors(dds)
plotPCA(rld, intgroup=c("condition"))
plot(pseudoCount[,4], pseudoCount[,7], pch = 20, xlab = "Aboral 4", ylab = "Oral 3") + abline(0,1, col = "red")

vsd <- vst(dds)
assay(vsd) <- limma::removeBatchEffect(assay(vsd), vsd$batch)
plotPCA(vsd)

dds = DESeq(dds)
res <- results(dds)
mcols(res, use.names=TRUE)
summary(res)
norm_counts = counts(dds, normalized = TRUE) # Extract the normalized counts

I have some queries regarding batch correction:

1) Does the vsd object contains our normalized expression matrix and how can I save it in R? 2) If I'm using rlog instead of vst for transformation, then also do I need a batch correction? 3) Does batch correction changes our normalized and transformed expression matrix? If yes, what does it do? 4) Does the above script indicates the batch effects or does it corrects them also? Please share R codes if any modifications are required. 5) Also, do I have to increment the normalized data by +1 before saving?

Thanks

ADD COMMENTlink modified 5 months ago by ATpoint41k • written 5 months ago by prithvi.mastermind0

What is the content of vsd$batch? I cannot see that from the code. It should be a vector that indicates which sample belongs to which batch. This is somewhat related to:

4) Does the above script indicates the batch effects or does it corrects them also?

The script visualizes a potential batch via the PCA you should use, together with information from the wetlab such as the day of library prep etc. to check whether variation across datapoints and biological replicates could be due to a batch effect. The PCA after the batch correction with limma can then be used to check if this was successful. Is this your script?

ADD REPLYlink written 5 months ago by ATpoint41k

Thank you sir for your prompt response. No this is not my script. Its an example script from DESEq2 page which I ran for understanding the concept. the vsd$batch is showing NULL. what does this mean?

ADD REPLYlink written 5 months ago by prithvi.mastermind0

It does not seem that you are following the DESeq2 workflow / vignette - can you confirm which one you are following? As an example, it makes little sense to create and use your pseudoCount object, as it is just logged raw counts + 1 (?).

Which variable contains your batch information? - where is it stored?

For all intents and purposes, this is a simple workflow that you could use, assuming that batch is a variable in your colData:

# create DESeq object
# include batch in design formula so that our test statistics for condition..
# ...are controlled for batch
dds <- DESeqDataSetFromMatrix(countData = Mnemiopsis_count_data,
  colData = Mnemiopsis_col_data,
  design = ~ condition + batch)

# normalise
dds <- DESeq(dds)

# extract normalised counts
norm_counts = counts(dds, normalized = TRUE)

# regularised log transformation
rld <- rlogTransformation(dds)

# variance -stabilised transformation
vsd <- vst(dds)

# eliminate batch effect directly from vsd
assay(vsd) <- limma::removeBatchEffect(assay(vsd), colData(dds)[['batch']])
ADD REPLYlink modified 5 months ago • written 5 months ago by Kevin Blighe67k

Thank you sir for your help. But how can i save expression data that has been batch corrected after running the last command: assay(vsd) <- limma::removeBatchEffect(assay(vsd), colData(dds)[['batch']])

ADD REPLYlink written 5 months ago by prithvi.mastermind0

Take a look at the write.table() function. The expression data itself will be stored in assay(vsd)

ADD REPLYlink written 5 months ago by Kevin Blighe67k

Sir after running this step

# normalise
dds <- DESeq(dds)

How can I normalize and log-transform the data? Please help
rld <- rlogTransformation(dds) # for rlog transormation

norm_counts = counts(rld, normalized = TRUE) #for normalization

Is this step relevant?

ADD REPLYlink modified 4 months ago by genomax92k • written 4 months ago by prithvi.mastermind0

This is getting a bit beyond the scope of biostars. log2() is a basic R function, please get more background before doing analysis in R, and most importantly, please use online tutorials such as the DESeq2 manual to learn how to do RNA-seq analyssis and which steps are necessary and why. The idea of this forum is not spoon-feeding.

ADD REPLYlink written 4 months ago by ATpoint41k

I completely understand the idea of the respective forum sir, but i'm a complete beginner in R and trying to dig in the concepts. Help from your end would be a great motivation for me.

i framed this code after reading:

dds <- DESeqDataSetFromMatrix(countData = CC, colData = Col_data, design = ~ Condition)
dds = DESeq(dds)
keep <- rowSums(counts(dds)) >= 10 #keeping genes below having count above 10
dds <- dds[keep,]
dds = estimateSizeFactors(dds)
sizeFactors(dds)
vsd <- assay(vst(dds, blind=FALSE))

I have applied vst transormation. Could you please tell me if the vaues in vsd are normalized or not? or do we have to perform normalization.

ADD REPLYlink modified 4 months ago by Kevin Blighe67k • written 4 months ago by prithvi.mastermind0

In your above code, vsd will contain normalised and transformed expression levels - yes.

ADD REPLYlink written 4 months ago by Kevin Blighe67k

Thanks, Kevin. Before creating the DESeQ object i read my data like this.

Count_data <- read.table(file = "BRCA_DESeQ.txt", header = T, sep = "\t")
CC <- (Count_data[,2:1218])
Col_data = read.table(file = "COL.txt", header = T, sep = "\t")
dds = DESeqDataSetFromMatrix(countData = Count_data[,2:1218], colData = Col_data, design = ~ Condition)
dds = DESeq(dds)

where Col_data contains the sample information and Count_data contains expression data where the first column has the genes. I read the expression data without genes and assigned to variable CC. This was done because the samples in col data and count data must have the same dimensions

The vsd variable has normalized and log-transformed values without genes. Please help how can I get the genes with data also.

ADD REPLYlink modified 4 months ago by genomax92k • written 4 months ago by prithvi.mastermind0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1640 users visited in the last hour