DESeq2: How do I evaluate PCA before and after setting covariates in design matrix?
20 months ago
sr41489 • 0

I have a heterogeneous mix of data ranging in quality, read depth, etc. (the libraries were prepared by many groups and now I'm trying to analyze them all together). After initially running the DESeq method, I obtained all the PCs, focused on PC 1-10, and then ran a Pearson correlation including the numeric variables in my overall meta dataframe. I found that the following metrics correlated highly with the top 10 PCs: RNA.Batch, ExpressionProfilingEfficiency, UniqueRateofMapped, ReadLength, DV200, and AvgSplitsperRead. I set these as covariates in my design matrix:

dds <- DESeqDataSetFromMatrix(countData=data, 
                              colData=meta, 
                              design=~RNA.Batch 
                              + ExpressionProfilingEfficiency 
                              + UniqueRateofMapped
                              + ReadLength
                              + DV200
                              + AvgSplitsperRead,
                              tidy=TRUE)

How can I compare the PCA before and after setting these covariates? I'm sorry for the very basic question, but as I understand it, setting these covariates will help to control against any variability caused by these metrics given I have lots of samples prepared from a variety of groups. Any guidance on this would be greatly appreciated, as I'm trying to ensure I can trust the data going into differential expression analysis. Thank you so much!

EDIT: I should add, I tried to re-run a PCA after setting the covariates, but I'm seeing the same values in my PCs (no change compared to before adding these covariates). Here is the code I used to generate a PC dataframe:

rld <- vst(dds, blind=TRUE)
rld_mat <- assay(rld)
rv <- rowVars(assay(rld))
ntop = 500
select_var <- order(rv, decreasing=TRUE)[seq_len(min(ntop, length(rv)))]
pca <- prcomp(t(assay(rld)[select_var,]))
summary(pca)
df <- cbind(meta, pca$x)
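
For reference, the PC-versus-metadata correlation step described above can be sketched roughly as follows. This is a minimal illustration on simulated data, not the real dataset: the matrix `expr`, the data frame `meta_num`, and the injected DV200 effect are all assumptions made up for the example, and only base R is used.

```r
set.seed(42)
n_samples <- 20

# Hypothetical numeric metadata (simulated stand-ins for the real columns)
meta_num <- data.frame(
  DV200      = runif(n_samples, 30, 90),
  ReadLength = sample(c(50, 75, 100, 150), n_samples, replace = TRUE)
)

# Simulated "transformed expression" matrix: 200 genes x 20 samples
expr <- matrix(rnorm(200 * n_samples), nrow = 200)

# Inject a DV200-driven effect into the first 50 genes (assumption for the demo)
dv_scaled <- scale(meta_num$DV200)[, 1]
expr[1:50, ] <- expr[1:50, ] + rep(dv_scaled, each = 50) * 2

# PCA on samples, then Pearson correlation of each metric with PCs 1-10
pca_demo <- prcomp(t(expr))
cor_mat  <- cor(meta_num, pca_demo$x[, 1:10], method = "pearson")
print(round(cor_mat, 2))
```

Rows of `cor_mat` are the metadata variables and columns are the PCs, so a large absolute value in a row flags a metric that tracks that PC. Pearson correlation only makes sense for numeric columns; a categorical variable such as a batch label would need a different measure of association.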
DESeq2 QC covariates PCA • 737 views
20 months ago
ATpoint 81k

You can do as described in the manual: remove (regress out) the effect of the covariate(s) from the transformed counts and then repeat the PCA. That aside, something like read length or unique mapping rate is quite unlikely to contribute to any meaningful PC separation. If you seek feedback on that, please add plots showing that these factors drive PC separation. Make sure that none of this is confounded with something more evident, such as experimental factors.

https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#why-after-vst-are-there-still-batches-in-the-pca-plot
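
A rough sketch of that approach, for illustration only. The PCs in the question do not change because `vst()` does not remove variation associated with design variables, and `blind=TRUE` ignores the design altogether; the covariates have to be regressed out of the transformed matrix explicitly, for example with `limma::removeBatchEffect()`. Everything below runs on simulated data: `batch`, `DV200`, and `condition` are made-up stand-ins for the real metadata, and `pca_top()` is a hypothetical helper mirroring the top-500-variance selection in the question.

```r
library(DESeq2)
library(limma)

set.seed(1)
# Simulated dataset (assumption): 2000 genes, 12 samples, factor 'condition'
dds <- makeExampleDESeqDataSet(n = 2000, m = 12)
dds$batch <- factor(rep(c("b1", "b2", "b3"), times = 4))  # made-up batch label
dds$DV200 <- as.numeric(scale(runif(12, 30, 90)))         # made-up numeric QC metric
design(dds) <- ~ batch + DV200 + condition

# The transformation itself does not remove batch/covariate effects
vsd <- vst(dds, blind = FALSE, nsub = 500)

# Hypothetical helper: PCA on the ntop most variable genes
pca_top <- function(mat, ntop = 500) {
  rv  <- apply(mat, 1, var)
  sel <- order(rv, decreasing = TRUE)[seq_len(min(ntop, length(rv)))]
  prcomp(t(mat[sel, ]))
}

pca_before <- pca_top(assay(vsd))

# Regress out batch and the numeric covariate, protecting 'condition'
mm  <- model.matrix(~ condition, as.data.frame(colData(vsd)))
adj <- limma::removeBatchEffect(assay(vsd),
                                batch      = vsd$batch,
                                covariates = vsd$DV200,
                                design     = mm)

pca_after <- pca_top(adj)

# Compare variance explained by the leading PCs before and after
print(summary(pca_before)$importance[, 1:3])
print(summary(pca_after)$importance[, 1:3])
```

The adjusted matrix is meant for visualization and QC only; for the differential expression test itself, keep the raw counts and leave the covariates in the design formula. The simulated data contain no real batch effect, so the two summaries will look similar here; with real data the difference depends on how strongly the covariates drive the PCs.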

Please read the vignette in detail; 99.9% of common questions are addressed there. For the rest, be sure to search Google first, as over the last 10 years hundreds of well-answered questions have accumulated both here at Biostars and over at support.bioconductor.org, where the developer is outstandingly responsive.


Thanks again for your help, and my apologies again for asking these basic questions (I'm very new to this and don't have a formal education in bioinformatics, but plan to enter a graduate program in this in the next year). With that, I guess I'm trying to find a thorough tutorial on how to troubleshoot when dealing with lower quality, very heterogeneous data. Even when I follow the instructions on the link provided, I'm unable to reduce the weight of PC1. I'll calculate the ICC values for my categorical metrics and see if perhaps those are correlating with the top PCs, but I'm stuck on these troubleshooting/cleaning steps at the moment. Anyway, thanks again, I'm hoping to get a better understanding of this with more practice and reading through what's available.
