3.3 years ago by
Barcelona
I would recommend feeding your raw counts into DESeq2, running the pipeline, applying the rlog (regularized log) transformation, and then calling DESeq2's plotPCA function on the result. It is very easy if you are familiar with the program.
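library(DESeq2)
# Create the DESeq2 object. inputData (raw count matrix), samples (sample
# metadata) and design (a formula such as ~ condition) are supplied by you.
dds <- DESeqDataSetFromMatrix(countData = inputData,
                              colData = samples,
                              design = design)
# betaPrior is a logical (TRUE/FALSE) that you set yourself
dds <- DESeq(dds, betaPrior = betaPrior)
# Regularized log transformation for downstream analyses (clustering, heatmaps, etc.)
# rlog() is the current name; rlogTransformation() is the older alias
rld <- rlog(dds)
# colGroups: character vector of colData column names used to group the samples
pca <- plotPCA(rld, intgroup = colGroups)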
The idea behind using the rlog transformation for quality-control checks is described in the DESeq2 paper: "[...] Therefore, we use the shrinkage approach of DESeq2 to implement a regularized logarithm transformation (rlog), which behaves similarly to a log2 transformation for genes with high counts, while shrinking together the values for different samples for genes with low counts. It therefore avoids a commonly observed property of the standard logarithm transformation, the spreading apart of data for genes with low counts, where random noise is likely to dominate any biologically meaningful signal [...]"
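To see the "spreading apart" the paper mentions, here is a small base-R sketch. It does not call rlog itself; it only simulates Poisson sampling noise around a low and a high expected count (both values are made up for illustration) and compares the spread after an ordinary log2 transformation with a pseudocount of 1.

```r
set.seed(42)
n <- 10000

# Poisson sampling noise around a low and a high expected count
# (5 and 1000 are arbitrary illustrative values)
low  <- rpois(n, lambda = 5)
high <- rpois(n, lambda = 1000)

# Spread on the log2 scale; the pseudocount of 1 avoids log2(0)
sd_low  <- sd(log2(low + 1))
sd_high <- sd(log2(high + 1))

# The low-count gene is much noisier on the log scale than the high-count one
print(c(low = sd_low, high = sd_high))
```

On the log scale the low-count values vary far more than the high-count ones, which is the noise that rlog is designed to shrink before PCA or clustering.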
written 3.3 years ago by plat
I am not familiar with DESeq2; I have been using edgeR up to now. I have been reading the manuals and online tutorials to see how to input the data. I see that it will accept a count matrix such as the one I have in CSV format, but that it also needs a metadata file. I am not sure how to make one or what it must contain. Can anyone advise on how to do this?
Thanks again in advance
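For what it's worth, the metadata DESeq2 expects (colData) is just a table with one row per sample, whose row names match the column names of the count matrix and whose columns describe each sample. A minimal sketch follows, assuming a made-up design with three control and three treated samples; the sample names, the condition column, and the counts.csv file name are all hypothetical.

```r
# Hypothetical count matrix standing in for the CSV: genes in rows, samples in columns.
# With a real file this would be something like:
#   counts <- as.matrix(read.csv("counts.csv", row.names = 1))
counts <- matrix(c( 10,  12,   9,  30,  28,  35,
                   200, 180, 210, 190, 205, 195),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("geneA", "geneB"),
                                 c("ctrl_1", "ctrl_2", "ctrl_3",
                                   "trt_1", "trt_2", "trt_3")))

# Sample table (colData): one row per sample, row names taken from the count columns
samples <- data.frame(
  condition = factor(c("control", "control", "control",
                       "treated", "treated", "treated"),
                     levels = c("control", "treated")),
  row.names = colnames(counts)
)

# The rows of the sample table must be in the same order as the count columns
stopifnot(identical(rownames(samples), colnames(counts)))

# With DESeq2 installed, these would then be passed on as:
#   dds <- DESeqDataSetFromMatrix(countData = counts,
#                                 colData = samples,
#                                 design = ~ condition)
```

The sample table can equally be kept as a second CSV and loaded with read.csv(..., row.names = 1); what matters is that the sample order matches the count matrix.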
As an alternative to PCA, you can also try MDS plots: https://www.rdocumentation.org/packages/edgeR/versions/3.14.0/topics/plotMDS.DGEList but they should give similar results.
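One reason the two tend to agree: classical MDS on Euclidean distances gives the same sample coordinates as PCA, up to the sign of each axis. The base-R sketch below checks this on a made-up log-scale matrix. Note that edgeR's plotMDS uses a different distance by default (leading log-fold-changes), so this illustrates only the classical case, not what plotMDS computes.

```r
set.seed(1)

# Hypothetical log-scale expression matrix: 50 genes x 6 samples
x <- matrix(rnorm(50 * 6), nrow = 50,
            dimnames = list(NULL, paste0("s", 1:6)))
# Shift the first 10 genes in the last three samples to create two groups
x[1:10, 4:6] <- x[1:10, 4:6] + 3

# PCA on samples (samples must be in rows, hence t())
pca_scores <- prcomp(t(x))$x[, 1:2]

# Classical MDS on Euclidean distances between samples
mds_coords <- cmdscale(dist(t(x)), k = 2)

# Compare absolute values because the sign of each axis is arbitrary
max_diff <- max(abs(abs(pca_scores) - abs(mds_coords)))
print(max_diff)
```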
Hello, I am reviving this topic because I have another question.
The rlog transformation works fine and I have applied it to my data, but does DESeq2 account for differences in the number of reads between samples?
I have 3 samples for each of 2 conditions: the 3 samples from the first condition all have ~10,000,000 reads and the other 3 have ~13,000,000, so I don't know whether the clusters on my PCA come from biological differences (which I hope, haha) or from the difference in the number of reads.
I moved that one to a comment. Please open a new thread for such questions instead of reviving older ones. Still, this question has been asked before; please use the search function and Google. From the manual, which you should always read first:
So yes, rlog normalizes for library size and is a recommended transformation for downstream applications such as PCA and clustering.
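To make the library-size point concrete, here is a minimal base-R sketch of median-of-ratios size factors, the kind of per-sample scaling DESeq2 estimates. This is a toy re-implementation on made-up counts, not DESeq2's own code; in a real analysis you would call estimateSizeFactors(dds) and inspect sizeFactors(dds). The median_of_ratios helper name is hypothetical.

```r
# Toy count matrix: 4 genes x 2 samples; sample B is sequenced 1.3x deeper than A
counts <- matrix(c(100, 130,
                   200, 260,
                    50,  65,
                   400, 520),
                 ncol = 2, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:4), c("A", "B")))

# Hypothetical helper: median-of-ratios size factors
median_of_ratios <- function(counts) {
  # Per-gene log geometric mean across samples (-Inf if any count is zero)
  log_geo_means <- rowMeans(log(counts))
  keep <- is.finite(log_geo_means)
  # Per-sample median of the log ratios to the geometric mean, back on the count scale
  apply(counts, 2, function(col) {
    exp(median(log(col[keep]) - log_geo_means[keep]))
  })
}

size_factors <- median_of_ratios(counts)

# Divide each sample (column) by its size factor
normalized <- sweep(counts, 2, size_factors, "/")

print(size_factors)
print(normalized)
```

Because B is simply a deeper-sequenced copy of A here, the two normalized columns come out identical: a pure difference in read depth is removed before it can drive the separation on a PCA.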