4.7 years ago
Tota
I am performing a differential expression analysis of 10 paired samples (cancer and normal tissue) in edgeR, following section '3.4.1 Paired samples' of the edgeR User's Guide from Bioconductor.
Do the library sizes need to be normalised prior to testing for the treatment effect?
Normalised with:
y <- calcNormFactors(y)
Estimating dispersion, fitting the model and testing for the treatment effect:
y <- estimateDisp(y,design)
fit <- glmQLFit(y, design)
qlf <- glmQLFTest(fit)
topTags(qlf)
I don't get any differentially expressed genes after I normalise, but if I omit normalisation I get differentially expressed genes.
Normalization is independent of the experimental design, and yes, it needs to be performed. Whatever results you get without normalization are not meaningful. You might incorporate the filterByExpr filter as recommended in the manual.
Do you start from raw counts, as edgeR expects? What do your design and groups look like? Did you check for batch effects using PCA on the logCPMs?
A plot I find most useful is the MA-plot, with logCPM on the x-axis and logFC on the y-axis. It shows both whether normalization worked (most points should center along y = 0) and how the fold changes behave, that is, whether there are simply no large FCs or whether the large FCs are just not significant. In the latter case the volcano plot is another useful plot for exploring the results.
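The checks suggested above could be sketched roughly as follows. This is only an illustration on simulated counts, not the poster's data: the object names (`counts`, `patient`, `tissue`), the simulated values and the `~ patient + tissue` design are assumptions made here to mirror a 10-pair cancer/normal layout. Since the simulation contains no true tissue effect, the MA-plot should simply be centred on y = 0.

```r
library(edgeR)

set.seed(1)
# Simulated stand-in data (assumption): 2000 features, 10 patients,
# each contributing one normal and one cancer sample.
n_features <- 2000
patient <- factor(rep(1:10, times = 2))
tissue  <- factor(rep(c("normal", "cancer"), each = 10),
                  levels = c("normal", "cancer"))
counts  <- matrix(rnbinom(n_features * 20, mu = 50, size = 5),
                  nrow = n_features)
rownames(counts) <- paste0("feature", seq_len(n_features))

y <- DGEList(counts = counts)

# Paired design: patient as blocking factor, tissue effect as last coefficient
design <- model.matrix(~ patient + tissue)

# Drop low-count features, then recompute library sizes and normalise
keep <- filterByExpr(y, design)
y <- y[keep, , keep.lib.sizes = FALSE]
y <- calcNormFactors(y)

y   <- estimateDisp(y, design)
fit <- glmQLFit(y, design)
qlf <- glmQLFTest(fit, coef = "tissuecancer")

# MA-plot: average logCPM on the x-axis, logFC on the y-axis
plotMD(qlf)
abline(h = 0, col = "red", lty = 2)
```

Naming the coefficient explicitly in `glmQLFTest` avoids relying on the default (the last column of the design), which only tests the tissue effect when that term happens to come last.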
Great thank you, I'll include normalisation.
Yes I do start with raw counts.
I have a filtering section:
My groups look like this:
My design looks like this:
I haven't checked for batch effects, I will give it a go.
See the lib.sizes: the cancers are sequenced much deeper than the controls, which is one of the reasons why normalization is necessary. The counts in cancer are probably much higher simply because of that, and you have to correct for it, details here. Try the PCA first, e.g. using the PCAtools package from Bioconductor, or simply the plotMDS function from edgeR/limma, which implements a very similar technique. This will also tell you how well the samples cluster together, which can serve as a proxy for the dispersion between replicates. Since you start from csv files, may I ask how you obtained the counts?
I did PCA using plotMDS on the raw data and on the normalised data. I have 'Leading logFC dim2' along the y-axis and 'Leading logFC dim1' along the x-axis. For the raw data the samples cluster along y=0, and after normalisation the samples are more dispersed.
I'm working with circRNAs. I used two detection methods and merged on the commonly detected circRNAs using a script that outputs .csv files.
[Figures not preserved: plotMDS of the raw data and of the normalised data]
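To illustrate the library-size point made above, here is a small sketch with made-up numbers (the count matrix is hypothetical): sample B is simply sample A sequenced four times deeper, so every raw count is four times larger, yet the counts per million are identical once library size is taken into account.

```r
library(edgeR)

# Hypothetical toy counts: B has the same composition as A at 4x the depth
counts <- cbind(A = c(10, 40, 150, 800),
                B = c(40, 160, 600, 3200))
rownames(counts) <- paste0("feature", 1:4)

y <- DGEList(counts = counts)

# Library sizes are the column sums: 1000 for A and 4000 for B
y$samples$lib.size

# Raw counts differ 4-fold between the samples, CPM values do not
counts
cpm(y)
```

In a real analysis `calcNormFactors` additionally estimates TMM scaling factors that adjust the library sizes for composition differences between samples; the toy example leaves that step out because both samples have identical composition by construction.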
PCA should be done on the log2-transformed normalized data. I am currently putting together a little tutorial on basic QC for DNA/RNA-seq, including PCA and MA-plots, that covers the basics with example code. It will probably go online early next week. Maybe this will clarify some things.
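As a rough sketch of what "PCA on the log2-transformed normalized data" can look like in edgeR: the counts below are simulated (an assumption, with three shallow and three deep samples), and `prior.count = 2` and `top = 500` are just the documented defaults written out. This is not the tutorial's code.

```r
library(edgeR)

set.seed(42)
# Simulated stand-in data (assumption): 1000 features, 6 samples,
# the last three sequenced about 3x deeper than the first three.
depth  <- c(1, 1, 1, 3, 3, 3)
counts <- sapply(depth, function(d) rnbinom(1000, mu = 100 * d, size = 10))
rownames(counts) <- paste0("feature", seq_len(1000))
colnames(counts) <- paste0("sample", 1:6)

y <- DGEList(counts = counts)
y <- calcNormFactors(y)

# log2 counts per million, using the normalised (effective) library sizes
logcpm <- cpm(y, log = TRUE, prior.count = 2)

# MDS plot on the logCPM matrix, based on the top 500 features
plotMDS(logcpm, top = 500)

# Classical PCA on the same matrix (samples as rows)
pca <- prcomp(t(logcpm))
plot(pca$x[, 1], pca$x[, 2], xlab = "PC1", ylab = "PC2")
text(pca$x[, 1], pca$x[, 2], labels = colnames(logcpm), pos = 3)
```

Running the same plots on raw counts would mostly separate samples by sequencing depth, which is why the transformation comes first.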
Awesome, that'll be of great help. Where will the tutorial be available?
Will post it here on Biostars.
Also, was it correct for me to do the PCA on the normalised data?
Normalized + log2:
Basic normalization, batch correction and visualization of RNA-seq data
Great, thank you. Just saw this. I'll give this a go.