Question

Normalisation of paired samples in edgeR

3.6 years ago

Tota ▴ 20

I am performing a differential expression analysis of 10 paired samples (cancer and normal tissue) in edgeR, following section '3.4.1 Paired samples' of the edgeR User's Guide on Bioconductor.

Do the library sizes need to be normalised prior to testing for the treatment effect?

Normalised with:

y <- calcNormFactors(y)

I then estimate the dispersion, fit the linear model and test for the treatment effect:

y <- estimateDisp(y,design)
fit <- glmQLFit(y, design)
qlf <- glmQLFTest(fit)
topTags(qlf)

I don't get any differentially expressed genes after I normalise, but if I omit normalisation I get differentially expressed genes.

RNA-Seq edgeR • 1.5k views

ADD COMMENT • link 3.6 years ago by Tota ▴ 20

Normalization is independent of the experimental design, and yes, it needs to be performed. Whatever results you get without normalization are not meaningful. You might also incorporate the filterByExpr filter as recommended in the manual.
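A minimal sketch of how these steps could fit together for a paired design. It runs on simulated counts, so the count matrix, the sample names and the three-patient layout are assumptions for illustration only, not the poster's data:

```r
library(edgeR)

set.seed(1)

# Simulated raw counts: 1000 features x 6 samples (hypothetical data)
counts <- matrix(rnbinom(6000, mu = 50, size = 5), nrow = 1000,
                 dimnames = list(paste0("feature", 1:1000), paste0("a_", 1:6)))

# Three patients, each contributing one control and one cancer sample
subjects <- factor(rep(c("patient1", "patient2", "patient3"), times = 2))
group    <- factor(rep(c("control", "cancer"), each = 3),
                   levels = c("control", "cancer"))

# Paired design: patient as blocking factor, group as the effect of interest
design <- model.matrix(~ subjects + group)

y <- DGEList(counts = counts, group = group)

# Drop low-count features, then recompute the library sizes
keep <- filterByExpr(y, design)
y <- y[keep, , keep.lib.sizes = FALSE]

# TMM normalisation factors
y <- calcNormFactors(y)

# Dispersion estimation, quasi-likelihood fit and test of cancer vs control
y   <- estimateDisp(y, design)
fit <- glmQLFit(y, design)
qlf <- glmQLFTest(fit, coef = "groupcancer")
topTags(qlf)
```

Naming the coefficient explicitly in glmQLFTest avoids relying on the default, which tests the last column of the design matrix.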

Do you start from raw counts, as edgeR expects? What do your design and groups look like? Did you check for batch effects using PCA on the logCPMs? A plot I find most useful is the MA-plot, i.e. plotting logCPM on the x-axis and logFC on the y-axis. This shows both whether normalization is proper (most points should center along y = 0) and how the fold changes behave, i.e. whether there are simply no large FCs or whether the large FCs are simply not significant. In the latter case the volcano plot is another useful type of plot for exploring the results.
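To illustrate what "centered along y = 0" means, here is a rough base-R sketch on simulated counts in which the cancer samples are sequenced three times deeper than the controls. All data and the helper name ma_values are made up for the illustration, and simple library-size scaling to CPM stands in for the full edgeR normalisation:

```r
set.seed(42)

n_genes <- 2000
mu      <- 2^runif(n_genes, 3, 10)        # hypothetical baseline expression
depth   <- c(1, 1, 1, 3, 3, 3)            # cancer libraries are 3x deeper
group   <- rep(c("control", "cancer"), each = 3)

# No true differential expression: only the sequencing depth differs
counts <- sapply(depth, function(d) rnbinom(n_genes, mu = mu * d, size = 10))

# Counts per million based on the total library size, then log2
lib_size <- colSums(counts)
logcpm   <- log2(t(t(counts) / lib_size) * 1e6 + 1)
lograw   <- log2(counts + 1)

# Average log expression (A) and log fold change cancer vs control (M)
ma_values <- function(x) {
  ctrl <- rowMeans(x[, group == "control"])
  canc <- rowMeans(x[, group == "cancer"])
  data.frame(A = (ctrl + canc) / 2, M = canc - ctrl)
}
ma_raw <- ma_values(lograw)
ma_cpm <- ma_values(logcpm)

par(mfrow = c(1, 2))
plot(ma_raw$A, ma_raw$M, pch = 16, cex = 0.4, main = "Raw counts",
     xlab = "average log2 count", ylab = "logFC")
abline(h = 0, col = "red")
plot(ma_cpm$A, ma_cpm$M, pch = 16, cex = 0.4, main = "Library-size scaled",
     xlab = "average logCPM", ylab = "logFC")
abline(h = 0, col = "red")

# The cloud is shifted upwards for raw counts and centered after scaling
median(ma_raw$M)
median(ma_cpm$M)
```

On a real edgeR result, plotMD(qlf) draws the same kind of plot directly from the test object.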

ADD REPLY • link 3.6 years ago by ATpoint 81k

Great thank you, I'll include normalisation.

Yes I do start with raw counts.

I have a filtering section:

keep <- rowSums(cpm(y)>0.5) >=2

My samples look like this:

      files  group    lib.size  norm.factors  subjects
a_1   1.csv  control  16065685  1.8069450     patient1
a_2   2.csv  control  4740572   2.1098124     patient2
a_3   3.csv  control  19853317  1.8273974     patient3
a_4   4.csv  cancer   22955672  0.8591707     patient1
a_5   5.csv  cancer   38906433  0.6714201     patient2
a_6   6.csv  cancer   21069541  1.2216965     patient

My design looks like this:

   design <- model.matrix(~0+subjects+group)
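As a rough illustration of what this design encodes, the following base-R sketch builds it for three hypothetical patient pairs. The factor values are assumptions based on the layout of the table above, with control set explicitly as the reference level:

```r
# Hypothetical pairing: each patient contributes one control and one cancer sample
subjects <- factor(rep(c("patient1", "patient2", "patient3"), times = 2))
group    <- factor(rep(c("control", "cancer"), each = 3),
                   levels = c("control", "cancer"))

# One column per patient (blocking) plus one column for the group effect
design <- model.matrix(~ 0 + subjects + group)
colnames(design)

# The last column is the cancer vs control effect within patients
tail(colnames(design), 1)

# The design must be of full rank to be estimable
qr(design)$rank == ncol(design)
```

Without the explicit levels argument R orders factor levels alphabetically, so cancer would become the reference and the last coefficient would be groupcontrol, i.e. the sign of the logFC flips. The parametrisation ~ subjects + group yields the same group coefficient.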

I haven't checked for batch effects, I will give it a go.

ADD REPLY • link 3.6 years ago by Tota ▴ 20

Look at the lib.sizes: the cancers are sequenced much deeper than the controls, which is one of the reasons why normalization is necessary. The counts in cancer are probably much higher simply because of that, and you have to correct for it; details here. Try to do the PCA first, e.g. using the PCAtools package from Bioconductor or simply the plotMDS function from edgeR/limma, which implements a very similar technique. This will also tell you how well the samples cluster together, which can be a proxy for the dispersion between replicates. Since you start from csv files, may I ask how you obtained the counts?
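For a quick look at the depth difference, the numbers from the sample table posted above can be put into a small data frame. edgeR works with the effective library size, which is lib.size multiplied by norm.factors; the rest of this base-R sketch (object and column names) is only for illustration:

```r
# Values copied from the sample table posted above
samples <- data.frame(
  group        = c("control", "control", "control", "cancer", "cancer", "cancer"),
  lib.size     = c(16065685, 4740572, 19853317, 22955672, 38906433, 21069541),
  norm.factors = c(1.8069450, 2.1098124, 1.8273974, 0.8591707, 0.6714201, 1.2216965),
  row.names    = paste0("a_", 1:6)
)

# Effective library size = raw library size x normalisation factor
samples$eff.lib.size <- samples$lib.size * samples$norm.factors

# Mean raw and effective depth per group
tapply(samples$lib.size, samples$group, mean)
tapply(samples$eff.lib.size, samples$group, mean)

samples
```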

ADD REPLY • link 3.6 years ago by ATpoint 81k

I did PCA using plotMDS on the raw data and on the normalised data. I have 'Leading logFC dim2' along the y-axis and 'Leading logFC dim1' along the x-axis. For the raw data the samples cluster along y=0, and after normalisation the samples are more dispersed.

I'm working with circRNAs. I used two detection methods and merged on the commonly detected circRNAs using a script, which writes the counts to .csv files.

Raw: https://ibb.co/gRsyyRr

Normalised: https://ibb.co/XtR9Gr5

ADD REPLY • link 3.6 years ago by Tota ▴ 20

PCA should be done on the log2-transformed normalized data. I am currently putting together a little tutorial on basic QC, including PCA and MA-plots for DNA/RNA-seq, that covers the basics with example code; it will probably go online early next week. Maybe this clarifies some things.
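In the meantime, a minimal sketch of PCA on normalised log2 data with edgeR and base R. The counts are simulated and the sample names are placeholders, so this only shows the order of the steps:

```r
library(edgeR)

set.seed(1)

# Simulated raw counts: 1000 features x 6 samples (hypothetical data)
counts <- matrix(rnbinom(6000, mu = 50, size = 5), nrow = 1000)
colnames(counts) <- paste0("a_", 1:6)

y <- DGEList(counts = counts)
y <- calcNormFactors(y)

# log2 CPM; cpm() uses the normalised library sizes of a DGEList by default
logcpm <- cpm(y, log = TRUE, prior.count = 2)

# prcomp expects samples in rows, hence the transpose
pca <- prcomp(t(logcpm))
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

plot(pca$x[, 1], pca$x[, 2], pch = 16,
     xlab = sprintf("PC1 (%.0f%%)", 100 * var_explained[1]),
     ylab = sprintf("PC2 (%.0f%%)", 100 * var_explained[2]))
text(pca$x[, 1], pca$x[, 2], labels = colnames(logcpm), pos = 3)

# plotMDS on the normalised DGEList gives a closely related view
plotMDS(y)
```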

ADD REPLY • link 3.6 years ago by ATpoint 81k

Awesome, that'll be of great help, where will the tutorial be available?

ADD REPLY • link 3.6 years ago by Tota ▴ 20

Will post it here on biostars.

ADD REPLY • link 3.6 years ago by ATpoint 81k

Also, was it correct for me to do the PCA on the normalised data?

ADD REPLY • link 3.6 years ago by Tota ▴ 20

Normalized + log2:

Basic normalization, batch correction and visualization of RNA-seq data

ADD REPLY • link 3.6 years ago by ATpoint 81k

Great, thank you. Just saw this. I'll give this a go.

ADD REPLY • link 3.6 years ago by Tota ▴ 20