Question: How to normalize my RNA-seq data?
0
16 months ago by
John210
United States
John210 wrote:

Hi

I have an RNA-seq dataset with three conditions (control, treatment X and treatment Y), each in triplicate. The RNA was sampled from brain tissue after ribosomal pulldown. I got expected counts from RSEM (with STAR for alignment) and performed quantile normalization using the normalizeBetweenArrays() function from limma. I am not sure it's the best way to normalize my data. In image 1, the boxed area of treatment Y-3 shows higher gene expression than any other sample, which looks weird. I don't know what else I can do. Please help!

Thanks in advance!

https://i.ibb.co/d7QHxTW/Screen-Shot-2019-09-24-at-7-52-23-PM.png

rna-seq normalization • 1.6k views
ADD COMMENTlink modified 16 months ago • written 16 months ago by John210
1

Hi, Google Drive is not a recommended host for images as it doesn't support embedding on biostars. Could I trouble you to please follow this guide and upload on imgbb?

ADD REPLYlink modified 16 months ago by ATpoint44k • written 16 months ago by _r_am32k
1

I made the changes for you already. You have to use the image button and paste in the full link to the image including the suffix (.png or similar). In this case the link would be https://i.ibb.co/d7QHxTW/Screen-Shot-2019-09-24-at-7-52-23-PM.png

ADD REPLYlink written 16 months ago by ATpoint44k
1

Thank you, I am trying to add another box plot image

ADD REPLYlink written 16 months ago by John210
1

So you have RNA-seq, and you use normalizeBetweenArrays()? RNA-seq requires a different analysis than a microarray. Please follow a well-tested tutorial, like this one from Bioconductor.

ADD REPLYlink written 16 months ago by WouterDeCoster45k

You could use voom normalization from limma, and add the quantile normalization in there with argument normalize.method = "quantile". However, start with real counts, derived from featureCounts instead of RSEM.
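
For readers who have not seen what quantile normalization actually does to the data, here is a minimal sketch in plain Python. It is only an illustration of the idea (every sample is forced onto the same distribution, the mean of the sorted values at each rank). It is not limma's implementation, it does not special-case ties or missing values, and the function name `quantile_normalize` and the toy numbers are made up for this example.

```python
def quantile_normalize(columns):
    """Quantile-normalize samples given as a list of columns.

    columns: list of samples, each a list of values over the same genes.
    Returns a new list of columns that all share one distribution.
    """
    n_genes = len(columns[0])
    n_samples = len(columns)

    # Sort every sample, then average across samples at each rank.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [
        sum(col[i] for col in sorted_cols) / n_samples for i in range(n_genes)
    ]

    result = []
    for col in columns:
        # Gene indices ordered by value; the position in this list is the rank.
        order = sorted(range(n_genes), key=lambda i: col[i])
        out = [0.0] * n_genes
        for rank, gene_index in enumerate(order):
            out[gene_index] = rank_means[rank]
        result.append(out)
    return result


# Hypothetical toy data: two samples, three genes.
toy = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized = quantile_normalize(toy)
```

After this step every sample contains exactly the same set of values, only in a different gene order, which is why the method assumes that the global distributions really should be identical across samples.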

ADD REPLYlink modified 16 months ago • written 16 months ago by Benn8.0k
3
16 months ago by
ATpoint44k
ATpoint44k wrote:

Hi John,

as WouterDeCoster says, QN might be possible for RNA-seq but is not common. I suggest reading the manuals of e.g. edgeR and DESeq2 to learn about normalization. Additionally, check the videos linked below, which nicely explain the normalization techniques that are part of the differential expression pipeline of these two tools. Beyond that, DESeq2 offers two functions, vst and rlog, that not only normalize counts with respect to library size and composition but also try to decouple the variance from the mean. If this vocabulary is new to you, search around the web; there are plenty of forum and blog entries on normalization and RNA-seq available. I suggest you use one of the mentioned packages for differential analysis (normalization will be taken care of internally) and vst for everything else (e.g. clustering/PCA). Note that both rlog and vst return log2-scaled counts; check the manuals and vignettes.

To check whether normalization worked I would also not use Z-scored heatmaps, as they are rather uninformative on that matter. Instead use MA-plots (e.g. via the smoothScatter function in R to get areas colored by density, or heatscatter from the LSD package) and check whether the bulk of the data centers roughly along y=0.
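
To make the MA-plot check concrete, here is a small sketch in plain Python of the values that go into such a plot. For each gene, M is the log2 ratio between two samples and A is the average log2 abundance. The function name `ma_values`, the pseudocount of 1 and the toy counts are assumptions for illustration only. In R you would compute the same quantities on your normalized matrix and pass them to smoothScatter.

```python
import math
import statistics


def ma_values(sample_a, sample_b, pseudocount=1.0):
    """Return a list of (A, M) pairs for two samples over the same genes.

    A = mean of the two log2 values (x-axis of the MA-plot)
    M = difference of the two log2 values (y-axis of the MA-plot)
    The pseudocount avoids log2(0) and is an arbitrary choice here.
    """
    pairs = []
    for a, b in zip(sample_a, sample_b):
        log_a = math.log2(a + pseudocount)
        log_b = math.log2(b + pseudocount)
        pairs.append(((log_a + log_b) / 2, log_a - log_b))
    return pairs


# Hypothetical normalized counts for five genes in two samples.
ctrl = [10.0, 200.0, 35.0, 1200.0, 0.0]
trt = [12.0, 190.0, 40.0, 1100.0, 1.0]

pairs = ma_values(ctrl, trt)

# If normalization worked, the bulk of M should sit near 0,
# so the median of M is a quick numeric summary of the plot.
median_m = statistics.median([m for _, m in pairs])
```

A median M far from 0, or a cloud that drifts away from y=0 as A changes, would suggest a remaining library size or composition effect between the two samples.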

ADD COMMENTlink modified 16 months ago • written 16 months ago by ATpoint44k
2

Agree on this, it sounds completely incorrect to use an array normalization method on sequencing data.

ADD REPLYlink written 16 months ago by JC12k

I am not aware of any DEG method that actually uses it by default. One should check whether the data fulfill the assumptions for QN, e.g. via quantro ( https://bioconductor.org/packages/release/bioc/vignettes/quantro/inst/doc/quantro.html ). Anyway, I would not bother with it, as the standard methods (TMM from edgeR, RLE from DESeq2) are well accepted.
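
As a rough picture of what the RLE (median-of-ratios) approach does, here is a simplified sketch in plain Python. It follows the general idea only: build a per-gene reference as the geometric mean across samples, then take each sample's median ratio to that reference as its size factor. It is not the DESeq2 or edgeR code, and the function name and toy counts are hypothetical.

```python
import math
import statistics


def median_of_ratios_size_factors(counts):
    """Estimate one size factor per sample.

    counts: list of genes, each a list of raw counts per sample.
    Genes with a zero count in any sample are skipped because their
    geometric mean is zero. Raises statistics.StatisticsError if no
    gene is usable.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]

    for gene in counts:
        if any(c <= 0 for c in gene):
            continue
        # Log of the geometric mean across samples (the pseudo-reference).
        log_geo_mean = sum(math.log(c) for c in gene) / n_samples
        for j, c in enumerate(gene):
            log_ratios[j].append(math.log(c) - log_geo_mean)

    # The median ratio per sample is robust to a few strongly changed genes.
    return [math.exp(statistics.median(r)) for r in log_ratios]


# Hypothetical raw counts: four genes (rows) in three samples (columns).
raw = [[10, 20, 15], [100, 210, 160], [5, 9, 8], [0, 3, 1]]
size_factors = median_of_ratios_size_factors(raw)

# Dividing each count by its sample's size factor gives normalized counts.
normalized = [[c / s for c, s in zip(gene, size_factors)] for gene in raw]
```

Because the median is taken over genes, a handful of highly expressed or strongly regulated genes does not dominate the scaling, which is the composition problem that plain library size scaling can suffer from.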

ADD REPLYlink modified 16 months ago • written 16 months ago by ATpoint44k

Thank you! I knew that edgeR does normalization internally, so I used edgeR for the differential expression analysis. But I want to show the DE genes in a heatmap. How can I normalize for that? (I prefer Z-scores of normalized counts, as that is the common choice in publications.)

ADD REPLYlink written 16 months ago by John210
1

Make edgeR return the normalized counts (the cpm function can directly output log2 values with log=TRUE), and then transform to Z-scale, e.g. t(scale(t(norm.count.matrix))). If you use edgeR for DEG, I think it is best to use its normalized counts to keep things consistent.
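
For anyone wondering what these two steps do numerically, here is a simplified sketch in plain Python. Note the assumptions: this `log2_cpm` uses the raw library size and a fixed pseudocount of 0.5, whereas edgeR's cpm also applies the normalization factors and handles the prior count in its own way, so the numbers will not match edgeR exactly. The `row_zscore` function mirrors what t(scale(t(x))) does per gene (center on the row mean, divide by the sample standard deviation). Function names and toy counts are hypothetical.

```python
import math
import statistics


def log2_cpm(counts, pseudocount=0.5):
    """Simplified log2 counts-per-million.

    counts: list of genes, each a list of raw counts per sample.
    Uses the plain column sums as library sizes (no TMM factors).
    """
    n_samples = len(counts[0])
    lib_sizes = [sum(gene[j] for gene in counts) for j in range(n_samples)]
    return [
        [math.log2((c + pseudocount) / lib_sizes[j] * 1e6) for j, c in enumerate(gene)]
        for gene in counts
    ]


def row_zscore(matrix):
    """Z-score each row (gene) across samples, like t(scale(t(x))) in R.

    Uses the sample standard deviation (n - 1), as R's scale() does.
    Constant rows are returned as zeros here, whereas R would give NaN.
    """
    scaled = []
    for row in matrix:
        mean = statistics.mean(row)
        sd = statistics.stdev(row)
        scaled.append([(v - mean) / sd if sd > 0 else 0.0 for v in row])
    return scaled


# Hypothetical raw counts: three genes (rows) in four samples (columns).
raw = [[10, 12, 30, 28], [500, 480, 510, 495], [3, 0, 1, 2]]
z = row_zscore(log2_cpm(raw))
```

Each row of `z` then has mean 0 and standard deviation 1, so the heatmap colors show relative differences between samples within a gene rather than absolute expression levels.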

ADD REPLYlink modified 16 months ago • written 16 months ago by ATpoint44k