Question: edgeR - normalized counts from multiple datasets
teabonng10 wrote:

Hi,

I am combining multiple independent RNA-seq datasets from the same species but different conditions and analyzing them together. I normalized the counts per dataset using calcNormFactors in edgeR. I will have to do a second normalization when the different datasets are combined. I rounded up the normalized counts. Is it valid to use these rounded normalized counts as input to another round of calcNormFactors?

edger rna-seq normalization • 223 views
ADD COMMENTlink modified 9 days ago by 15617759110 • written 15 days ago by teabonng10

Hi Kevin, I am studying an mRNA predictor of response to immune-checkpoint inhibitors across various tumor types. Since I have several kinds of RNA-seq count matrices (populations with different tumors), I decided to choose some housekeeping genes to normalize the data. What do you think of this method for preprocessing the data?

ADD REPLYlink written 12 days ago by 15617759110

Oh, why do you not normalise as per the recommended methods for RNA-seq? Housekeepers are used for IHC, PCR, etc. There is no true 'housekeeper' whose expression remains stable, and I imagine that it's even less stable in tumour cells.

ADD REPLYlink written 12 days ago by Kevin Blighe21k

As far as I know, the normalization methods for RNA-seq counts include the quantile normalization of limma-voom or edgeR. I am not sure whether these methods are suitable for RNA-seq counts that will subsequently be used for machine learning to find predictors.

ADD REPLYlink written 12 days ago by 15617759110

For downstream machine learning, I think that logCPM counts from edgeR or rlog counts from DESeq2 would be fine. If you can obtain log2 counts from limma/voom, then good too.

ADD REPLYlink written 12 days ago by Kevin Blighe21k

I ran TMM and logCPM with the following code:

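library(edgeR)
dge <- DGEList(counts = as.matrix(exprs))
# TMM: computes per-sample scaling factors (stored in dge_TMM$samples$norm.factors)
dge_TMM <- calcNormFactors(dge)
# log2 CPM, computed using the TMM-adjusted library sizes
logCPM <- cpm(dge_TMM, log = TRUE, prior.count = 3)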

But it confuses me that when I compare the exprs matrix with dge_TMM$counts, they are the same! Does the TMM step not change the exprs matrix?

ADD REPLYlink modified 11 days ago by Kevin Blighe21k • written 12 days ago by 15617759110

You mean that dge_TMM$counts is the same before and after you run cpm()?

ADD REPLYlink written 11 days ago by Kevin Blighe21k

Hi Kevin, I mean that dge_TMM$counts is the same as exprs, which confuses me. As far as I know, calcNormFactors(dge) performs the TMM step, but dge_TMM$counts is unchanged compared with exprs.

ADD REPLYlink written 9 days ago by 15617759110
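
A note on the question above: calcNormFactors() does not rewrite the count matrix. It stores one scaling factor per sample in the samples table, and functions such as cpm() then use the effective library size (lib.size multiplied by norm.factors). The sketch below illustrates this on a simulated count matrix; the matrix and all object names are hypothetical and not taken from the thread.

```r
library(edgeR)

# Hypothetical toy data: 200 genes x 4 samples of simulated counts
set.seed(1)
counts <- matrix(rnbinom(800, mu = 50, size = 5), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("s", 1:4)))

dge <- DGEList(counts = counts)
dge_TMM <- calcNormFactors(dge)  # default method is "TMM"

# The raw counts are left untouched by calcNormFactors()
counts_unchanged <- identical(dge$counts, dge_TMM$counts)

# Only the per-sample scaling factors change (they are all 1 before the call)
factors_before <- dge$samples$norm.factors
factors_after <- dge_TMM$samples$norm.factors

# cpm() divides each sample by its effective library size
eff_lib <- dge_TMM$samples$lib.size * dge_TMM$samples$norm.factors
manual_cpm <- t(t(counts) / eff_lib) * 1e6
cpm_matches <- all.equal(unname(manual_cpm), unname(cpm(dge_TMM)))

print(counts_unchanged)
print(rbind(before = factors_before, after = factors_after))
print(cpm_matches)
```

So the normalization shows up in the output of cpm() (and in model fitting), not in dge_TMM$counts.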

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized. This comment belongs under @Kevin's answer.

ADD REPLYlink written 9 days ago by genomax50k

Hi genomax, I wanted to use 'ADD REPLY', but when I click on it there is no response, and I can only submit an answer. What's more, the 'ADD REPLY' button on my page is gray (I don't know what color it is on your page, but it seems to be disabled on mine). Could you please tell me the reason? P.S. The browser I use is Firefox.

ADD REPLYlink written 9 days ago by 15617759110

We have seen this symptom in the past for people using Biostars from some parts of Asia. Not sure if it applies to you. The ADD REPLY/ADD COMMENT buttons are indeed gray in color and are clickable for the rest of us.

ADD REPLYlink modified 8 days ago • written 8 days ago by genomax50k

It works when I use Chrome. You can share my experience with others who do not use Chrome.

ADD REPLYlink written 8 days ago by 15617759110

Thanks for letting us know.

ADD REPLYlink written 8 days ago by genomax50k
University College London Cancer Institute
Kevin Blighe21k wrote:

You should only do one round of normalisation with all of your samples combined, and in your design formula you should include 'batch' or 'study' as a covariate. My recommendation for you is to actually use DESeq2, as I know for certain that it deals with these issues of batch and multiple experiments quite well.

Kevin

ADD COMMENTlink written 13 days ago by Kevin Blighe21k
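
To make the advice above concrete, here is a minimal sketch in edgeR (the package the question is about; the answer itself recommends DESeq2, where the equivalent idea is a design such as ~ batch + condition). It combines the raw counts from two studies, runs a single round of TMM over all samples, and puts the study in the design matrix. The data are simulated, and every object name (study1, study2, batch, condition) is an assumption made for illustration.

```r
library(edgeR)

set.seed(2)
# Hypothetical RAW count matrices from two studies sharing the same genes
genes <- paste0("gene", 1:500)
study1 <- matrix(rnbinom(500 * 4, mu = 40, size = 5), nrow = 500,
                 dimnames = list(genes, paste0("A", 1:4)))
study2 <- matrix(rnbinom(500 * 4, mu = 80, size = 5), nrow = 500,
                 dimnames = list(genes, paste0("B", 1:4)))

# Combine raw counts, not values that were already normalized and rounded
counts <- cbind(study1, study2)
batch <- factor(rep(c("study1", "study2"), each = 4))
condition <- factor(rep(c("ctrl", "treat"), times = 4))

dge <- DGEList(counts = counts)
dge <- calcNormFactors(dge)  # one round of TMM over all samples

# Study enters the model as a covariate alongside the condition of interest
design <- model.matrix(~ batch + condition)
dge <- estimateDisp(dge, design)
fit <- glmQLFit(dge, design)
res <- glmQLFTest(fit, coef = "conditiontreat")
top <- topTags(res)$table
print(head(top))
```

This only works when the conditions are not completely confounded with the study. If every study contains a single, different condition, the batch and condition effects cannot be separated by any design formula.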

Thank you very much for your answer and recommendation.

ADD REPLYlink written 11 days ago by teabonng10

Okay. Keep in mind that, by including 'batch' in the design model, you are only accounting for batch in the statistics. This will not modify your normalised counts.

ADD REPLYlink written 8 days ago by Kevin Blighe21k
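
If batch-adjusted expression values are needed for something other than the differential expression test (for example clustering, plots, or the machine learning use discussed earlier in the thread), one option that is not mentioned in the thread is limma's removeBatchEffect() applied to logCPM values. The sketch below uses simulated data and hypothetical object names. It is meant for exploration only; for testing, batch should stay in the design matrix as described above.

```r
library(edgeR)
library(limma)

set.seed(3)
# Hypothetical combined raw counts: 300 genes x 8 samples from two studies
counts <- matrix(rnbinom(300 * 8, mu = 60, size = 5), nrow = 300,
                 dimnames = list(paste0("gene", 1:300), paste0("s", 1:8)))
batch <- factor(rep(c("study1", "study2"), each = 4))
condition <- factor(rep(c("ctrl", "treat"), times = 4))

dge <- calcNormFactors(DGEList(counts = counts))
logCPM <- cpm(dge, log = TRUE, prior.count = 3)

# Protect the condition effect while removing the batch effect
design <- model.matrix(~ condition)
logCPM_adj <- removeBatchEffect(logCPM, batch = batch, design = design)

print(dim(logCPM_adj))
```

The adjusted matrix has the same shape as logCPM, and the raw counts stored in dge are again unchanged.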