Does the glmFit function in R use raw RNA-Seq count data for modelling?
2.2 years ago
mytlsdnkr • 0

Hello, I'm a newbie in RNA-Seq analysis.

While working through an RNA-Seq analysis, I ran into some questions.

If you have time, please let me know.

  1. In RNA-Seq data analysis (i.e. DEG analysis), does the glmFit function in R use raw RNA-Seq count data for modelling?

I applied TMM normalization to correct for non-equivalence between samples, and then I used the glmFit function.

In this process, does glmFit use the raw RNA-Seq counts for modelling? Until now, I believed that counts-per-million (CPM) normalized data were used in this function.

However, when I think about the assumptions of glmFit, it is based on the negative binomial distribution, which is a discrete distribution for counts.

Also, if we convert raw RNA-Seq counts to CPM, the resulting data follow a continuous distribution.

So I think CPM-normalized data should not be used for modelling in the glmFit function.

Is this correct?

Thanks for your answers.

count data glmfit raw R RNA-Seq • 1.3k views
2.2 years ago
Gordon Smyth ★ 7.0k

Yes, glmFit() uses raw counts. Just type help("glmFit").
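
To illustrate, here is a minimal sketch of a typical edgeR GLM workflow in which raw counts are passed through unchanged. The simulated count matrix, the group labels, and the object names are hypothetical; the edgeR functions themselves (DGEList, calcNormFactors, estimateDisp, glmFit, glmLRT, topTags) are real. Note that calcNormFactors() only stores scaling factors in the sample table and leaves the counts untouched.

```r
library(edgeR)

set.seed(1)
# Hypothetical simulated raw counts: 200 genes x 6 samples
counts <- matrix(rnbinom(200 * 6, mu = 50, size = 5), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("s", 1:6)))
group  <- factor(rep(c("ctrl", "trt"), each = 3))
design <- model.matrix(~ group)

y <- DGEList(counts = counts, group = group)  # raw integer counts go in
y <- calcNormFactors(y, method = "TMM")       # fills y$samples$norm.factors only
y <- estimateDisp(y, design)                  # negative binomial dispersions
fit <- glmFit(y, design)                      # fitted to the raw counts in y$counts
lrt <- glmLRT(fit, coef = 2)                  # test the trt vs ctrl coefficient
topTags(lrt)
```

At no point in this sketch are the counts divided by library size or converted to CPM; the normalization factors travel alongside the counts instead.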


Thanks for your answer!

May I ask you one more question?

Why do we use CPM normalization for raw count data?

If you have time, please leave an answer.

Thank you.


"Why do we use CPM normalization for raw count data?"

The only purpose of computing cpms is for plotting or for input into other programs. The edgeR differential expression pipeline does not use cpms at any stage. It only uses raw counts.
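
As a small illustration of what a CPM value is, the sketch below computes CPM by hand in base R from a tiny hypothetical count matrix (the gene and sample names are made up). It uses plain column sums as library sizes; edgeR's own cpm() function by default also applies the normalization factors, which this sketch does not attempt to reproduce.

```r
# Hypothetical raw counts: 3 genes x 2 samples, stored as integers
counts <- matrix(c(10L, 0L, 990L, 30L, 5L, 2965L), nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2")))

lib_size <- colSums(counts)             # total reads per sample
cpm <- t(t(counts) / lib_size) * 1e6    # divide each column by its library size

is.integer(counts)   # raw counts are discrete
cpm                  # CPM values are generally non-integer
```

The CPM values are no longer integers, which is why they suit plotting and exploratory work rather than a count-based model.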


I understand it.

Thank you!


Gordon may correct me on this... If glmFit works with raw counts, one may wonder what the point of the normalization step via TMM or other methods is. The answer is that the log of the normalized (effective) library size is used as an offset in the negative binomial model inside glmFit. An offset is a term similar to a predictor variable that, in contrast to a predictor, does not need a coefficient to be estimated because its coefficient is fixed at 1. That is, you assume that doubling the number of reads sequenced (the raw library size) doubles the expected number of reads on each gene.

I posted this before seeing the OP's comment. CPM normalization is not used for DGE. Maybe my comment here answers your question?
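
The offset idea described above can be sketched with base R's glm(). This is only an illustration under simplifying assumptions: it uses a Poisson model for a single hypothetical gene rather than edgeR's negative binomial fit, and the library sizes and expression rates are invented. The point is that the log library size enters the model with a fixed coefficient of 1, so only the intercept and the group effect are estimated.

```r
set.seed(42)
# Hypothetical effective library sizes for 6 samples
lib_size <- c(1e6, 2e6, 4e6, 1e6, 2e6, 4e6)
group <- factor(rep(c("ctrl", "trt"), each = 3))

# One gene whose share of the library is twice as high in the treated group
true_rate <- ifelse(group == "trt", 2e-5, 1e-5)
counts <- rpois(6, lambda = lib_size * true_rate)   # raw counts

# log(E[count]) = log(lib_size) + b0 + b1 * (group == "trt")
fit <- glm(counts ~ group, family = poisson, offset = log(lib_size))
coef(fit)   # two coefficients only; none is estimated for the offset
```

The group coefficient estimates the log fold change in expression rate after accounting for sequencing depth, even though the response is the raw count.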


Thanks dariober, I understand the use of CPM now.

But I don't completely understand the offset in glmFit.

If you have time, could you explain it in more detail, or give me a reference?

Thank you.


"or give me a reference"

Just type help("glmFit"). It gives you the reference.
