Question: How to perform univariate generalized linear models on RNA-seq data?
Farbod (Toronto) wrote, 11 months ago:

Dear Biostars, Hi

I used Trinity and its bundled scripts for my de novo transcriptome assembly (3 biological replicates for condition 1 and 3 for condition 2), for creating the count matrix (using RSEM), and for DEG analysis with edgeR (FDR = 0.001).

I recently heard that a GLM approach is recommended as a safeguard for obtaining accurate differentially expressed genes, but I do not know how to use it. What program or R script should I use, and which Trinity intermediate files should I feed into it?

I read the following description of such an analysis in a paper:

"We used univariate generalized linear models (GLM) to identify differentially expressed genes in response to each condition challenge for each species separately. Negative binomial GLMs were implemented using the “edgeR” v3.8.6 package in R v3.1.3 ... For this analysis, we considered only Log 2 fold changes from the genes that were identified as being significantly differentially expressed individually by each species in the GLMs above. Nonparametric Wilcoxon rank-sum tests were again used to compare the rank order of fold change between species for upregulated and downregulated genes separately in each treatment."

Q: How can I use GLM for my RNA-seq analysis?

NOTE: I have heard that edgeR does not use GLMs by default, but DESeq2 does. I ran the same analysis with DESeq2, and all of the edgeR results were also contained in the DESeq2 results. Is that enough?

Tags: glm, deg, rna-seq, sartools
h.mon (Brazil) wrote, 11 months ago:

The Trinity wiki describes in great detail the steps needed for transcript quantification and the subsequent differential expression analysis. At which step specifically are you stuck?

I wouldn't say edgeR "defaults" to not using GLMs. Its user's guide describes two workflows, one called "classic" (for one-factor designs) and another called "glm functionality" (for more complex designs). You have to actively choose one of them, so there is no hidden "default". That said, Trinity's ready-made script, run_DE_analysis.pl --method edgeR, does indeed default to the "classic" test, with a poorly documented option to use the glm functionality. To use it, run run_DE_analysis.pl --method GLM, which also needs a samples file (bear in mind that it may be incomplete and/or buggy, which is why it is poorly documented).
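For anyone who wants to see the two edgeR workflows side by side outside of Trinity's wrapper, here is a minimal sketch. It assumes edgeR is installed and uses simulated counts with made-up sample names in place of the real Trinity/RSEM count matrix, so the gene names, group labels and the simulated data are all placeholders, not something from your project. It is meant as an illustration of the calls involved, not as what run_DE_analysis.pl does internally.

```r
# Sketch only: simulated counts stand in for a Trinity/RSEM count matrix.
library(edgeR)

set.seed(1)
n_genes <- 1000
# Hypothetical layout: 3 replicates of condition 1, then 3 of condition 2
counts <- matrix(
  rnbinom(n_genes * 6, mu = 100, size = 10),
  ncol = 6,
  dimnames = list(
    paste0("gene", seq_len(n_genes)),
    c("c1_rep1", "c1_rep2", "c1_rep3", "c2_rep1", "c2_rep2", "c2_rep3")
  )
)
group <- factor(rep(c("cond1", "cond2"), each = 3))

y <- DGEList(counts = counts, group = group)
keep <- filterByExpr(y)                 # drop genes with too few reads
y <- y[keep, , keep.lib.sizes = FALSE]
y <- calcNormFactors(y)                 # TMM normalization

# "Classic" workflow: dispersions without a design, then the exact test
y_classic <- estimateDisp(y)
et <- exactTest(y_classic, pair = c("cond1", "cond2"))
classic_tab <- topTags(et, n = Inf)$table

# "glm" workflow: design matrix, negative binomial GLM, likelihood ratio test
design <- model.matrix(~ group)         # intercept + cond2-vs-cond1 coefficient
y_glm <- estimateDisp(y, design)
fit <- glmFit(y_glm, design)
lrt <- glmLRT(fit, coef = 2)
glm_tab <- topTags(lrt, n = Inf)$table

# Compare the gene lists at the FDR threshold mentioned in the question
fdr <- 0.001
classic_deg <- rownames(classic_tab)[classic_tab$FDR < fdr]
glm_deg <- rownames(glm_tab)[glm_tab$FDR < fdr]
length(intersect(classic_deg, glm_deg))
```

With real data you would replace the simulated matrix with your own count matrix (for example via read.table with header = TRUE and row.names = 1) and set group to match your sample columns. For a simple two-group design like this one, both workflows test the same comparison, so large disagreements between the two lists would be worth investigating.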

The edgeR user's guide argues that the "classic" approach is better for one-factor designs, which seems to be your case, so why not use it?

Why are you quoting a paper that performs a between-species comparison? Is that your design? After the within-species tests, do you intend to compare between species?
