Question

How to perform univariate generalized linear models on RNA-seq data?

6.0 years ago

Farbod ★ 3.4k

Dear Biostars, Hi

I have used Trinity and its default scripts for my de novo transcriptome assembly (3 biological replicates for condition 1 and 3 for condition 2), for count matrix creation (using RSEM), and edgeR (FDR = 0.001) for DEG analysis.

Recently I heard that using a GLM method is recommended to ensure accurate differential expression results, but I do not know how to use it: what program or R script should I use, and which Trinity intermediate files should I target?

I have read a description of such an analysis in a paper:

"We used univariate generalized linear models (GLM) to identify differentially expressed genes in response to each condition challenge for each species separately. Negative binomial GLMs were implemented using the “edgeR” v3.8.6 package in R v3.1.3.....For this analysis, we considered only Log 2 fold changes from the genes that were identified as being significantly differentially expressed individually by each species in the GLMs above. Nonparametric Wilcoxon rank-sum tests were again used to compare the rank order of fold change between species for upregulated and downregulated genes separately in each treatment. "

Q: How can I use GLM for my RNA-seq analysis?

NOTE: I have heard that edgeR does not use GLM by default, but DESeq2 does. I ran the same analysis using DESeq2, and all of the edgeR results were contained in the DESeq2 results, too. Is that enough?

RNA-Seq DEG GLM SARTools • 2.4k views

updated 6.0 years ago by h.mon 35k • written 6.0 years ago by Farbod ★ 3.4k

score 1 · Answer 1 · 2018-05-01

The Trinity wiki describes in great detail the steps necessary for performing transcript quantification, followed by differential expression analysis. At what step specifically are you stuck?

I wouldn't say edgeR "defaults" to not using GLMs. Its user guide describes two workflows, one called "classic" (for one-factor designs) and another called "glm functionality" (for more complex designs). One has to actively choose one of these, so there is no hidden "default". Now, Trinity's ready-made script run_DE_analysis.pl --method edgeR does indeed default to the "classic" test, with a poorly documented option to use the glm functionality. To use it, you have to run run_DE_analysis.pl --method GLM and also provide a samples file (and you should consider that it may be incomplete and/or buggy, and that this is the reason it is poorly documented).

The edgeR user's guide argues that the "classic" approach is better for one-factor designs, which seems to be your case, so why not use it?
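To make the difference between the two edgeR workflows concrete, here is a minimal sketch that runs both on the same one-factor, two-condition design with 3 replicates each. The counts are simulated and the object names (counts, group, res_classic, res_glm) are made up for illustration. In a real analysis the matrix would come from your RSEM gene-level counts. This is not what Trinity's script does internally, only a plain edgeR example under those assumptions.

```r
# Sketch only: simulated data, hypothetical object names.
library(edgeR)

set.seed(1)
n_genes <- 500

# Simulated negative binomial counts: 500 genes x 6 samples
# (stand-in for a real gene-level count matrix).
counts <- matrix(
  rnbinom(n_genes * 6, mu = 100, size = 10),
  nrow = n_genes,
  dimnames = list(
    paste0("gene", seq_len(n_genes)),
    c("c1_rep1", "c1_rep2", "c1_rep3", "c2_rep1", "c2_rep2", "c2_rep3")
  )
)
group <- factor(rep(c("cond1", "cond2"), each = 3))

y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y)

# "Classic" workflow: dispersion estimated without a design matrix,
# followed by the exact test between the two groups.
y_classic <- estimateDisp(y)
res_classic <- topTags(exactTest(y_classic), n = Inf)$table

# "glm functionality": explicit design matrix, negative binomial GLM fit,
# then a likelihood ratio test on the condition coefficient.
design <- model.matrix(~ group)
y_glm <- estimateDisp(y, design)
fit <- glmFit(y_glm, design)
res_glm <- topTags(glmLRT(fit, coef = 2), n = Inf)$table

# Compare log fold changes from the two workflows, gene by gene.
common <- rownames(res_classic)
logfc_cor <- cor(res_classic$logFC, res_glm[common, "logFC"])
```

With a simple two-group design like this, both workflows test the same contrast, so comparing the two result tables on your own data is a reasonable way to see how much the choice matters in practice.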

Why are you quoting a paper performing between-species comparison? Is that your design? After within-species tests, do you intend to perform between-species comparison?