4.4 years ago

mhyunjunkang

Dear all,

This may be a basic question, but I would like to hear some comments and advice about DE analysis tools. FYI, I have very limited knowledge in this field, though I keep studying.

I would like to know which DE analysis tool is best for identifying DE genes in RNA-seq data. In particular, I would like to know the advantages, weaknesses, and differences between DE analysis based on a negative binomial model and DE analysis based on an empirical Bayes approach.

I'm not sure whether I can entirely understand your detailed expertise, but I can at least figure out a starting point.

Thanks in advance. HJ

Just a quick comment: DESeq2 already includes an empirical Bayes regression model with a negative binomial family for the purposes of modelling and dealing with (adjusting for) dispersion. I believe it uses a Bayesian approach for the log2 fold change shrinkage too, which helps to deal with biased fold changes at low counts. Take a look at my answer here: A: Clarification on how DSEeq2 Dispersion Curve is Generated

With regard to modelling RNA-seq counts as negative binomial, it was shown that this results in fewer false-positive associations than modelling them as Poisson: Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2
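To make the Poisson versus negative binomial point concrete, here is a minimal sketch in plain Python. It only compares the variance each model assumes for the same mean, using the usual mean/dispersion form of the negative binomial. The mean count and dispersion value are made-up numbers for illustration, not estimates from any dataset.

```python
def poisson_variance(mean):
    """Poisson model: the variance equals the mean."""
    return mean


def nb_variance(mean, dispersion):
    """Negative binomial (mean/dispersion form): var = mean + dispersion * mean^2."""
    return mean + dispersion * mean ** 2


# Hypothetical values, chosen only for illustration.
mean_count = 1000.0
dispersion = 0.1

var_pois = poisson_variance(mean_count)
var_nb = nb_variance(mean_count, dispersion)

# How much larger the negative binomial variance is than the Poisson one.
variance_ratio = var_nb / var_pois
print(var_pois, var_nb, variance_ratio)
```

If replicate-to-replicate variation really looks like the negative binomial case, a Poisson model understates the noise, so ordinary biological variability gets mistaken for differential expression.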

Your question will likely attract much opinion.

Let me link here an article from Mike Love (author of DESeq2) on the question of whether DESeq2 or edgeR (or a different method) is the gold standard for differential count analysis.

Not just DESeq2, but edgeR and limma also have empirical Bayes components (e.g., variance shrinkage).
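As a rough illustration of what variance shrinkage means, here is a sketch of the limma-style moderated variance: a weighted average of each gene's own variance and a prior variance, weighted by degrees of freedom. In limma the prior variance and prior degrees of freedom are estimated from all genes. Here they are simply passed in as hypothetical numbers, so this is not a reimplementation of any package.

```python
def moderated_variance(gene_var, gene_df, prior_var, prior_df):
    """Shrink a per-gene variance toward a prior variance.

    The result is a degrees-of-freedom weighted average:
        (prior_df * prior_var + gene_df * gene_var) / (prior_df + gene_df)
    """
    return (prior_df * prior_var + gene_df * gene_var) / (prior_df + gene_df)


# Hypothetical per-gene sample variances, each with 4 residual degrees of freedom.
gene_vars = [0.2, 1.0, 5.0]
gene_df = 4

# Hypothetical prior (assumed here, estimated from the data in practice).
prior_var = 1.0
prior_df = 4

shrunk = [moderated_variance(v, gene_df, prior_var, prior_df) for v in gene_vars]
print(shrunk)
```

Extreme per-gene estimates get pulled toward the prior, which matters most when there are only a few replicates per group.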

Thank you all for the comments and expertise. I have been reading Mike Love's paper (DESeq2) and I keep going back to it.

One question is how tools like EBSeq differ from DESeq2. As far as I know, EBSeq also models RNA-seq counts as negative binomial. Actually, I think it is beta-negative binomial.

At face value, they look quite similar in that they both assume a negative binomial distribution and make adjustments that help to adequately manage dispersion / variance. They also adjust for library size with size factors. The main difference may come in how they judge what is differentially expressed or not:

EBSeq does not use the Wald test (DESeq2's default). It appears to calculate posterior probabilities for each transcript and then gauge statistical significance in relation to differential expression via this metric.

It would also help to read EBSeq: an empirical Bayes hierarchical model for inference in RNA-seq experiments
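Two pieces of the comparison above can be sketched in plain Python: size factors for library size, and the two ways of calling a gene differentially expressed. The size factor function follows the DESeq2-style median-of-ratios idea. The Wald p-value uses a normal approximation. The posterior probability is a generic two-component Bayes calculation, where the likelihood values and prior are hypothetical placeholders and not what EBSeq actually computes from its hierarchical model.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of genes, each a list of raw counts (one per sample).
    Genes with any zero count are skipped, since their geometric mean is zero.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for gene in counts:
        if min(gene) <= 0:
            continue
        log_geo_mean = sum(math.log(c) for c in gene) / n_samples
        for j, c in enumerate(gene):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


def wald_p_value(log2_fold_change, standard_error):
    """Two-sided p-value for z = estimate / SE under a standard normal."""
    z = log2_fold_change / standard_error
    return math.erfc(abs(z) / math.sqrt(2.0))


def posterior_prob_de(lik_de, lik_null, prior_de):
    """P(DE | data) by Bayes' rule for a two-component (DE vs. not DE) mixture."""
    numerator = prior_de * lik_de
    return numerator / (numerator + (1.0 - prior_de) * lik_null)


# Toy data: sample 2 is sequenced twice as deeply as sample 1.
toy_counts = [[10, 20], [100, 200], [50, 100], [0, 7]]
factors = size_factors(toy_counts)

# Hypothetical estimate, standard error, likelihoods, and prior.
p_wald = wald_p_value(2.0, 0.5)
pp_de = posterior_prob_de(lik_de=0.08, lik_null=0.01, prior_de=0.1)
print(factors, p_wald, pp_de)
```

The practical difference is in what you threshold: a (multiple-testing adjusted) p-value in the Wald case, and a posterior probability of being DE in the empirical Bayes case.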

Thank you for the detailed explanation. One small thing confuses me, though. As I understood it, a log link is used in DESeq2, not a logit. Did I misunderstand? Again, thank you for your expertise. It really helps. HJ

Correct, DESeq2 uses a log link. A logit link wouldn't make sense for RNA-seq counts, which is why it's not used.
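A small sketch of why the log link fits counts: under a log2 link the expected count is the size factor times 2 raised to the linear predictor, so it is always positive and unbounded above, and a coefficient reads directly as a log2 fold change. An inverse logit, by contrast, always lands between 0 and 1. The coefficients and size factor below are made-up values for illustration.

```python
import math


def expected_count(size_factor, design_row, log2_coefs):
    """Log link: log2(mu / size_factor) = sum(x * beta), so mu = s * 2^(x . beta)."""
    log2_q = sum(x * b for x, b in zip(design_row, log2_coefs))
    return size_factor * 2.0 ** log2_q


def inverse_logit(eta):
    """Inverse of the logit link: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical coefficients: intercept (log2 baseline) and treatment effect (log2 fold change).
log2_coefs = [6.0, 1.5]

mu_control = expected_count(1.0, [1, 0], log2_coefs)
mu_treated = expected_count(1.0, [1, 1], log2_coefs)

# The ratio of expected counts recovers 2 ** (log2 fold change).
fold_change = mu_treated / mu_control
print(mu_control, mu_treated, fold_change, inverse_logit(7.5))
```

A logit link suits proportions or probabilities, whereas expected read counts can be any positive number, which is what the log link gives you.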