I'd like to better understand the mathematical background behind the R packages commonly used for RNA-seq analysis, such as DESeq2 and edgeR, but my background in statistics is not extensive and I am struggling to understand all of the concepts introduced. So far I have read the main papers that introduce these packages, along with a handful of the papers they reference. I've also gone through the Statistics for Genomic Data Science resource for additional information about generalised linear models.

My understanding is that a regression model is fitted for each feature, and that a negative binomial distribution is usually used to account for overdispersion. Because the number of replicates is usually low, the dispersion parameter is estimated using information shared across all features. Fold change is then measured using the predicted counts for each condition, and significance is inferred by a likelihood ratio test.

Whilst I believe I understand this in theory (please correct me if I'm wrong), I'd really like to implement the main steps myself in R to get a more hands-on appreciation of what's happening. Ideally, I'm looking for tutorials that analyse RNA-seq or similar genomic count data in R using regression models, and that don't gloss over the details by simply calling the functions from these packages. Alternatively, I'd welcome recommendations for biostatistics books that have a section on regression models for count data with examples in R.
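For concreteness, here is a minimal sketch of the steps described above for a single feature, using only base R. It is not how DESeq2 or edgeR are implemented: the data are simulated, the names (`size_factor`, `nb_negloglik`, `fit_nb`) are made up for illustration, and the dispersion is estimated by plain maximum likelihood for this one gene rather than by sharing information across features.

```r
# Negative-binomial GLM with a log link for ONE gene, fitted by maximum likelihood.
set.seed(1)

# Hypothetical toy data: 2 conditions x 5 replicates, true 2-fold change.
condition   <- factor(rep(c("ctrl", "trt"), each = 5))
size_factor <- runif(10, 0.8, 1.2)   # stand-in for library-size normalisation factors
true_mu     <- ifelse(condition == "trt", 200, 100) * size_factor
counts      <- rnbinom(10, mu = true_mu, size = 1 / 0.1)   # true dispersion alpha = 0.1

# Design matrices: full model (intercept + condition) and null model (intercept only).
X_full <- model.matrix(~ condition)
X_null <- matrix(1, nrow = length(counts), ncol = 1)
offset <- log(size_factor)

# Negative log-likelihood. Parameterisation: Var(y) = mu + alpha * mu^2,
# with log(mu) = X %*% beta + offset. alpha is optimised on the log scale
# so that it stays positive.
nb_negloglik <- function(par, X, y, offset) {
  beta  <- par[seq_len(ncol(X))]
  alpha <- exp(par[ncol(X) + 1])
  mu    <- exp(drop(X %*% beta) + offset)
  -sum(dnbinom(y, mu = mu, size = 1 / alpha, log = TRUE))
}

# Fit one model by numerically minimising the negative log-likelihood.
fit_nb <- function(X, y, offset) {
  start <- c(log(mean(y)), rep(0, ncol(X) - 1), log(0.1))
  optim(start, nb_negloglik, X = X, y = y, offset = offset, method = "BFGS")
}

fit_full <- fit_nb(X_full, counts, offset)
fit_null <- fit_nb(X_null, counts, offset)

# The condition coefficient is a natural-log fold change; convert to log2.
log2_fc <- fit_full$par[2] / log(2)

# Likelihood ratio test: optim() returns NEGATIVE log-likelihoods in $value.
lrt_stat <- 2 * (fit_null$value - fit_full$value)
p_value  <- pchisq(lrt_stat, df = ncol(X_full) - ncol(X_null), lower.tail = FALSE)

print(c(log2_fc = log2_fc, lrt_stat = lrt_stat, p_value = p_value))
```

Running this once per row of a count matrix would give a per-gene analysis. The step it leaves out is the one the packages put most of their effort into: moderating each gene's dispersion towards a trend estimated from all genes.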
Question: Resources to understand regression models for count data
4.0 years ago by
James Ashmore • 2.8k
UK/Edinburgh/MRC Centre for Regenerative Medicine