Question

Code for drawing a histogram with binned real gene transcription counts from an RNA-Seq experiment data and fit to negative binomial/normal/poission distribution curve

0

Entering edit mode

10.6 years ago

pmanga ▴ 60

Hi,

I have gene transcription counts from an RNA seq experiment and I wanted to draw a histogram with these and fit it to negative binomial/normal/Poisson distribution curve. I want to get some comparisons like these http://www.plosone.org/article/slideshow.action?uri=info:doi/10.1371/journal.pone.0016327&imageURI=info:doi/10.1371/journal.pone.0016327.g001

Can someone help me with the code for this in R?

R • 6.3k views

ADD COMMENT • link updated 3.4 years ago by Ram 45k • written 10.6 years ago by pmanga ▴ 60

0

Entering edit mode

updated link: http://dx.doi.org/10.1371/journal.pone.0016327

ADD REPLY • link 8.6 years ago by ek • 0

score 0 · Answer 1 · 2014-12-03

0

Entering edit mode

10.6 years ago

Devon Ryan 105k

Use ggplot2 with the geom_histogram() function and just set binwidth to whatever you like (this is likely how that figure was made). For fitting, see the R functions glm.nb() and such.

ADD COMMENT • link 10.6 years ago by Devon Ryan 105k

0

Entering edit mode

Thanks..that helped!

ADD REPLY • link 10.6 years ago by pmanga ▴ 60

Ram · Answer 2 · 2014-12-03

I think the general strategy should be that you find the set of parameters for you chosen model that best fits the data, for example via maximum likelihood. Then for plotting you plug these parameters in your model to get the corresponding values at a grid of points.

In a R these could be done like this:

require(MASS)
set.seed(2)
dat<- rnbinom(1000, mu = 5, size = 3) ## Obs data points
par.nb<- fitdistr(dat, "negative binomial")
par.pois<- fitdistr(dat, "Poisson")

at<- min(dat):max(dat)
p.pois<- dpois(at, par.pois$estimate)
p.nb<- dnbinom(at, size= par.nb$estimate[1], mu= par.nb$estimate[2])

hist(dat, breaks= 25, freq= FALSE, col= 'grey80',
    border= 'white', ylim= c(0, 0.2))
lines(names(table(dat)), table(dat)/length(dat), lwd= 2, col= 'grey30')
lines(at, p.pois, lwd= 2, col= 'blue')
lines(at, p.nb, lwd= 2, col= 'red')
legend('topright', lwd=2, cex= 0.8, col= c('grey30', 'red', 'blue'),
    legend= c('Obs.', 'Fitted neg. binom.', 'Fitted Poisson'))