Question: Zero-inflated negative binomial distribution in R

ma23 wrote:

Hi!

I have a row with data looking like this:

```
GeneID srr1 srr2 srr3 srr4 srr5 srr6 srr7 srr8 srr9 srr10 srr11 srr12 srr13 srr14 srr15 srr16 srr17
ENSG00000223972 0 1 0 1 0 0 0 0 1 0 0 0 1 1 0 0 0
```

I want to check whether it follows a zero-inflated negative binomial distribution, so I want to use the function zeroinfl from the pscl package. The function requires a formula as its first parameter.

```
zeroinfl(formula = , data = my_vector, dist = "negbin", EM = TRUE)
```

What should I put in the formula?

What your data is and what you're trying to achieve are unclear.

First, note that zeroinfl fits a regression model. Fitting it does not by itself tell you whether the zero-inflated negative binomial is the best distribution for your data, if that is what you want to know. For that, you would need to fit models with other distributions and compare them.
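As a rough sketch of such a comparison, the code below fits four intercept-only models and ranks them by AIC. The data are simulated purely for illustration (they are not the poster's data), and the variable names, the 30% structural-zero rate, and the choice of AIC as the criterion are all assumptions of this example.

```r
# Sketch: compare candidate count distributions on SIMULATED data
# (illustrative only; not the data from the question).
library(MASS)  # glm.nb
library(pscl)  # zeroinfl

set.seed(1)
n <- 300

# Simulate zero-inflated negative binomial counts:
# about 30% of observations are forced to zero ("structural" zeros).
structural_zero <- rbinom(n, size = 1, prob = 0.3)
counts <- ifelse(structural_zero == 1, 0, rnbinom(n, mu = 4, size = 2))
dat <- data.frame(count = counts)

# Intercept-only fits under four candidate distributions
fits <- list(
  poisson = glm(count ~ 1, data = dat, family = poisson),
  negbin  = glm.nb(count ~ 1, data = dat),
  zip     = zeroinfl(count ~ 1 | 1, data = dat, dist = "poisson"),
  zinb    = zeroinfl(count ~ 1 | 1, data = dat, dist = "negbin")
)

# Lower AIC means a better trade-off between fit and model complexity
aic <- sapply(fits, AIC)
print(sort(aic))
```

AIC is only one possible criterion. Because the Poisson and negative binomial models are nested, as are the plain and zero-inflated versions, likelihood-based tests are another option.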

Second, you need to clarify what the variables are that you want to use to model the counts.

In the zero-inflated negative binomial model, zeros are assumed to arise from two different processes. So the model has two parts: one that models the counts, and one that models which of the two processes produces the excess zeros. The formula would look like counts ~ variables to model the counts | variables to model the processes.
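A minimal sketch of the two-part formula, again on simulated data: the covariate `group` and all parameter values are hypothetical and exist only to make the example run. The count part uses `group`, while the zero-inflation part is intercept-only (`| 1`).

```r
# Sketch: two-part formula for zeroinfl on SIMULATED data.
# `group` is a hypothetical covariate, not something from the question.
library(pscl)

set.seed(42)
n <- 300
group <- factor(rep(c("A", "B"), each = n / 2))

# Mean count depends on group; about 30% of observations are structural zeros
mu <- ifelse(group == "A", 3, 6)
structural_zero <- rbinom(n, size = 1, prob = 0.3)
count <- ifelse(structural_zero == 1, 0, rnbinom(n, mu = mu, size = 2))
dat <- data.frame(count = count, group = group)

# Left of "|":  regressors for the count model
# Right of "|": regressors for the zero-inflation model (here intercept only)
fit <- zeroinfl(count ~ group | 1, data = dat, dist = "negbin")
print(summary(fit))
```

With no covariates at all, the corresponding formula would be `count ~ 1 | 1`.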

Ok, thank you. About my data: it represents the expression of a gene (ENSG00000223972) across a number of people, named srr1, srr2, and so on. The initial idea was to check whether the expression can be described by a negative binomial or Poisson distribution.
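As a first, crude look at that question, the row can be reshaped to one observation per sample and its variance compared with its mean. This is only a sketch in base R: the counts are copied from the row shown in the question, and the variable names and the variance-to-mean ratio as a diagnostic are choices made for this example, not a formal test.

```r
# Counts for ENSG00000223972, copied from the row in the question
counts <- c(0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0)
names(counts) <- paste0("srr", 1:17)

# Long format: one row per sample, which is what model-fitting functions expect
dat <- data.frame(sample = names(counts), count = unname(counts))

# Under a Poisson distribution the variance is roughly equal to the mean;
# a ratio well above 1 would suggest overdispersion (e.g. negative binomial).
m <- mean(dat$count)
v <- var(dat$count)
dispersion_ratio <- v / m
print(c(mean = m, variance = v, ratio = dispersion_ratio))
```

For this row the ratio comes out below 1, so this check shows no sign of overdispersion. With only 17 samples and counts of 0 or 1, though, it says little either way.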
