My data represents the count of mutations by gene in two groups cases and ctrl. I would like to understand if there is a significant difference between the number of mutations for the same gene between the two groups (gene-wise comparison). In short, I have to compare means grouping by two variables (gene and Type), but in my case I don’t think anova is appropriate since count data should follow a Poisson distribution.\ Here a small subset of my data, as you can see the samples size (cases and ctrls is different).
gene Samples_ID value Type AB 1 28 Cases AB 100 22 Cases AB 101 36 Cases AB 102 57 Cases AB 105 29 Cases AB 106 25 Cases AB 108 23 Cases AB 4928 18 Ctrls AB 4929 18 Ctrls AB 4930 24 Ctrls AB 4931 20 Ctrls AB 4932 25 Ctrls AB 4933 15 Ctrls AB 4934 25 Ctrls AB 4935 22 Ctrls AB 4936 30 Ctrls AB 4937 15 Ctrls AB 4938 18 Ctrls AB 4939 21 Ctrls
Where "gene" is the gene name, "Samples_ID" represent the patient, "value" is the number of mutation for the given gene, "type" is the group.\
I have tried a generalized linear model as follows using R because i found many similar questions:
fit <- glm(value ~ gene + Type, data = file, family = poisson())
but I'm not convinced it's the right way and I don't know how to proceed. My desired output would be a table with gene name as row, the p-value, adjusted p-value in the columns, in this way I can understand if for a given gene, the mean number of mutations differs in a significant way between groups (cases and ctrls).
Please, give me a simple explanation, I'm not a statistician and my background about statistic is ~ 0.
Thank you in advance.