I have data that contains the occurrences of genes in different lineages:

```
Lineage Gene Gene_function
1 x regulatory proteins
1 p cell wall
1 y conserved hypotheticals
1 x respiration
1 z respiration
2 w cell wall
2 a cell wall
2 y regulatory proteins
3 b respiration
3 x conserved hypotheticals
3 a regulatory proteins
3 b regulatory proteins
3 z conserved hypotheticals
3 a respiration
```

How do I test whether there is a significantly different number of, say, "cell wall" genes between the lineages? (I'm thinking of something equivalent to a classic ANOVA, followed by a Tukey test to identify which specific lineages differ.) *N.b.* each lineage has a different number of rows.

Then repeat this for each of the types of genes.

Is there a simple and quick way to do this in R?
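Because the data are counts of genes falling into categories, one common analogue of "ANOVA, then Tukey" is a contingency-table approach rather than ANOVA itself: for each gene function, build a 2 x k table (that function vs. all other functions, by lineage), test it as a whole with `fisher.test()`, then compare every pair of lineages with 2 x 2 tests and correct the pairwise p-values for multiple testing. The sketch below is a minimal illustration under those assumptions; the helper `test_function`, the data frame name `genes`, and the choice of Holm correction are my own, not something from the question. Unequal numbers of rows per lineage are handled because the test asks whether the *proportion* of genes with that function is the same in every lineage.

```r
# Example data from the question, typed in by hand
genes <- data.frame(
  Lineage = rep(1:3, times = c(5, 3, 6)),
  Gene = c("x", "p", "y", "x", "z", "w", "a", "y", "b", "x", "a", "b", "z", "a"),
  Gene_function = c("regulatory proteins", "cell wall", "conserved hypotheticals",
                    "respiration", "respiration", "cell wall", "cell wall",
                    "regulatory proteins", "respiration", "conserved hypotheticals",
                    "regulatory proteins", "regulatory proteins",
                    "conserved hypotheticals", "respiration")
)

# Hypothetical helper: is the share of genes with function `fun` the same in
# every lineage? Omnibus test first, then every pair of lineages.
test_function <- function(df, fun, adjust = "holm") {
  # 2 x k table: rows = `fun` vs. all other functions, columns = lineages
  is_fun <- factor(ifelse(df$Gene_function == fun, fun, "other"),
                   levels = c(fun, "other"))
  tab <- table(is_fun, lineage = df$Lineage)
  omnibus_p <- fisher.test(tab)$p.value
  # all pairs of lineages (the same pairs a Tukey test would compare)
  pairs <- combn(colnames(tab), 2)
  raw_p <- apply(pairs, 2, function(p) fisher.test(tab[, p])$p.value)
  names(raw_p) <- apply(pairs, 2, paste, collapse = " vs ")
  list(table = tab, omnibus_p = omnibus_p,
       pairwise_p = p.adjust(raw_p, method = adjust))
}

# Repeat for every gene function
funs <- sort(unique(genes$Gene_function))
results <- lapply(funs, function(f) test_function(genes, f))
names(results) <- funs

# One omnibus p-value per gene function, then the adjusted pairwise p-values
omnibus <- sapply(results, `[[`, "omnibus_p")
print(omnibus)
print(results[["cell wall"]]$pairwise_p)
```

On a toy table this small nothing will come out significant; the point is the structure, which carries over unchanged to the full data frame.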

Thanks. I tried this and ran into a couple of problems, but I think I have found a solution. I put the data in a contingency table using `table()`. When I run `fisher.test()` on it, I get the error `FEXACT error 6. LDKEY=617 is too small for this problem...`.
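If the goal is to keep the exact test on the full table, `fisher.test()` has a `workspace` argument (default `200000`) that sets the memory available to its network algorithm; raising it is a common first thing to try for FEXACT workspace errors, although very large tables can still be out of reach. The other route is the `simulate.p.value` argument. A minimal sketch on the example data from the question; the `workspace` and `B` values are arbitrary choices of mine, not recommendations.

```r
# Example data from the question: one entry per row (gene occurrence)
lineage <- rep(1:3, times = c(5, 3, 6))
gene_function <- c("regulatory proteins", "cell wall", "conserved hypotheticals",
                   "respiration", "respiration", "cell wall", "cell wall",
                   "regulatory proteins", "respiration", "conserved hypotheticals",
                   "regulatory proteins", "regulatory proteins",
                   "conserved hypotheticals", "respiration")

# Contingency table: lineages x gene functions (counts of rows)
tab <- table(lineage, gene_function)

# Option 1: exact test with a larger workspace than the default 200000
exact_p <- fisher.test(tab, workspace = 2e7)$p.value

# Option 2: Monte Carlo p-value (B simulated tables) instead of exact enumeration
set.seed(1)
mc_p <- fisher.test(tab, simulate.p.value = TRUE, B = 1e4)$p.value

print(c(exact = exact_p, monte_carlo = mc_p))
```

With `set.seed()` the simulated p-value is reproducible; without it, it will vary slightly from run to run.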

It seems to be something to do with memory: when I subsetted the table, the largest matrix it worked with was 2x3 (or 3x2). However, I found on another forum that `chisq.test(<table>, simulate.p.value = T)` is 'equivalent' to Fisher's exact test. Then I found that `fisher.test()` also has this argument, and indeed the two produce similar results. I'm not sure, however, what this parameter means! Thanks

According to the documentation, the `fisher.test` method only applies the `simulate.p.value` parameter to tables larger than 2 by 2 ("a logical indicating whether to compute p-values by Monte Carlo simulation, in larger than 2 by 2 tables"). I am not a statistician, but I assume that this parameter speeds up the p-value calculation for such tables by simulating them instead of doing an exact (and more accurate) calculation:
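What `simulate.p.value = TRUE` does can be illustrated by hand. According to the R help pages, the simulation draws random tables with the same row and column totals as the observed table (Patefield's algorithm, which `r2dtable()` also uses), computes the test statistic for each, and reports the fraction that are at least as extreme. The result is a Monte Carlo estimate of the exact conditional p-value, and its precision depends on `B`, the number of simulated tables. The sketch below reimplements that idea for the chi-squared statistic; `chisq_stat` and `mc_p_value` are hypothetical helpers, and the code is meant to mirror the idea, not to reproduce `chisq.test()` bit for bit. As far as I know, `fisher.test(simulate.p.value = TRUE)` samples tables the same way but ranks them by their probability instead of by the chi-squared statistic, which would explain why the two give similar answers.

```r
# Example data from the question: lineages x gene functions
lineage <- rep(1:3, times = c(5, 3, 6))
gene_function <- c("regulatory proteins", "cell wall", "conserved hypotheticals",
                   "respiration", "respiration", "cell wall", "cell wall",
                   "regulatory proteins", "respiration", "conserved hypotheticals",
                   "regulatory proteins", "regulatory proteins",
                   "conserved hypotheticals", "respiration")
tab <- table(lineage, gene_function)

# Pearson chi-squared statistic; expected counts come from the table's margins
chisq_stat <- function(x) {
  expected <- outer(rowSums(x), colSums(x)) / sum(x)
  sum((x - expected)^2 / expected)
}

# Hypothetical helper: Monte Carlo p-value for the chi-squared test
mc_p_value <- function(x, B = 2000) {
  observed <- chisq_stat(x)
  # B random tables with exactly the same row and column totals as x
  sims <- r2dtable(B, rowSums(x), colSums(x))
  sim_stats <- vapply(sims, chisq_stat, numeric(1))
  # share of simulated tables at least as extreme as the observed one; the
  # small tolerance counts near-ties, and the +1 keeps the estimate above 0
  (1 + sum(sim_stats >= observed * (1 - 64 * .Machine$double.eps))) / (B + 1)
}

set.seed(42)
manual_p <- mc_p_value(tab, B = 5000)
set.seed(42)
builtin_p <- chisq.test(tab, simulate.p.value = TRUE, B = 5000)$p.value
print(c(manual = manual_p, chisq.test = builtin_p))
```

The two numbers should be close; increasing `B` shrinks the Monte Carlo error at the cost of run time.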