I need to do some statistical analyses of my transcriptomes.
I have a data set with 4 columns: gene ID (categorical), expression level (numeric), individual, and species. I have 2 species and 5 individuals per species. For each individual I have more than 20000 genes (some are more highly expressed than others).
What I want to know is whether expression level differs between species. My data do not follow a Gaussian distribution.
To analyse my data I ran:
wilcox.test(Exp ~ Species, data = data)

	Wilcoxon rank sum test with continuity correction

data:  Exp by Species
W = 8573700000, p-value < 2.2e-16
alternative hypothesis: true location shift is not equal to 0
According to this result, there is a significant difference in expression level between species. But:
- I am not sure whether this analysis is appropriate for this dataset.
- Is there any way to take the gene ID into account (as a random factor)?
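To make the second point concrete, here is a minimal sketch of one thing that could be tried in base R: average the expression of each gene within each species, then run a paired Wilcoxon signed-rank test so that every gene is compared with itself. The data are simulated, and the names toy, Gene, Individual and the species labels "A" and "B" are placeholders, not the real data set.

```r
# Toy data in the same long layout (all values are simulated, not real data)
set.seed(1)
n_genes <- 200
toy <- expand.grid(
  Gene = paste0("gene", seq_len(n_genes)),
  Individual = paste0("ind", 1:10),
  stringsAsFactors = FALSE
)
# Individuals 1-5 belong to species A, 6-10 to species B (hypothetical labels)
toy$Species <- ifelse(toy$Individual %in% paste0("ind", 1:5), "A", "B")

# Skewed (non-Gaussian) expression values with a gene-specific baseline
baseline <- rexp(n_genes, rate = 0.01)
names(baseline) <- paste0("gene", seq_len(n_genes))
toy$Exp <- rexp(nrow(toy), rate = 1 / baseline[toy$Gene])

# Mean expression per gene within each species (one row per gene)
per_gene <- tapply(toy$Exp, list(toy$Gene, toy$Species), mean)

# Paired Wilcoxon signed-rank test: each gene is compared with itself
res <- wilcox.test(per_gene[, "A"], per_gene[, "B"], paired = TRUE)
print(res)
```

This treats the gene as a pairing factor rather than as a true random effect, and it ignores the variation among the 5 individuals within each species, so it is only a rough stand-in for a proper model.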
Thank you so much in advance