6.7 years ago
anu014
Hello Biostars!
Today I am dealing with statistics. I have 3 sets of genes (from the same organism) with their AT% content. I want to compare these sets to find out which of them differ most from each other.
I tried a Type-III ANOVA, but I'm not sure it's the correct method to apply:
library("car")
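# Name the columns explicitly; otherwise they become file1.at, file1.sample, etc.,
# and both rbind() and the formula at ~ sample fail.
df1 <- data.frame(at = file1$at, sample = file1$sample) # sample is gene_set1
# e.g. df1
#   at      sample
# 1 42.5852 gene_set1
# 2 41.8838 gene_set1
df2 <- data.frame(at = file2$at, sample = file2$sample) # sample is gene_set2
# e.g. df2
#   at      sample
# 1 40.5852 gene_set2
# 2 45.8838 gene_set2
df3 <- data.frame(at = file3$at, sample = file3$sample) # sample is gene_set3
# e.g. df3
#   at      sample
# 1 40.5852 gene_set3
# 2 41.3458 gene_set3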
up_unres <- rbind(df1, df2)
down_up <- rbind(df3, df1)
down_unres <- rbind(df3, df2)
Anova(lm(at ~ sample, data=up_unres), type="III")
Anova(lm(at ~ sample, data=down_up), type="III")
Anova(lm(at ~ sample, data=down_unres), type="III")
Can anyone tell me which statistical method will be precise enough for this?
Please do suggest. Thank you :)
You only have one factor, so why did you choose a Type-III ANOVA?
Because my data is unbalanced. The number of genes in each set is different.
But balance is only really a problem for factorial designs. Do a single ANOVA over your three groups, anova(lm(at ~ sample, data = my_data)), and R will deal with the unequal set sizes.
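A minimal sketch of that suggestion, assuming my_data holds all three sets in one data frame (for example rbind(df1, df2, df3)). The AT% values below are simulated purely for illustration and are not the poster's data. The Tukey HSD step is an addition that was not proposed in the thread; it is one common way to see which pairs of sets differ after a single ANOVA, using only base R.

```r
# Made-up AT% values for illustration only; replace with the real gene sets.
set.seed(1)
my_data <- data.frame(
  at = c(rnorm(50, mean = 42, sd = 2),
         rnorm(80, mean = 43, sd = 2),
         rnorm(30, mean = 41, sd = 2)),
  # Unequal group sizes on purpose; a factor is needed for TukeyHSD().
  sample = factor(rep(c("gene_set1", "gene_set2", "gene_set3"),
                      times = c(50, 80, 30)))
)

# One-way ANOVA across all three sets at once.
fit <- aov(at ~ sample, data = my_data)
summary(fit)

# Same F test via the lm() route suggested above.
lm_table <- anova(lm(at ~ sample, data = my_data))
lm_table

# Tukey's HSD: all pairwise differences with multiplicity-adjusted p-values.
tukey <- TukeyHSD(fit)
tukey
```

The rows of the Tukey table (one per pair of sets) show the estimated difference in mean AT%, a confidence interval, and an adjusted p-value, which addresses the "which sets differ" question more directly than three separate two-group ANOVAs would.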