Hi All!
I am in the process of conducting my first genome assembly/annotation as an undergrad, so I am fairly new to the field!
I ran OrthoFinder to compare my species' proteome against a set of 28 other related species (all avian), which gave me the Orthogroups.GeneCount output file. I just want to confirm my understanding of this output: these are essentially ortholog counts for each species, grouped by orthogroup, correct? For example, let's say OG00000 is a pim-1-like protein and species A has 34 while species B has 103. Does that mean species B has more orthologous sequences descending from the ancestral pim-1 gene?
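To make the example concrete, here is a minimal Python sketch of reading such a count table and pulling out the two numbers being compared. It assumes a tab-separated layout with an "Orthogroup" column, one column per species, and a trailing "Total" column. The species names and counts are made up to match the example above, not taken from a real run.

```python
import csv
import io

# Toy table in the assumed layout; names and counts are illustrative only.
EXAMPLE_TSV = (
    "Orthogroup\tspecies_A\tspecies_B\tTotal\n"
    "OG00000\t34\t103\t137\n"
    "OG00001\t2\t2\t4\n"
)

def load_gene_counts(tsv_text):
    """Return {orthogroup: {species: count}} from a tab-separated count table."""
    counts = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        orthogroup = row.pop("Orthogroup")
        row.pop("Total", None)  # drop the per-row sum if it is present
        counts[orthogroup] = {species: int(n) for species, n in row.items()}
    return counts

counts = load_gene_counts(EXAMPLE_TSV)
og = counts["OG00000"]
# Number of genes each species contributes to this orthogroup
print(og["species_A"], og["species_B"])
```

Each number here is just how many genes from that species were placed in the orthogroup, which is the quantity the question is about.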
Secondly, I ran ErmineJ to test for enrichment of GO terms in my dataset and got an output file with GO terms and p-values. My understanding is that the p-values can be used to determine whether or not a GO term is significantly enriched in the dataset (i.e. p < 0.05 means significant). Is this true? Furthermore, what benefit does this really have in the grand scheme of things? For example, what does it mean for a GO term such as transferase activity to rank highly in the dataset?
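As a sketch of the cutoff I am describing, the Python below keeps only the GO terms whose p-value falls under 0.05. The column names (`go_term`, `name`, `pvalue`) and the p-values are hypothetical placeholders, not the actual ErmineJ output format, and the filter uses the raw p-values as given.

```python
import csv
import io

# Hypothetical results table; column names and p-values are made up.
EXAMPLE_RESULTS = (
    "go_term\tname\tpvalue\n"
    "GO:0016740\ttransferase activity\t0.003\n"
    "GO:0005515\tprotein binding\t0.21\n"
    "GO:0003824\tcatalytic activity\t0.04\n"
)

def significant_terms(tsv_text, alpha=0.05):
    """Return (go_term, name, pvalue) tuples with pvalue < alpha, smallest first."""
    hits = []
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        p = float(row["pvalue"])
        if p < alpha:
            hits.append((row["go_term"], row["name"], p))
    return sorted(hits, key=lambda hit: hit[2])

for go_term, name, p in significant_terms(EXAMPLE_RESULTS):
    print(go_term, name, p)
```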
Any help is appreciated! Thank you in advance :)