Hello!
I’m new to genome annotation and am working on the structural annotation of a newly assembled genome from a non-model rodent species. Repeat masking is done, and I’m now deciding which software to use for the structural annotation itself. I’m considering GALBA, which is designed to use only protein evidence; that suits my situation since I don’t have RNA-Seq data. I’m leaning towards GALBA over MAKER mainly because it seems simpler to set up (GALBA is available as a Singularity container) and clearer to use for a beginner like me.
My main question is how to select appropriate protein evidence for GALBA. The GALBA team emphasizes the importance of using protein data from closely related species. In UniProt, I have found 11 proteomes from species in the same family as my organism (Cricetidae). Are these sufficiently close relatives to serve as evidence for GALBA?
Should I use these proteomes directly, or would it be beneficial to perform additional steps, such as constructing a phylogenetic tree to determine which proteomes are the closest relatives to my rodent species? Any guidance on the best approach to selecting and using protein evidence would be greatly appreciated.
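For concreteness, by "using these proteomes directly" I mean something like the rough sketch below: merging the downloaded FASTA files into a single protein file and dropping exact duplicate sequences. This is only my assumption about how the input would be prepared, not a documented GALBA step; the function names and file names are hypothetical.

```python
def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def merge_proteomes(fasta_paths, out_path):
    """Write all proteins to one FASTA, keeping the first copy of each exact sequence.

    Returns the number of sequences written.
    """
    seen = set()
    kept = 0
    with open(out_path, "w") as out:
        for path in fasta_paths:
            for header, seq in read_fasta(path):
                if not seq or seq in seen:
                    continue  # drop empty records and exact duplicates
                seen.add(seq)
                out.write(f">{header}\n{seq}\n")
                kept += 1
    return kept


# Hypothetical usage with placeholder file names:
# n = merge_proteomes(["species_01.fasta", "species_02.fasta"], "cricetidae_proteins.fasta")
```

This only removes identical sequences; it does not rank the species by relatedness, which is the part I’m unsure about.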
Thank you in advance for your help!