(I also asked this question on Bioinformatics Stack Exchange. Apologies if cross-posting is frowned upon.)
I have been trying to obtain some preliminary data from HyPhy selection analyses to inform a larger project. I have downloaded a number of assembled mammalian genomes from NCBI, with the initial goal of extracting the sequence of one specific gene from each of them. I want to use HyPhy to test whether there are signatures of positive selection on that gene in any of the mammalian lineages. However, I have been struggling to make headway with this for some time and am realizing that I may be out of my depth.
I was hoping that people could suggest, or at least point me in the direction of, a pipeline/guide/best practices for how one would go about achieving what I am trying to do. I would like to start from scratch and make sure I am performing my analyses in a proper way, using advice from other users.
As I mentioned, I currently have a collection of assembled genomes from NCBI from which I would like to pull specific genes. My understanding is that I will then need to extract the exons (I imagine there is a way to do this without having to extract the entire gene first), align the sequences from each species while preserving the reading frame, and then use that alignment as input to HyPhy's codon models. It is very possible that I am missing some fundamental steps in the process.
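To make the "align while preserving reading frames" step concrete, here is a minimal Python sketch of what I understand it to mean: align the translated proteins first, then thread each species' coding sequence back onto its gapped protein, so that every gap becomes three nucleotides. The function name `thread_codons`, the species labels, and the toy sequences are all made up for illustration, and the sketch assumes the CDS is already spliced, in frame, and matches the protein residue for residue. I believe dedicated tools such as PAL2NAL do this more robustly, so this is only meant to show the idea.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code only (assumption)


def thread_codons(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Project a gapped protein sequence back onto its ungapped CDS.

    Each residue consumes one codon from the CDS; each gap character
    becomes three gap characters, so the reading frame is preserved.
    """
    n_residues = sum(1 for aa in aligned_protein if aa != gap)

    # Drop a single trailing stop codon if the CDS carries one past the protein.
    if len(cds) == 3 * (n_residues + 1) and cds[-3:].upper() in STOP_CODONS:
        cds = cds[:-3]

    if len(cds) != 3 * n_residues:
        raise ValueError("CDS length does not equal 3 x number of residues")

    codons = []
    pos = 0
    for aa in aligned_protein:
        if aa == gap:
            codons.append(gap * 3)
        else:
            codons.append(cds[pos:pos + 3])
            pos += 3
    return "".join(codons)


# Hypothetical toy data: a protein alignment and the matching unaligned CDSs.
protein_alignment = {"human": "MK-L", "mouse": "MKTL"}
cds = {"human": "ATGAAACTGTAA", "mouse": "ATGAAGACCCTG"}

codon_alignment = {
    name: thread_codons(protein_alignment[name], cds[name]) for name in cds
}

# Print the codon alignment in FASTA format.
for name, seq in codon_alignment.items():
    print(f">{name}\n{seq}")
```

With the toy data above, the human sequence should come out as `ATGAAA---CTG` (terminal stop removed, one codon-sized gap) and the mouse sequence unchanged. As far as I can tell, HyPhy's codon analyses expect this kind of in-frame alignment without stop codons, which is why the sketch strips a trailing stop, but please correct me if that is wrong.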
If anybody could offer direction/advice, I would be very appreciative.