I would like to identify the most conserved sites (and, conversely, the most variable ones) in a multiple alignment of genomic DNA sequences that carries annotation information.
My final goal is to reconstruct the phylogenetic tree of a plant genus containing around 100 species. I have genomic data (draft genomes) for only 15 of them. I selected about 1000 genes shared by all 15, and now I want to identify variable regions flanked by conserved ones in those 1000 genes, on the assumption that the same regions also exist in all 100 species. I’m interested in variable regions flanked by conserved ones because I will design primers in the conserved parts and then amplify those regions in the species that have no sequenced genome. I expect exons to be more conserved than introns, which is why I need annotation information on the multiple alignments. For instance, the output could be a multiple alignment with an annotation layer linked to a conservation profile.
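To make the goal concrete, here is a minimal Python sketch of the per-gene computation I have in mind. It is only an illustration under my own assumptions: the aligned sequences are already loaded as equal-length strings, conservation is measured as per-column Shannon entropy over A/C/G/T (gaps and ambiguous bases ignored), and the function names, the entropy threshold and the flank length are all hypothetical choices. It does not use the annotation at all.

```python
import math
from collections import Counter


def column_entropies(seqs):
    """Shannon entropy (bits) of each alignment column; 0.0 means fully conserved."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    entropies = []
    for col in zip(*seqs):
        # Count only unambiguous bases; gaps and N are ignored (an assumption).
        counts = Counter(b for b in (c.upper() for c in col) if b in "ACGT")
        total = sum(counts.values())
        h = sum((n / total) * math.log2(total / n) for n in counts.values())
        entropies.append(h)
    return entropies


def conserved_runs(entropies, max_entropy=0.0, min_len=18):
    """Half-open (start, end) runs of >= min_len columns with entropy <= max_entropy."""
    runs, start = [], None
    # The infinite sentinel closes a run that reaches the end of the alignment.
    for i, h in enumerate(list(entropies) + [float("inf")]):
        if h <= max_entropy:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    return runs


def flanked_variable_regions(entropies, max_entropy=0.0, flank_len=18,
                             max_region_len=800):
    """Regions lying between two consecutive conserved runs (candidate primer sites)."""
    runs = conserved_runs(entropies, max_entropy, flank_len)
    regions = []
    for (_, left_end), (right_start, _) in zip(runs, runs[1:]):
        if right_start - left_end <= max_region_len:
            regions.append((left_end, right_start))
    return regions


# Toy alignment: 4 conserved columns, 4 variable columns, 4 conserved columns.
toy = ["ACGTAAAAGGCC",
       "ACGTACTGGGCC",
       "ACGTTCAAGGCC"]
toy_entropies = column_entropies(toy)
print(flanked_variable_regions(toy_entropies, flank_len=4))  # [(4, 8)]
```

Coordinates are 0-based and half-open. What I am missing is a tool that does this robustly across about 1000 alignments and relates the resulting regions to the exon/intron annotation.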
So my question is: do you know of any software (ideally command-line) that can perform this task?
Any suggestions will be appreciated.