Hi everyone,
I'm planning a genome-wide ortholog search for a couple of species that have sequenced genomes and predicted gene annotations (i.e., predicted CDS). My plan is a reciprocal BLAST hit analysis: go through each CDS of one species, compare it against the other species, and then repeat the search in the opposite direction. There is no central database with ortholog information for my species of interest.
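As a rough sketch of the reciprocal step (plain Python rather than BioPerl), the pairing logic could look like the following. It assumes both searches were saved in BLAST's default 12-column tabular format (`-outfmt 6`), where the bitscore is the last column, and that "best hit" means highest bitscore. The function names and the demo rows are made up for illustration.

```python
def best_hits(lines):
    """Map each query ID to its highest-bitscore subject ID."""
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        # Columns 1, 2 and 12 of -outfmt 6: qseqid, sseqid, bitscore
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return (a, b) pairs where a's best hit is b and b's best hit is a."""
    return sorted((a, b) for a, b in a_vs_b.items() if b_vs_a.get(b) == a)


def row(query, subject, bitscore):
    # Hypothetical helper: builds one 12-column tabular row with dummy values.
    return "\t".join([query, subject, "90.0", "100", "10", "0",
                      "1", "100", "1", "100", "1e-50", str(bitscore)])


# Made-up demo data standing in for the two BLAST result files.
a_vs_b = best_hits([row("A1", "B1", 200), row("A1", "B2", 50),
                    row("A2", "B2", 180), row("A3", "B1", 90)])
b_vs_a = best_hits([row("B1", "A1", 210), row("B2", "A2", 170),
                    row("B2", "A1", 40)])

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
for a, b in pairs:
    print(a, b)
```

In this toy data, A3's best hit is B1, but B1's best hit is A1, so A3 is not reported as part of a pair.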
For the first step, I was planning to edit the CDS file so that, for genes with multiple predicted transcripts, all but the longest sequence are removed (to limit redundancy in later BLAST searches). I'm familiar with Perl scripting, but I'm trying to force myself to use BioPerl (since it would probably help downstream with BLAST), and I was wondering if people had suggestions on how such a script might work in my case.
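To make the filtering step concrete, here is a minimal sketch of the logic in plain Python (not BioPerl). It assumes the CDS file is FASTA and that transcript IDs look like `g1.t2`, i.e. a gene ID followed by a `.tN` suffix. That naming scheme is only a guess, so `gene_id` would need adjusting to whatever the real annotation uses. In BioPerl, the hand-written FASTA reader would presumably be replaced by `Bio::SeqIO`.

```python
import io
import re


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gene_id(header):
    # ASSUMPTION: the first word of the header is "<gene>.t<N>";
    # strip the ".tN" suffix to recover the gene ID.
    return re.sub(r"\.t\d+$", "", header.split()[0])


def longest_per_gene(records):
    """Keep one (header, sequence) per gene: the longest sequence seen."""
    best = {}
    for header, seq in records:
        gid = gene_id(header)
        # Ties keep the transcript that appeared first in the file.
        if gid not in best or len(seq) > len(best[gid][1]):
            best[gid] = (header, seq)
    return best


# Made-up demo input standing in for the real CDS file.
demo = io.StringIO(">g1.t1\nATGAAA\n>g1.t2\nATGAAACCCTGA\n>g2.t1\nATGTGA\n")
longest = longest_per_gene(read_fasta(demo))
for header, seq in longest.values():
    print(f">{header}\n{seq}")
```

For a real file, `demo` would be replaced with an opened file handle, and the printed records redirected to a new FASTA file to use as the BLAST input.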
Thank you!
Thanks for the reply. The Biostar thread you linked had some bits and pieces that I think will help me in the future.