Dear colleagues!
I want to find the nearest (neighboring) genes for the genes encoding orthologues of my protease of interest. To find all known representatives of the protease family, I run blastp against the refseq_protein database. The resulting hits (approx. 1000, from different bacteria and fungi) have «WP_xxxxxxxxx.1» accessions. A minor fraction of the hits also have a parallel «YP_» accession. For this minor fraction I can find the neighbor genes through the NCBI Gene database, using the «gene2accession» and «geneneighbor» datasets. But for the proteins that have only a «WP» accession I can only get the genomic «NZ» accession (using the release66.AutonomousProtein2Genomic dataset), and I cannot find a dataset containing summarized information about the nearest genes.
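To make the «YP_» route concrete, here is a minimal sketch of the lookup: protein accession to GeneID, then GeneID to left and right neighbors. It is only an illustration. The rows, GeneIDs and accessions are invented, the column names are assumed to follow the headers of the gene2accession and gene_neighbors files, and the function names are hypothetical. It also shows where the «WP»-only proteins fall out of the lookup.

```python
import csv
import io

# Toy stand-ins for the two NCBI Gene files. All rows and IDs are invented;
# the column names are an assumption about the real file headers.
GENE2ACCESSION = (
    "GeneID\tprotein_accession.version\n"
    "1001\tYP_000000001.1\n"
    "1002\tYP_000000002.1\n"
)
GENE_NEIGHBORS = (
    "GeneID\tGeneIDs_on_left\tGeneIDs_on_right\n"
    "1001\t1000\t1002\n"
    "1002\t1001\t-\n"
)


def read_tsv(text):
    """Parse tab-separated text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def split_ids(field):
    """Split a neighbor field into GeneIDs.

    Assumes '-' means 'no neighbor' and '|' separates multiple IDs.
    """
    return [] if field in ("", "-") else field.split("|")


def neighbors_for_proteins(accessions, gene2acc_rows, neighbor_rows):
    """Map each protein accession to its neighbor GeneIDs, or None if unmapped."""
    protein_to_gene = {
        row["protein_accession.version"]: row["GeneID"] for row in gene2acc_rows
    }
    neighbors_by_gene = {row["GeneID"]: row for row in neighbor_rows}
    result = {}
    for acc in accessions:
        row = neighbors_by_gene.get(protein_to_gene.get(acc))
        if row is None:
            # No GeneID or no neighbor record: the WP-only situation.
            result[acc] = None
        else:
            result[acc] = {
                "left": split_ids(row["GeneIDs_on_left"]),
                "right": split_ids(row["GeneIDs_on_right"]),
            }
    return result


hits = ["YP_000000001.1", "YP_000000002.1", "WP_000000003.1"]
found = neighbors_for_proteins(
    hits, read_tsv(GENE2ACCESSION), read_tsv(GENE_NEIGHBORS)
)
for accession, neighbors in found.items():
    print(accession, neighbors)
```

With the real files, the same join would be done on the downloaded tables instead of the inline strings; the «WP»-only hits would still come back as None, which is exactly the gap described above.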
Are there any datasets containing the information I need, or can anyone suggest a different approach to retrieve the nearest genes for «WP»-annotated proteins?
Thank you!