Hello all, I am working on elucidating a biochemical pathway. I want to take all genomes that have this metabolic pathway and find which genes those genomes share. The pathway appears to be transferred horizontally in prokaryotes, and I want to identify the possible 'prerequisite genes' it needs in order to work in a new host, such as genes for cofactor synthesis, transport proteins, etc., which may lie outside of the island. Has anyone done anything similar and figured out a good approach?
My initial thought was to simply examine feature tables and create a Venn diagram of the genomes sharing similar features, but many of the genes in the island itself are poorly annotated, so I need to do some form of clustering based on sequence identity/similarity.
You can use cd-hit to cluster the protein sequences from all of your genomes by sequence similarity.
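Once the clustering is done, the shared genes are the clusters that contain at least one sequence from every genome. Here is a minimal Python sketch of that step. It assumes you pooled the proteins from all genomes into one FASTA file before clustering and gave each header a genome prefix such as `gA|g1`; that naming scheme, the function names, and the toy data are mine for illustration, not anything cd-hit requires. Also check that the IDs in your `.clstr` file are not truncated, since cd-hit shortens long descriptions depending on its settings.

```python
import re
from collections import defaultdict


def parse_clstr(lines):
    """Parse cd-hit .clstr text into {cluster_name: [sequence_id, ...]}."""
    clusters = defaultdict(list)
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            current = line[1:]  # e.g. "Cluster 0"
        else:
            # Member lines look like: "0<TAB>350aa, >gA|g1... *"
            match = re.search(r">(.+?)\.\.\.", line)
            if match and current is not None:
                clusters[current].append(match.group(1))
    return dict(clusters)


def core_clusters(clusters, genomes, sep="|"):
    """Keep clusters that hold at least one sequence from every genome.

    Assumes (hypothetical convention) that each sequence ID starts with
    the genome name followed by `sep`, e.g. "gA|g1".
    """
    wanted = set(genomes)
    core = {}
    for name, members in clusters.items():
        present = {member.split(sep, 1)[0] for member in members}
        if wanted <= present:
            core[name] = members
    return core


# Toy .clstr content: Cluster 0 spans all three genomes, Cluster 1 only two.
example = """>Cluster 0
0\t350aa, >gA|g1... *
1\t348aa, >gB|g7... at 92.00%
2\t351aa, >gC|g3... at 90.50%
>Cluster 1
0\t200aa, >gA|g2... *
1\t199aa, >gB|g9... at 95.00%
""".splitlines()

clusters = parse_clstr(example)
core = core_clusters(clusters, ["gA", "gB", "gC"])
for name, members in core.items():
    print(name, members)
```

For real data you would pass `open("yourfile.clstr")` to `parse_clstr` instead of the toy lines. Requiring presence in every genome is strict; if some of your assemblies are incomplete, relaxing it to "present in most genomes" may be more realistic.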
If you have several gene clusters in GenBank (gbk) format and want to compare them, you can use clinker.