Question: Discriminating orthologs and paralogs using CD-HIT
Dear all, I have 40 bacterial genomes and need to identify orthologous and paralogous genes, along with core, dispensable and unique genes. I have used CD-HIT to build homologous gene clusters and count the genes in every cluster. 1. How do I calculate the core, accessory and unique genes? 2. How can I identify paralogous and orthologous genes? 3. Can clusters that contain only the representative (*) gene, with no other sequence aligned at >90% identity, be considered unique? Please help.

What organism are you working with?

CD-HIT alone cannot tell you all of these things. You’d be best off using a pan/core genome pipeline such as Roary or OrthoMCL.
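
If you still want a rough first look at the CD-HIT output before running a dedicated pipeline, here is a minimal Python sketch that parses the `.clstr` file and labels each cluster as core, accessory or unique by counting how many genomes it covers. It rests on assumptions that are not in the question: every sequence ID is prefixed with its genome name and a `|` (e.g. `G1|gene_0001`), and the IDs were not truncated in the `.clstr` file (CD-HIT shortens them by default; running with `-d 0` keeps the ID up to the first space). The function names are made up for illustration. Clusters with more than one gene from the same genome are only flagged as candidate paralogs, since identity clustering by itself cannot separate orthologs from paralogs.

```python
import re
from collections import Counter

# Captures the sequence ID from a member line of a .clstr file, e.g.
# "0\t350aa, >G1|gene_0001... *"  ->  "G1|gene_0001"
MEMBER_RE = re.compile(r">(.+?)\.\.\.")


def parse_clstr(lines):
    """Return a list of clusters, each a list of sequence IDs."""
    clusters = []
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            # Header line: start a new cluster.
            current = []
            clusters.append(current)
        elif current is not None:
            match = MEMBER_RE.search(line)
            if match:
                current.append(match.group(1))
    return clusters


def genome_of(seq_id):
    """ASSUMPTION: IDs look like '<genome>|<gene>'; adapt to your naming."""
    return seq_id.split("|", 1)[0]


def classify(clusters, n_genomes):
    """Label each cluster and list genomes that contribute >1 gene to it."""
    results = []
    for members in clusters:
        counts = Counter(genome_of(m) for m in members)
        if len(counts) == n_genomes:
            category = "core"
        elif len(counts) == 1:
            category = "unique"
        else:
            category = "accessory"
        # More than one gene from the same genome: candidate paralogs only.
        paralog_genomes = sorted(g for g, c in counts.items() if c > 1)
        results.append((category, paralog_genomes))
    return results
```

For a real run you would call something like `classify(parse_clstr(open("clusters.clstr")), n_genomes=40)`, where `clusters.clstr` is a placeholder file name. Note that the labels depend entirely on the identity threshold used for clustering, so a "unique" cluster here only means no other genome has a sequence above that cutoff.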

OK, let me try these tools. Thank you so much.

