I ran a pangenome analysis on around 300 bacterial genomes and now have the output table, gene_presence_absence.tsv, which contains more than 50,000 gene families. Can anyone recommend tools to cluster these gene families? With the table as it stands, it is hard to do any analysis. I want to group the genes by their annotated function, hoping this will simplify the table, and I also want to see whether these functional groups correlate with the strain groups defined by their ANI values.
I know RNA-seq people can run GO analysis, but mine is pangenome data. Any suggestions are welcome.
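To make the goal concrete, here is a minimal sketch of the kind of collapsing I have in mind, not a tested pipeline. It assumes the table is tab-separated with a `Gene` column, an `Annotation` column, and one column per genome where a non-empty cell means the family is present. Those column names, the function name `collapse_by_annotation`, and the toy data are all assumptions for illustration; my real table may be laid out differently.

```python
import csv
import io
from collections import defaultdict


def collapse_by_annotation(tsv_text, annotation_col="Annotation", meta_cols=("Gene",)):
    """Merge gene families that share an annotation into one 0/1 row per genome.

    Column names are assumed, not taken from any specific tool's output.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # Every column that is not metadata is treated as a genome column.
    genomes = [c for c in reader.fieldnames
               if c != annotation_col and c not in meta_cols]
    groups = defaultdict(lambda: dict.fromkeys(genomes, 0))
    for row in reader:
        label = (row.get(annotation_col) or "").strip() or "unannotated"
        for g in genomes:
            # A non-empty cell means the gene family is present in that genome.
            if (row.get(g) or "").strip():
                groups[label][g] = 1
    return genomes, dict(groups)


# Toy table (made-up data) with two families sharing one annotation.
example = (
    "Gene\tAnnotation\tstrainA\tstrainB\n"
    "group_1\tABC transporter\tA_001\t\n"
    "group_2\tABC transporter\t\tB_017\n"
    "group_3\thypothetical protein\tA_002\t\n"
)
genomes, collapsed = collapse_by_annotation(example)
for label, presence in collapsed.items():
    print(label, [presence[g] for g in genomes])
```

The resulting per-function 0/1 matrix is what I would then like to compare against the ANI-based strain groups. Grouping on the raw annotation string is crude, so I am hoping there is a tool that does this properly.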
Best, LC.