How to cluster genes based on annotated function?
3.7 years ago
limchen • 0

I ran a pangenome analysis on around 300 bacterial genomes and got an output table called gene_presence_absence.tsv. It contains more than 50,000 gene families, which makes it hard to analyze as it is. Can anyone recommend tools to cluster those gene families? I want to group the genes by their annotated function, hoping this will simplify the table, and I also want to see whether these functional groups correlate with the strain groups defined by ANI values.

I know RNA-seq people can run GO analysis, but mine is pangenome data. Any suggestions are welcome.

Best, LC.

genome sequencing • 624 views
3.7 years ago
Mensur Dlakic ★ 27k

A common way of clustering proteins is by sequence similarity. There are several ways to do that, but you may want to try BLAST comparisons: take all the proteins, compare them in all-vs-all fashion, and keep only the hits with significant E-values. There is a package called MCL that works well with BLAST results and clusters large datasets efficiently. You may also want to try OrthoMCL.
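To make the filtering step concrete, here is a minimal Python sketch, not a definitive pipeline. It assumes the all-vs-all BLAST was saved in the standard 12-column tabular format (`-outfmt 6`, E-value in column 11). The `1e-5` cutoff, the cap of 200 on the weights, and the function names are illustrative choices of mine. It writes the surviving hits as three-column label/label/weight lines, which is the kind of input MCL can read in its `--abc` mode. The last function is only a stand-in so the example runs end to end: it groups proteins by plain connected components, which is much cruder than MCL and will merge families that MCL would keep apart.

```python
import math
from collections import defaultdict


def parse_hits(lines, max_evalue=1e-5):
    """Read BLAST tabular lines; return (all protein ids, significant non-self hits)."""
    ids, edges = set(), []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        ids.update((query, subject))
        # Drop self-hits and anything above the (assumed) E-value cutoff.
        if query != subject and evalue <= max_evalue:
            edges.append((query, subject, evalue))
    return ids, edges


def abc_lines(edges, cap=200.0):
    """Format hits as 'label<TAB>label<TAB>weight', with weight = -log10(E), capped."""
    out = []
    for query, subject, evalue in edges:
        weight = cap if evalue <= 0.0 else min(cap, -math.log10(evalue))
        out.append(f"{query}\t{subject}\t{weight:.1f}")
    return out


def connected_components(ids, edges):
    """Stand-in for MCL: group proteins linked by any significant hit (union-find)."""
    parent = {i: i for i in ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for query, subject, _ in edges:
        parent[find(query)] = find(subject)

    groups = defaultdict(set)
    for i in ids:
        groups[find(i)].add(i)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


if __name__ == "__main__":
    # Made-up hits for illustration only.
    demo = [
        "geneA\tgeneB\t90\t300\t30\t0\t1\t300\t1\t300\t1e-50\t400",
        "geneC\tgeneD\t80\t200\t40\t0\t1\t200\t1\t200\t1e-20\t150",
        "geneA\tgeneC\t30\t50\t35\t0\t1\t50\t1\t50\t0.5\t20",
    ]
    ids, edges = parse_hits(demo)
    print("\n".join(abc_lines(edges)))
    print(connected_components(ids, edges))
```

For 50,000 families across 300 genomes, you would stream the BLAST file from disk rather than hold it in a list, and pass the label/label/weight file to MCL instead of using the connected-components stand-in.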
