Clustering similar genes from many genomes
5 months ago
anabaena • 0

Hello all, I am working on elucidating a biochemical pathway. I want to take all genomes that have this metabolic pathway and find which genes they share. The pathway appears to be transferred horizontally in prokaryotes, and I want to identify the possible 'prerequisite genes' it needs to work in a new host, such as genes for cofactor synthesis, transport proteins, etc., which may lie outside of the island. Has anyone done anything similar and figured out a good approach?

My initial thought was to simply examine feature tables and create a Venn diagram of the genomes sharing similar features, but many of the genes in the island itself are poorly annotated, so I need to do some form of clustering based on sequence identity/similarity.
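Once every gene has been assigned to a cluster, the Venn-diagram step reduces to set operations. Here is a minimal sketch of that step, assuming you already have a mapping from genome name to the set of cluster IDs found in it. The genome names, cluster IDs and function names are made up for illustration.

```python
from collections import Counter


def shared_features(features_by_genome):
    """Return the features present in every genome (the centre of the Venn diagram)."""
    sets = [set(features) for features in features_by_genome.values()]
    return set.intersection(*sets) if sets else set()


def genome_counts(features_by_genome):
    """Count how many genomes carry each feature."""
    counts = Counter()
    for features in features_by_genome.values():
        counts.update(set(features))
    return counts


# Toy data: hypothetical genomes and cluster IDs.
example = {
    "genomeA": {"cluster1", "cluster2", "cluster3"},
    "genomeB": {"cluster1", "cluster3", "cluster4"},
    "genomeC": {"cluster1", "cluster3"},
}

core = shared_features(example)
print(sorted(core))  # ['cluster1', 'cluster3']
```

The per-feature counts from `genome_counts` are useful if requiring presence in every genome turns out to be too strict, for example when some assemblies are incomplete.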

Python genome pangenomics clustering • 225 views

You can use cd-hit to cluster based on similarity of sequences.

If you have several gene clusters in GenBank (gbk) format and want to compare them, you can use clinker.
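If you go the CD-HIT route, the `.clstr` file it writes can be turned into a cluster-to-genomes table with a few lines of Python. This is only a sketch, and it assumes that every protein ID was prefixed with its genome name before clustering (e.g. `genomeA|gene1`). That naming scheme, the `|` separator and the function names are my assumptions, not something CD-HIT does for you. Note that CD-HIT truncates long IDs in the `.clstr` file by default, so you may need to adjust its description-length option (`-d`) to keep the prefix intact.

```python
import re
from collections import defaultdict

# Member lines look like: "0\t300aa, >genomeA|gene1... *"
ID_PATTERN = re.compile(r">(.+?)\.\.\.")


def parse_clstr(lines, sep="|"):
    """Map each cluster name to the set of genomes with a member in it.

    Assumes sequence IDs have the form '<genome><sep><gene>' (hypothetical scheme).
    """
    clusters = defaultdict(set)
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            current = line[1:]  # e.g. "Cluster 0"
            continue
        match = ID_PATTERN.search(line)
        if match and current is not None:
            genome = match.group(1).split(sep, 1)[0]
            clusters[current].add(genome)
    return dict(clusters)


def clusters_in_all(clusters, genomes):
    """Return the names of clusters that have a member from every listed genome."""
    wanted = set(genomes)
    return sorted(name for name, members in clusters.items() if wanted <= members)


# Toy .clstr content with made-up IDs.
toy = [
    ">Cluster 0",
    "0\t300aa, >genomeA|gene1... *",
    "1\t298aa, >genomeB|gene7... at 95.00%",
    ">Cluster 1",
    "0\t150aa, >genomeA|gene2... *",
]
table = parse_clstr(toy)
print(clusters_in_all(table, ["genomeA", "genomeB"]))  # ['Cluster 0']
```

With a real file you would pass an open file handle instead of the toy list, e.g. `parse_clstr(open("proteins.clstr"))`, where the file name is again just a placeholder.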

5 months ago
Mensur Dlakic ★ 11k

One option is to go to the STRING database and search for one of the proteins in that pathway. That will initially give you proteins that are interacting partners of your query, which may be enough for your purposes. You can also choose the neighborhood option, which will give you conserved clusters of genes in various organisms and may answer the question of how conserved that protein and its neighbors are across multiple genomes. You may need to repeat the search several times with different query sequences to get a complete picture.

5 months ago
Joe 19k

Depending on how closely related all the genomes are, you could use a pangenome approach such as Roary. If the genomes are quite diverse, you can use other pangenome tools (some are designed for broader comparisons, though none of the names spring to mind at present).
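As a rough illustration of what you would do with the pangenome output, the sketch below reads a gene presence/absence table and keeps the gene groups found in every genome of interest. It assumes a CSV with a `Gene` column plus one column per isolate, where an empty cell means the gene is absent, which is how I understand Roary's `gene_presence_absence.csv` to be laid out. The toy table is trimmed and made up, and the function name is hypothetical.

```python
import csv
import io


def core_genes(handle, isolates):
    """Return gene groups that have a non-empty cell for every listed isolate."""
    reader = csv.DictReader(handle)
    core = []
    for row in reader:
        if all((row.get(name) or "").strip() for name in isolates):
            core.append(row["Gene"])
    return core


# Toy table with made-up gene groups and locus tags.
toy_csv = io.StringIO(
    '"Gene","Annotation","genomeA","genomeB"\n'
    '"groupX","hypothetical protein","A_0001","B_0042"\n'
    '"groupY","transporter","A_0002",""\n'
)
core = core_genes(toy_csv, ["genomeA", "genomeB"])
print(core)  # ['groupX']
```

Passing only the genomes that carry the pathway as `isolates` gives the candidate list of shared genes, which can then be checked against the island itself.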