protein clustering and pangenome tools
2.0 years ago

Hi all!

I'm working on subspecies sequences and have some questions about choosing a protein clustering/pangenome analysis tool.

I understand that Roary and Panaroo first use CD-HIT and BLAST to collapse highly similar proteins into one, minimizing redundancy in the data, then use MCL to cluster on the pairwise similarity matrix created by BLAST, and finally produce a gene_presence_absence table.

But I don't understand why some tools (e.g. PPanGGoLiN) use only MMseqs2. Is the difference between MMseqs2 and CD-HIT whether they are alignment-free or not, so that the major advantage of using MMseqs2 is to minimize computing power and time?

If not, what criteria should I use to choose a clustering/pangenome tool?

clustering pangenome • 1.2k views

I think you should read this paper: link

2.0 years ago
Mensur Dlakic ★ 28k

MMseqs2 has both search and clustering capabilities, so it can replace the other two tools. I have used both CD-HIT and MMseqs2, and still do. They produce different clustering solutions at low identity thresholds (say, 40% and below), but the results should be very similar at the higher identity thresholds typically used for pangenomes. I don't know which one works better in this specific application, but I think you would be fine with either tool, as both are well known and have been thoroughly tested.
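If you want to check how similar the two clustering solutions are on your own data, one option is to run both tools at the same identity threshold and compare the resulting partitions. Below is a minimal Python sketch of that idea, not a tested pipeline. It assumes the usual CD-HIT `.clstr` layout (a `>Cluster N` header followed by member lines containing `>ID...`) and the two-column `representative<TAB>member` TSV that MMseqs2 writes for clusterings. The function names and the toy records are made up for illustration, and the Rand index (the fraction of sequence pairs on which the two solutions agree) is just one possible similarity measure.

```python
import re
from collections import Counter
from math import comb


def parse_cdhit_clstr(lines):
    """Map sequence ID -> cluster index from CD-HIT .clstr lines (assumed layout)."""
    membership = {}
    cluster = -1
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            cluster += 1
            continue
        # Member lines look like: "0<TAB>120aa, >seqA... *"
        match = re.search(r">(\S+?)\.\.\.", line)
        if match:
            membership[match.group(1)] = cluster
    return membership


def parse_mmseqs_tsv(lines):
    """Map member ID -> representative ID from a two-column cluster TSV."""
    membership = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            membership[fields[1]] = fields[0]
    return membership


def rand_index(a, b):
    """Fraction of sequence pairs grouped the same way by both clusterings."""
    shared = sorted(set(a) & set(b))  # only IDs present in both solutions
    n = len(shared)
    if n < 2:
        return 1.0
    both = Counter((a[s], b[s]) for s in shared)
    in_a = Counter(a[s] for s in shared)
    in_b = Counter(b[s] for s in shared)
    together_both = sum(comb(c, 2) for c in both.values())
    together_a = sum(comb(c, 2) for c in in_a.values())
    together_b = sum(comb(c, 2) for c in in_b.values())
    total = comb(n, 2)
    # Agreements: pairs together in both, plus pairs apart in both.
    apart_both = total - together_a - together_b + together_both
    return (together_both + apart_both) / total


# Toy records (hypothetical IDs), standing in for real output files.
cdhit_demo = [
    ">Cluster 0",
    "0\t100aa, >p1... *",
    "1\t98aa, >p2... at 97.00%",
    ">Cluster 1",
    "0\t150aa, >p3... *",
    ">Cluster 2",
    "0\t80aa, >p4... *",
]
mmseqs_demo = ["p1\tp1", "p1\tp2", "p3\tp3", "p3\tp4"]

cd = parse_cdhit_clstr(cdhit_demo)
mm = parse_mmseqs_tsv(mmseqs_demo)
print(f"Rand index: {rand_index(cd, mm):.3f}")
```

With real data you would pass open file handles instead of the toy lists. A value close to 1 at your chosen threshold would suggest the choice between the two tools matters little for that dataset.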
