Question: Quantify similarity between multi-fasta files
4.3 years ago
United Kingdom
Hi there,

I want to compare the output of de novo assemblies of multiple samples. From this, I'd like to cluster the samples by (dis)similarity.

With BLAT or BLAST, I get per-sequence scores (which I could use to calculate the percentage of similar bases between the query and the database). With CD-HIT (CD-HIT-EST), I do get clusters, but still no score or percentage.

Does anybody have a more straightforward solution for this?

Season's greetings,


4.3 years ago
United States
If I remember right, ClustalW can give a similarity matrix between sequences.
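
To illustrate what such a similarity matrix looks like, here is a minimal sketch that computes pairwise percent identity from sequences that are already aligned (for example, taken from a multiple alignment). This is not ClustalW's own calculation; the function names, the toy sequences, and the choice to skip columns where both sequences have a gap are assumptions made for the example.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent identity of two aligned sequences of equal length.

    Columns where both sequences have a gap are skipped (an assumption;
    other tools may treat gaps differently).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def identity_matrix(aligned):
    """Build a symmetric name -> name -> percent identity mapping."""
    names = list(aligned)
    matrix = {name: {name: 100.0} for name in names}
    for p, q in combinations(names, 2):
        pid = percent_identity(aligned[p], aligned[q])
        matrix[p][q] = pid
        matrix[q][p] = pid
    return matrix


# Hypothetical toy alignment, not real assembly output.
aligned = {
    "contig_A": "ACGT-ACGTA",
    "contig_B": "ACGTTACGTA",
    "contig_C": "ACCT-ACGAA",
}
matrix = identity_matrix(aligned)
for name, row in matrix.items():
    print(name, {other: round(value, 1) for other, value in row.items()})
```

Taking 100 minus each value would give a dissimilarity that could be fed into a clustering step. Note that this compares individual sequences, so summarising it per sample would still need an extra step.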

