I want to compare the de novo assemblies of multiple samples and then cluster the samples by their (dis)similarity.
With bla(s)t, I get per-sequence scores, which I could use to compute the percentage of similar bases between the query and the database. With CD-HIT (cd-hit-est), I do get clusters, but still no score or percentage.
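Roughly what I mean by turning the BLAST hits into a percentage: a minimal sketch, assuming sample A's contigs were blasted against sample B's assembly with tabular output (`-outfmt 6`, default 12 columns) and that a "similar base" is any query base covered by a hit at or above an identity cutoff. The function name `percent_similar_bases` and the 95% cutoff are my own illustrative choices, not anything BLAST provides. Overlapping hits on the same contig are merged so bases are not counted twice.

```python
from collections import defaultdict


def percent_similar_bases(blast_lines, query_lengths, min_pident=95.0):
    """Percentage of query-assembly bases covered by hits with pident >= min_pident.

    blast_lines: lines of BLAST tabular output (-outfmt 6, default columns:
        qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore).
    query_lengths: dict mapping query contig ID -> contig length (hypothetical input,
        e.g. read from the query FASTA).
    """
    intervals = defaultdict(list)  # contig ID -> list of (start, end) on the query
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        qseqid, pident = fields[0], float(fields[2])
        qstart, qend = int(fields[6]), int(fields[7])
        if pident < min_pident:
            continue  # too divergent to count as "similar"
        lo, hi = sorted((qstart, qend))
        intervals[qseqid].append((lo, hi))

    covered = 0
    for spans in intervals.values():
        spans.sort()
        cur_lo, cur_hi = spans[0]
        for lo, hi in spans[1:]:
            if lo <= cur_hi + 1:  # overlapping or adjacent hit: extend the current block
                cur_hi = max(cur_hi, hi)
            else:  # gap: close the current block and start a new one
                covered += cur_hi - cur_lo + 1
                cur_lo, cur_hi = lo, hi
        covered += cur_hi - cur_lo + 1

    total = sum(query_lengths.values())
    return 100.0 * covered / total if total else 0.0
```

This is directional (A vs B is not the same as B vs A), so running it both ways per pair of samples is probably needed before clustering.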
Does anybody have a more straightforward solution for this?
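For the clustering step itself, a sketch of what I have in mind once per-pair percentages exist: symmetrize them into a dissimilarity (here, my own assumption, 100 minus the mean of the two directions) and run average-linkage (UPGMA) hierarchical clustering. Both function names are hypothetical; the input percentages would come from something like the BLAST-based approach above.

```python
def to_dissimilarity(pct):
    """pct[a][b]: % of sample a's bases matched in sample b (directional).

    Returns a symmetric nested dict: 100 - mean of both directions, 0 on the diagonal.
    """
    return {
        a: {b: 0.0 if a == b else 100.0 - (pct[a][b] + pct[b][a]) / 2 for b in pct}
        for a in pct
    }


def upgma(names, dist):
    """Average-linkage (UPGMA) clustering; returns the tree as nested tuples of names."""
    clusters = [(name, 1) for name in names]  # (subtree, number of samples in it)
    D = [[dist[a][b] if a != b else 0.0 for b in names] for a in names]
    while len(clusters) > 1:
        n = len(clusters)
        # Closest pair of current clusters (first one wins on ties)
        i, j = min(
            ((i, j) for i in range(n) for j in range(i + 1, n)),
            key=lambda ij: D[ij[0]][ij[1]],
        )
        (ti, si), (tj, sj) = clusters[i], clusters[j]
        keep = [k for k in range(n) if k not in (i, j)]
        # Size-weighted average distance from the merged cluster to each remaining one
        new_row = [(D[i][k] * si + D[j][k] * sj) / (si + sj) for k in keep]
        D = [[D[a][b] for b in keep] for a in keep]
        for row, val in zip(D, new_row):
            row.append(val)
        D.append(new_row + [0.0])
        clusters = [clusters[k] for k in keep] + [((ti, tj), si + sj)]
    return clusters[0][0]
```

SciPy's `scipy.cluster.hierarchy.linkage(..., method="average")` on a condensed distance matrix does the same thing and also gives a dendrogram; the hand-rolled version is only there to make the averaging explicit.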