Diverse sampling from MSA / phylogenetic tree?
11 months ago
Nick ▴ 40

I have a multiple sequence alignment of protein sequences from mmseqs. Are there any methods that would let me sample randomly from that space, with the constraint that the sample stays reasonably representative of some property (e.g. sequence homology in the database)? I could of course cluster the sequences, but that wouldn't give me exact control over the number of sequences I get from the sampling.
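For illustration, one possible way to keep an exact sample size on top of a clustering is to allocate draws to clusters in proportion to their size and then sample at random within each cluster. This is only a sketch, not an established method: the function name `stratified_sample` and the `cluster_of` mapping (sequence ID to cluster ID, e.g. parsed from whatever clustering output is available) are hypothetical, and proportional allocation is just one choice of "representative".

```python
import random
from collections import defaultdict


def stratified_sample(cluster_of, n, seed=0):
    """Draw exactly n sequence IDs, giving each cluster a share of the
    draws proportional to its size (largest-remainder rounding).

    cluster_of: dict mapping sequence ID -> cluster ID (assumed input).
    """
    total = len(cluster_of)
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the number of sequences")

    rng = random.Random(seed)
    clusters = defaultdict(list)
    for seq_id, cluster_id in cluster_of.items():
        clusters[cluster_id].append(seq_id)

    # Integer arithmetic: floor of the proportional share, plus its remainder.
    quota = {c: n * len(m) // total for c, m in clusters.items()}
    remainder = {c: n * len(m) % total for c, m in clusters.items()}

    # Hand the slots lost to rounding to the clusters with the largest remainders.
    leftover = n - sum(quota.values())
    for c in sorted(clusters, key=remainder.get, reverse=True)[:leftover]:
        quota[c] += 1

    picked = []
    for c, members in clusters.items():
        picked.extend(rng.sample(members, quota[c]))
    return picked


if __name__ == "__main__":
    # Toy clustering: 6 sequences in A, 3 in B, 1 in C.
    toy = {f"s{i}": "A" if i < 6 else ("B" if i < 9 else "C") for i in range(10)}
    print(stratified_sample(toy, 5))
```

With proportional allocation, small clusters can end up with zero draws; giving every cluster at least one draw first would favor diversity over representativeness instead.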

I can think of many ways to do this manually: pairwise alignments followed by k-means clustering, computing embeddings and clustering them, or maybe building a phylogenetic tree and sampling from it.
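As a concrete example of the manual route, here is a minimal sketch of greedy farthest-point (max-min) sampling directly on the alignment rows. It returns exactly `k` sequences and tends to spread them across the alignment. Everything here is an assumption for illustration: the names `hamming_fraction` and `farthest_point_sample` are made up, the distance is a plain fraction of mismatched columns rather than a proper evolutionary distance, and the `msa` dict (ID to aligned, equal-length row) would have to be parsed from the actual alignment file.

```python
import random


def hamming_fraction(a, b):
    """Fraction of aligned columns where two equal-length rows differ,
    skipping columns that are gaps ('-') in both rows."""
    compared = differing = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue
        compared += 1
        differing += x != y
    return differing / compared if compared else 0.0


def farthest_point_sample(msa, k, seed=0):
    """Pick exactly k IDs from msa (dict: ID -> aligned row), starting from a
    random row and repeatedly adding the row farthest from those chosen."""
    if not 1 <= k <= len(msa):
        raise ValueError("k must be between 1 and the number of sequences")

    ids = list(msa)
    first = random.Random(seed).choice(ids)
    chosen = [first]
    # nearest[i] = distance from row i to its closest already-chosen row.
    nearest = {i: hamming_fraction(msa[i], msa[first]) for i in ids}

    while len(chosen) < k:
        candidates = [i for i in ids if i not in chosen]
        nxt = max(candidates, key=nearest.__getitem__)
        chosen.append(nxt)
        for i in ids:
            nearest[i] = min(nearest[i], hamming_fraction(msa[i], msa[nxt]))
    return chosen


if __name__ == "__main__":
    toy = {"a": "AAAA", "b": "AAAT", "c": "TTTT", "d": "TTTA"}
    print(farthest_point_sample(toy, 2))
```

This is quadratic in the number of sequences for large `k`, and it deliberately over-represents outliers, so it gives a diverse sample rather than a proportionally representative one.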

Is there a "gold standard" method? I have not been able to find one.

clustering proteins sampling alignment msa • 417 views

I had a somewhat similar question last year, though I never came up with a solution I was totally satisfied with. Maybe that could give you ideas, though. What's your end goal with the sampling approach?
