Choose subset of protein sequences to maximize diversity
8.2 years ago
r.follador

This is probably more of a CS question than biology:

Given a set of m protein sequences, I want to select a subset of n candidates (n is fixed in advance) that maximizes diversity.

I would probably start with a distance matrix (made by clustalo). Then I want to choose my n candidates so that the sum of the distances between every pair of chosen candidates is maximized.
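The objective described here is usually called the max-sum dispersion (or maximum diversity) problem. It is NP-hard in general, so exhaustive search over all C(m, n) subsets is only practical for small m. Below is a minimal greedy sketch in Python, assuming the distance matrix has already been loaded as a symmetric list of lists with a zero diagonal (parsing the clustalo output is not shown). The function names are hypothetical, and the greedy result is a heuristic, not a guaranteed optimum.

```python
from typing import List, Sequence


def greedy_max_sum_subset(dist: Sequence[Sequence[float]], n: int) -> List[int]:
    """Greedy heuristic for picking n indices with a large summed pairwise distance.

    Assumes `dist` is a symmetric m x m matrix with a zero diagonal.
    The result is not guaranteed to be optimal.
    """
    m = len(dist)
    if any(len(row) != m for row in dist):
        raise ValueError("dist must be a square m x m matrix")
    if not 1 <= n <= m:
        raise ValueError("n must satisfy 1 <= n <= m")

    # Seed with the sequence whose total distance to all others is largest.
    first = max(range(m), key=lambda k: sum(dist[k]))
    chosen = [first]
    chosen_set = {first}
    # gain[k] = summed distance from sequence k to the sequences chosen so far.
    gain = list(dist[first])

    while len(chosen) < n:
        # Add the unchosen sequence that increases the total distance the most.
        best = max((k for k in range(m) if k not in chosen_set), key=lambda k: gain[k])
        chosen.append(best)
        chosen_set.add(best)
        for k in range(m):
            gain[k] += dist[best][k]
    return chosen


def subset_score(dist: Sequence[Sequence[float]], idx: Sequence[int]) -> float:
    """Sum of distances over all unordered pairs in the subset `idx`."""
    return sum(dist[i][j] for a, i in enumerate(idx) for j in idx[a + 1:])


if __name__ == "__main__":
    # Toy data: five "sequences" placed on a line, distance = |x_i - x_j|.
    pos = [0, 1, 5, 9, 10]
    d = [[abs(a - b) for b in pos] for a in pos]
    picked = greedy_max_sum_subset(d, 4)
    print(picked, subset_score(d, picked))  # [0, 4, 1, 3] 38
```

On this toy data the chosen positions are 0, 10, 1 and 9 (total 38, which is also the best of all 4-subsets here). Note that the middle point at 5 is skipped: the max-sum objective rewards sequences at the extremes.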

The goal is a subset of the m protein sequences that still more or less represents the diversity of the full set.
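Maximizing the summed distance tends to favor sequences at the edges of the distribution, which can leave interior regions unrepresented. If representativeness matters most, a commonly used alternative objective (swapped in here, not what the question literally asks for) is to maximize the minimum distance via greedy farthest-point selection. When the distances form a metric, this is Gonzalez's 2-approximation for k-center. Clustalo distances are not guaranteed to be metric, so treat that guarantee as approximate. Below is a minimal Python sketch under the same assumptions (symmetric matrix, zero diagonal). The function name is hypothetical.

```python
from typing import List, Optional, Sequence


def farthest_point_subset(
    dist: Sequence[Sequence[float]], n: int, start: Optional[int] = None
) -> List[int]:
    """Greedy max-min (farthest-point) selection of n indices.

    Assumes `dist` is a symmetric m x m matrix with a zero diagonal.
    """
    m = len(dist)
    if any(len(row) != m for row in dist):
        raise ValueError("dist must be a square m x m matrix")
    if not 1 <= n <= m:
        raise ValueError("n must satisfy 1 <= n <= m")

    if start is None:
        # Default seed: the sequence with the largest total distance to all others.
        start = max(range(m), key=lambda k: sum(dist[k]))
    chosen = [start]
    chosen_set = {start}
    # nearest[k] = distance from sequence k to its closest chosen sequence.
    nearest = list(dist[start])

    while len(chosen) < n:
        # Add the unchosen sequence that is farthest from everything chosen so far.
        best = max((k for k in range(m) if k not in chosen_set), key=lambda k: nearest[k])
        chosen.append(best)
        chosen_set.add(best)
        nearest = [min(nearest[k], dist[best][k]) for k in range(m)]
    return chosen


if __name__ == "__main__":
    # Toy data: five "sequences" placed on a line, distance = |x_i - x_j|.
    pos = [0, 1, 5, 9, 10]
    d = [[abs(a - b) for b in pos] for a in pos]
    print(farthest_point_subset(d, 4))  # [0, 4, 2, 1]
```

On these five points with n = 4, maximizing the summed distance would pick positions 0, 1, 9 and 10. Farthest-point selection picks 0, 10, 5 and 1 instead, so the middle of the distribution stays represented.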

What approach would you suggest?

diversity sequence