k-mer analysis for sequences of behavior

Posted 4.8 years ago by truebeliever24

Hi all,

I have been trying to use k-mer analysis (with k = 3) to classify behavioral sequences into phenotypes.

For example, if each letter is a behavior within a courtship display, I could have the following:

Species A: R R R R R S H E
Species B: P P P P P P A S H E
Hybrid 1: P P P P R R E
Hybrid 2: R R R R R P E
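A minimal Python sketch of the 3-mer step, assuming each sequence is split into single-letter behavior codes and that overlapping 3-mers are counted per individual (the post doesn't say exactly how the profiles are built, and the same idea translates directly to R). The names `sequences`, `kmer_counts`, and `profiles` are illustrative only.

```python
from collections import Counter

# The four example sequences from the post, one behavior code per element.
sequences = {
    "Species A": "R R R R R S H E".split(),
    "Species B": "P P P P P P A S H E".split(),
    "Hybrid 1": "P P P P R R E".split(),
    "Hybrid 2": "R R R R R P E".split(),
}

def kmer_counts(seq, k=3):
    """Count overlapping k-mers (tuples of behavior codes) in one sequence."""
    return Counter(tuple(seq[i:i + k]) for i in range(len(seq) - k + 1))

# One sparse count profile per individual.
profiles = {name: kmer_counts(seq) for name, seq in sequences.items()}

for name, counts in profiles.items():
    print(name, dict(counts))
```

Note how the long runs dominate: in Species A, the 3-mer R R R makes up 3 of its 6 3-mers, and P P P makes up 4 of Species B's 8.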

The idea is for the k-mer analysis to separate all individuals into Species A, Species B, and various hybrid phenotypes based on the sequences they perform. It has done a very good job of separating the parent species and intermediate hybrids, but apparent backcrossed hybrids (i.e., individuals that behave like Species A but perform a single behavior typical of Species B) are often placed incorrectly with Species A.
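One way to see why a single Species B behavior can get lost is to compare raw 3-mer count profiles directly. The sketch below assumes cosine similarity between count vectors, which is only one possible choice; the post does not say which distance or clustering method is being used. Helper names are hypothetical.

```python
import math
from collections import Counter

def kmer_counts(seq, k=3):
    """Count overlapping k-mers (tuples of behavior codes) in one sequence."""
    return Counter(tuple(seq[i:i + k]) for i in range(len(seq) - k + 1))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors stored as dicts."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

parents = {
    "Species A": kmer_counts("R R R R R S H E".split()),
    "Species B": kmer_counts("P P P P P P A S H E".split()),
}
hybrids = {
    "Hybrid 1": kmer_counts("P P P P R R E".split()),
    "Hybrid 2": kmer_counts("R R R R R P E".split()),
}

# Similarity of each hybrid to each parental profile.
similarity = {
    (h, p): cosine(hp, pp)
    for h, hp in hybrids.items()
    for p, pp in parents.items()
}
for (h, p), s in similarity.items():
    print(f"{h} vs {p}: {s:.3f}")
```

Under these assumptions, Hybrid 2 (Species A-like apart from one P) scores about 0.783 against Species A and exactly 0 against Species B. The lone P only produces the 3-mers R R P and R P E, and neither occurs in Species B, so the shared R R R counts carry the whole comparison.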

I've tried to find ways to weight particular behaviors or to remove repetitive 3-mers (e.g., R R R) so that they don't bias the analysis, but I haven't been able to do so.
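A minimal sketch of both ideas, assuming that (1) each run of identical behaviors is collapsed to a single copy before extracting 3-mers, so R R R stops dominating, and (2) the presence of each individual behavior is added as an extra feature scaled by a weight. `collapse_runs`, `features`, and `behavior_weight` are hypothetical names and parameters, and cosine similarity is again an assumed choice of measure.

```python
import math
from itertools import groupby

def collapse_runs(seq):
    """Replace each run of identical consecutive behaviors with one copy."""
    return [behavior for behavior, _ in groupby(seq)]

def features(seq, k=3, behavior_weight=1.0):
    """Binary k-mer presence on the run-collapsed sequence, plus per-behavior
    presence scaled by behavior_weight (a hypothetical tuning knob)."""
    collapsed = collapse_runs(seq)
    feats = {}
    for i in range(len(collapsed) - k + 1):
        feats[("kmer", tuple(collapsed[i:i + k]))] = 1.0
    for behavior in set(collapsed):
        feats[("behavior", behavior)] = behavior_weight
    return feats

def cosine(a, b):
    """Cosine similarity between two sparse feature vectors stored as dicts."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

parents = {
    "Species A": features("R R R R R S H E".split()),
    "Species B": features("P P P P P P A S H E".split()),
}
hybrid_2 = features("R R R R R P E".split())

for name, profile in parents.items():
    print(f"Hybrid 2 vs {name}: {cosine(hybrid_2, profile):.3f}")
# Hybrid 2 vs Species A: 0.408
# Hybrid 2 vs Species B: 0.354
```

With these features, Hybrid 2 is still closest to Species A, but its single P now gives it a non-zero similarity to Species B. Raising `behavior_weight` strengthens that signal. The trade-off is that collapsing runs discards how many times a behavior is repeated: Hybrid 1 and Hybrid 2 reduce to P R E and R P E, so they become hard to tell apart.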

Is anyone familiar with k-mer analysis? If so, do you have any suggestions?

Tags: R

Updated 4.8 years ago by zx8754 • written 4.8 years ago by truebeliever24