Hierarchical tree based on phylogenetic profile matrix (presence/absence per species)?
9.9 years ago
a1ultima ▴ 840

I have a matrix that represents presence/absence of some character (rows) in a list of species (columns), e.g.

     species 1  species 2  species 3
1    0          0          1
2    0          1          1
3    0          1          0
4    0          0          0
5    0          0          0
6    0          0          0
7    0          1          0
8    0          1          0
9    0          0          1
10   0          0          0

Is there a way that I can process this into a hierarchical tree such that similar rows group closer together?

In order of preference, I am hoping for a solution in one of these forms:

  • A python/R script
  • A python/R package that I can make a script from
  • Linux command-line software
  • Webtool
Phylogenetic-profile Python Biopython R Tree • 4.1k views
9.9 years ago
David W 4.9k

You can do it all in base R, using dist and hclust:

# Simulate a 10 x 10 presence/absence matrix (each cell is 1 with probability 0.1)
fake_pa <- t(replicate(10, rbinom(10, 1, 0.1)))
head(fake_pa)

#     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#[1,]    1    0    0    0    0    0    1    0    0     0
#[2,]    1    0    0    0    0    0    0    0    1     0
#[3,]    0    0    1    0    0    0    0    0    0     0
#[4,]    0    0    0    1    0    0    0    0    0     1
#[5,]    0    0    0    1    0    0    0    0    0     0
#[6,]    0    0    0    0    0    0    0    0    1     0

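# Manhattan distance between rows, then hierarchical clustering
# (hclust uses complete linkage by default)
dm <- dist(fake_pa, method="manhattan")
plot(hclust(dm))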
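
Since the question also asks for Python, here is a rough sketch of what the dist + hclust pair is doing, written with only the standard library and run on the example matrix from the question. The function names (manhattan, complete_linkage) and the dictionary layout are my own choices for illustration, not part of any package. For real data you would probably reach for scipy (pdist with metric="cityblock" and scipy.cluster.hierarchy.linkage) rather than this loop, which is slow for large matrices.

```python
from itertools import combinations

# Presence/absence matrix from the question: keys are the row labels (1-10),
# values are the profile across species 1-3.
profiles = {
    1: (0, 0, 1),
    2: (0, 1, 1),
    3: (0, 1, 0),
    4: (0, 0, 0),
    5: (0, 0, 0),
    6: (0, 0, 0),
    7: (0, 1, 0),
    8: (0, 1, 0),
    9: (0, 0, 1),
    10: (0, 0, 0),
}


def manhattan(a, b):
    """Number of species in which two binary profiles differ."""
    return sum(abs(x - y) for x, y in zip(a, b))


def complete_linkage(profiles):
    """Agglomerative clustering of the rows.

    Returns the merge history as (members_a, members_b, height) tuples,
    in the order the merges happen.
    """
    clusters = [(label,) for label in sorted(profiles)]
    merges = []

    def cluster_dist(pair):
        # Complete linkage: distance between two clusters is the largest
        # distance between any row in one and any row in the other.
        a, b = pair
        return max(manhattan(profiles[i], profiles[j]) for i in a for j in b)

    while len(clusters) > 1:
        # Merge the closest pair of clusters (ties go to the first pair found).
        a, b = min(combinations(clusters, 2), key=cluster_dist)
        merges.append((a, b, cluster_dist((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


merges = complete_linkage(profiles)
for a, b, height in merges:
    print(f"merge {a} + {b} at height {height}")
```

Rows with identical profiles (for example 4, 5, 6 and 10) merge at height 0, so they end up as the tightest groups in the tree, which is the "similar rows group closer together" behaviour asked for.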

As you may know, many pages have been spent on the question of the "best" distance and clustering methods for binary data. You might want to check out some of the functions in the ade4 package, which implements several different methods.
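
To give a feel for why the choice of distance matters, here is a small sketch comparing Manhattan distance with Jaccard distance on binary profiles. The four profiles are made up for illustration, and treating two all-absent profiles as distance 0 is a convention I picked here, not a universal rule. The main difference is that Manhattan counts a shared absence as agreement, while Jaccard ignores shared absences entirely.

```python
def manhattan(a, b):
    """Count of positions where the profiles differ (shared 0s count as agreement)."""
    return sum(x != y for x, y in zip(a, b))


def jaccard(a, b):
    """1 - (shared presences / positions present in either profile)."""
    either = sum(1 for x, y in zip(a, b) if x or y)
    both = sum(1 for x, y in zip(a, b) if x and y)
    # Two all-absent profiles have no presences to compare; returning 0.0
    # (identical) is an assumption made for this sketch.
    return 0.0 if either == 0 else 1 - both / either


# Hypothetical profiles across ten species.
sparse_a = (1, 0, 0, 0, 0, 0, 0, 0, 0, 0)
sparse_b = (0, 1, 0, 0, 0, 0, 0, 0, 0, 0)
dense_a = (1, 1, 1, 1, 1, 1, 1, 1, 1, 0)
dense_b = (1, 1, 1, 1, 1, 1, 1, 1, 0, 1)

for name, (p, q) in {"sparse": (sparse_a, sparse_b),
                     "dense": (dense_a, dense_b)}.items():
    print(f"{name}: manhattan={manhattan(p, q)}, jaccard={jaccard(p, q):.2f}")
```

Both pairs differ at exactly two positions, so Manhattan cannot tell them apart. Jaccard treats the sparse pair as maximally different (they share no presences) and the dense pair as close (they share eight of ten). With sparse presence/absence data like the example in the question, that difference can change the tree noticeably.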


that was brill thanks!

