I was recently researching methods for constructing tumor phylogenetic trees by processing SNP frequencies from multiple samples and inferring viable models of subclonal decomposition. That approach applies within an individual patient, regardless of the number of samples. I then ran across a paper that instead draws comparisons between different patients and assigns multiple numbers to every branch/transition ... (Figures 1D and 5B): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3864404/ I've read through the online methods but haven't quite figured out how to replicate their approach.
The full details are in the paper's appendix:
10 Phylogenetic analysis
Phylogenetic trees were generated based on three different genomic events: mutations data, copy number and compound copy number events. For mutation data, a consolidated matrix containing the mutations of all samples (rows) with ‘1’ and ‘0’ representing the presence and absence of a mutation in a gene (column), respectively, is generated. The rows of this matrix represent the samples and columns represent the genes. For copy number, the matrix consisted of segment log ratio data for each patient-gene pair. Pearson correlation coefficients ρxy were computed between pairs of patients x and y. The results were used for phylogenetic analysis such that the pairwise distance of x and y was computed as 1 − ρxy. The Neighbor-Joining method of Saitou and Nei (Saitou and Nei, 1987) and the Unweighted Pair Group Method with Arithmetic Mean (UPGMA) method of clustering were used to construct the phylogenetic tree. We used ‘ape’ R package (Paradis et al., 2004) for constructing and plotting the phylogenetic trees. For compound copy number events, a similar procedure was performed with the exception of the matrix construction and computation of the distance. The matrix consisted of the weight of observing compound events: 2 for amplified LOH (ALOH), 2 for copy neutral LOH (NLOH), 2 for homozygous deletion (HOMD), 1 for hemizygous deletion (HETD), and zero for diploid heterozygous (HET) and allele-specific amplification (ASCNA). Euclidean distance was computed between pairs of tumour samples. The tree construction was performed the same as before.
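To make the quoted procedure concrete, here is a minimal sketch in plain Python, not the authors' code. The paper used the `ape` R package and both Neighbor-Joining and UPGMA; this sketch covers only the two distance definitions (1 − ρ for the mutation or log-ratio matrix, Euclidean for the weighted compound events) and a bare UPGMA that returns the topology without branch lengths. The function names and the toy matrices are hypothetical, made up for illustration, and the sketch does not attempt to reproduce the numbers printed on the branches in the figures.

```python
import math
from itertools import combinations

# Weights for compound copy-number events, as listed in the appendix.
EVENT_WEIGHT = {"ALOH": 2, "NLOH": 2, "HOMD": 2, "HETD": 1, "HET": 0, "ASCNA": 0}


def pearson(x, y):
    """Pearson correlation of two equal-length vectors.

    Raises ZeroDivisionError if either vector is constant.
    """
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(ssx * ssy)


def correlation_distances(rows):
    """Pairwise distance 1 - rho between samples (rows: name -> vector)."""
    return {frozenset((a, b)): 1 - pearson(rows[a], rows[b])
            for a, b in combinations(rows, 2)}


def event_distances(events):
    """Pairwise Euclidean distance between weighted compound-event vectors."""
    w = {s: [EVENT_WEIGHT[e] for e in calls] for s, calls in events.items()}
    return {frozenset((a, b)): math.dist(w[a], w[b])
            for a, b in combinations(w, 2)}


def upgma(dist):
    """Average-linkage (UPGMA) clustering; returns a Newick topology string."""
    d = dict(dist)
    sizes = {name: 1 for pair in dist for name in pair}  # cluster label -> leaf count
    while len(sizes) > 1:
        # Pick the closest pair of active clusters (sorted for determinism).
        a, b = min(combinations(sorted(sizes), 2), key=lambda p: d[frozenset(p)])
        na, nb = sizes.pop(a), sizes.pop(b)
        merged = f"({a},{b})"
        # Distance to the merged cluster is the size-weighted mean.
        for c in sizes:
            d[frozenset((merged, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        sizes[merged] = na + nb
    return next(iter(sizes)) + ";"


# Hypothetical toy data: 4 patients x 6 genes, 1 = mutated, 0 = not mutated.
mutations = {
    "P1": [1, 1, 1, 0, 0, 0],
    "P2": [1, 1, 0, 0, 0, 0],
    "P3": [0, 0, 0, 1, 1, 0],
    "P4": [0, 0, 1, 1, 1, 1],
}
# Hypothetical compound-event calls for 3 samples across 3 genes.
events = {
    "S1": ["ALOH", "HET", "HETD"],
    "S2": ["NLOH", "HET", "HET"],
    "S3": ["HET", "HOMD", "HETD"],
}

mutation_tree = upgma(correlation_distances(mutations))
event_tree = upgma(event_distances(events))
```

The same `upgma` function is reused for both inputs, which mirrors the appendix's statement that only the matrix construction and the distance differ between the analyses. For real data it is probably simpler to build the distance matrix and pass it to `ape` in R, as the authors did.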