Question: R phangorn phylogenetic analysis from somatic mutations' binary table
3.3 years ago by
Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK

I want to use a binary matrix to build a phylogenetic tree with the R package phangorn, as done in this paper: Tracking the Genomic Evolution of Esophageal Adenocarcinoma through Neoadjuvant Chemotherapy

In the methods they say:

Trees were built using binary presence/absence matrices built from the regional distribution of variants within the tumor. The R Bioconductor package phangorn (1.99-7; ref. 36) was utilized to perform the parsimony ratchet method (18), generating unrooted trees. Branch lengths were determined using the acctran function.

I have the binary presence/absence matrix; however, the phangorn package works with phyDat objects, which according to the phangorn vignettes are derived from sequence alignments.

My question is:

How can I use a binary table to build a phylogenetic tree with the R phangorn package?

If there is a way to read the binary matrix as a phyDat object, that would solve the problem, but I don't see how that could be done.

modified 8 months ago by RamRS (22k) • written 3.3 years ago by Alejandro Jimenez Sanchez (120)
3.3 years ago, Klaus S (100) wrote:

Hello Alejandro,

There are generic functions as.phyDat() in phangorn to transform matrices and data.frames into phyDat objects.

For example, you can read in your data with read.table() or read.csv(), but you might need to transpose it. For a matrix, as.phyDat() assumes that each row holds the entries for one individual (taxon), whereas for a data.frame each column does. Binary data can be converted with a command like one of these (depending on how you coded them):

as.phyDat(data, type="USER", levels = c(0, 1))
as.phyDat(data, type="USER", levels = c(TRUE, FALSE))
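
Combining this conversion with the workflow quoted in the question (parsimony ratchet, then ACCTRAN branch lengths), a minimal sketch could look like the following. The matrix `mut`, its region and mutation names, and the seed are made up for illustration; `pratchet()`, `acctran()` and `parsimony()` are phangorn functions, but their defaults may differ between versions, so treat this as a starting point rather than the paper's exact procedure.

```r
library(phangorn)  # also attaches ape

# Hypothetical toy data: 8 mutations (rows) x 5 tumour regions plus a normal sample (columns)
mut <- matrix(c(
  1, 1, 1, 1, 1, 0,
  1, 1, 1, 1, 1, 0,
  1, 1, 1, 0, 0, 0,
  1, 1, 0, 0, 0, 0,
  0, 0, 0, 1, 1, 0,
  0, 0, 0, 1, 1, 0,
  1, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 1, 0),
  ncol = 6, byrow = TRUE,
  dimnames = list(paste0("mut", 1:8), c(paste0("R", 1:5), "normal")))

# as.phyDat() on a matrix expects one taxon per row, so transpose first
dat <- as.phyDat(t(mut), type = "USER", levels = c(0, 1))

set.seed(1)                        # the ratchet is stochastic
tree <- pratchet(dat, trace = 0)   # parsimony ratchet search
tree <- acctran(tree, dat)         # assign branch lengths with ACCTRAN
parsimony(tree, dat)               # parsimony score of the result
```

The result can be drawn with plot(tree, type = "unrooted"). If the search returns several equally parsimonious trees, `tree` is a multiPhylo object and each tree is plotted in turn.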


modified 9 months ago by RamRS (22k) • written 3.3 years ago by Klaus S (100)
3.3 years ago, poisonAlien (2.8k) wrote:

There are multiple ways to construct trees based on binary data.

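  1. You can use the neighbor-joining method for tree construction (nj() and dist.gene() come from ape, which is attached when you load phangorn).

    Since you already have a binary matrix (here mat, with samples in columns, hence the t()):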

    library(phangorn) #also attaches ape, which provides nj() and dist.gene()
    mat.nj = nj(dist.gene(t(mat))) #neighbour joining tree construction
    plot(mat.nj, 'cladogram') #plot cladogram
    write.tree(mat.nj, 'mat.newick') #write newick tree

    The same distances can be used with the UPGMA method (I'm not sure this one is appropriate for binary data):

    mat.upgma = upgma(dist.gene(t(mat)))
  2. Use PHYLIP character parsimony (pars), which I think is the most suitable for this kind of data.

    Collapse your matrix for pars input. For example:

    4 10 #Four samples ten mutations

    Then run PHYLIP pars with the outgroup root set to your germline sample (here it's tumor_root, the 4th sample). Setting an outgroup in phangorn is a bit difficult (I'm not sure, though).

  3. As Chris suggested in his answer, you can use other, more sophisticated methods such as LICHeE, which uses VAF information to cluster variants and construct trees (and also divides trees based on clones).
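
For option 2, the pars input can be written from R. The sketch below assumes the standard PHYLIP sequential layout: a header line with the number of samples and characters, then one line per sample with the name padded to exactly 10 characters followed by the 0/1 string. The matrix `mut`, the sample names other than tumor_root, and the helper `write_pars_input()` are hypothetical.

```r
# Hypothetical toy data: 10 mutations (rows) x 4 samples (columns);
# tumor_root is the germline sample, so it carries no mutations
mut <- matrix(c(
  1, 1, 1, 0,
  1, 1, 1, 0,
  1, 1, 0, 0,
  1, 1, 0, 0,
  1, 0, 0, 0,
  0, 1, 0, 0,
  0, 0, 1, 0,
  0, 0, 1, 0,
  1, 1, 1, 0,
  0, 1, 0, 0),
  ncol = 4, byrow = TRUE,
  dimnames = list(paste0("mut", 1:10),
                  c("tumor_A", "tumor_B", "tumor_C", "tumor_root")))

# Hypothetical helper: write a mutations x samples 0/1 matrix as PHYLIP input
write_pars_input <- function(mut, file) {
  taxa <- t(mut)                                   # one sample per row
  header <- paste(nrow(taxa), ncol(taxa))          # e.g. "4 10"
  # PHYLIP reads the first 10 characters of each line as the sample name
  names10 <- formatC(substr(rownames(taxa), 1, 10), width = 10, flag = "-")
  states <- apply(taxa, 1, paste, collapse = "")   # collapse each row to a 0/1 string
  writeLines(c(header, paste0(names10, states)), file)
}

out <- tempfile(fileext = ".phy")
write_pars_input(mut, out)
cat(readLines(out), sep = "\n")
```

If you prefer to stay in R, ape's root() function (for example root(tree, outgroup = "tumor_root")) can reroot a tree on the germline sample.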

modified 9 months ago by RamRS (22k) • written 3.3 years ago by poisonAlien (2.8k)
3.3 years ago, Chris Miller (20k), Washington University in St. Louis, MO, wrote:

An alternative solution, and a more typical workflow for cancer samples, would be to feed your VAF and clustering information into a package like ClonEvol, which does the phylogenetic inference and produces some nice visualizations. (Clustering can be done with a package like SciClone or PyClone.)

modified 9 months ago by RamRS (22k) • written 3.3 years ago by Chris Miller (20k)
