I am trying to reproduce some analysis from a paper and I'm having some problems with it. The paper clusters 9 microarray datasets by averaging normalised pairwise Pearson correlations for the individual datasets. The approach is not particularly well described in the paper, but it uses the Sleipnir package (https://libsleipnir.bitbucket.io/index.html) and I have managed to figure out most of it.
The bit where I am having problems now is the final step. After building the network as above, the paper uses the MCluster tool from Sleipnir to perform agglomerative hierarchical clustering on the network and then cuts the tree at a given percentile of the normalised co-expression values to get clusters.
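To make the cutoff concrete, here is a minimal Python sketch of how I understand "a given percentile of the normalised co-expression values". The paper does not say which percentile definition it uses, so the nearest-rank rule here is my assumption, and `percentile_cutoff` and the example scores are hypothetical.

```python
import math

def percentile_cutoff(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of values <= it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank into the sorted values
    return ordered[rank - 1]

# Hypothetical normalised co-expression scores for six gene pairs (not real data).
scores = [0.12, 0.95, 0.40, 0.71, 0.33, 0.88]
cutoff = percentile_cutoff(scores, 75)
print(cutoff)
```

The resulting value would then be the similarity at which the tree is cut.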
MCluster itself ran without problems. It writes a .cdt file to stdout, which I redirected to a file, and optionally creates a .gtr file as well. The documentation suggests these should be compatible with Java TreeView. I have two problems with this:
- I can't find a tool that will allow me to cut a .gtr formatted tree and extract the clusters. Nothing is named in the paper, and I haven't found anything through searches - any suggestions?
- To get around this, I tried converting my .gtr file to an hclust object in R using xcluster2r from the ctc Bioconductor package (https://www.bioconductor.org/packages/release/bioc/html/ctc.html), so that I could cut the tree there. This failed immediately with an error:
xcluster2r("~/combined/combined.gtr", distance="pearson")
Error in if (abs(data1[i]) > abs(data2[i])) { : missing value where TRUE/FALSE needed
I suspect the .gtr file produced by Sleipnir differs in some way from the one produced by Cluster: besides failing in xcluster2r, it is also not parsed correctly by Java TreeView. I have never used Cluster, so I can't be sure. The head of my .gtr file is pasted below:
NODE0 GENE1321 GENE2574 1
NODE1 GENE1203 GENE2598 0.990178
NODE2 NODE0 GENE3005 0.86515
NODE3 NODE1 NODE2 0.81465
NODE4 GENE1320 GENE1585 0.785924
NODE5 GENE2572 NODE3 0.785902
NODE6 GENE3546 GENE4914 0.777941
NODE7 NODE5 GENE4088 0.753848
NODE8 GENE3535 NODE7 0.733575
NODE9 NODE8 NODE6 0.722749
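For reference, I read each row as: node ID, its two children (genes or earlier nodes), and the similarity at which they were joined. Below is a minimal Python sketch of the kind of cut I am after, assuming that reading is correct and that the similarity never increases from a child node to its parent. `parse_gtr` and `cut_tree` are my own hypothetical helpers, not part of Sleipnir or ctc, and the 0.8 threshold is arbitrary.

```python
from collections import defaultdict

# The head of my .gtr file, copied from above.
GTR_HEAD = """\
NODE0 GENE1321 GENE2574 1
NODE1 GENE1203 GENE2598 0.990178
NODE2 NODE0 GENE3005 0.86515
NODE3 NODE1 NODE2 0.81465
NODE4 GENE1320 GENE1585 0.785924
NODE5 GENE2572 NODE3 0.785902
NODE6 GENE3546 GENE4914 0.777941
NODE7 NODE5 GENE4088 0.753848
NODE8 GENE3535 NODE7 0.733575
NODE9 NODE8 NODE6 0.722749
"""

def parse_gtr(lines):
    """Parse rows of 'node child1 child2 similarity' into tuples."""
    merges = []
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed rows
        node, left, right, sim = fields
        merges.append((node, left, right, float(sim)))
    return merges

def cut_tree(merges, threshold):
    """Group genes that are joined by merges with similarity >= threshold."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for node, left, right, sim in merges:
        find(left)   # register both children so that genes whose
        find(right)  # merge is cut still come out as singletons
        if sim >= threshold:
            parent[find(left)] = find(node)
            parent[find(right)] = find(node)

    node_ids = {node for node, _, _, _ in merges}
    clusters = defaultdict(list)
    for item in list(parent):
        if item not in node_ids:  # report genes only, not internal nodes
            clusters[find(item)].append(item)
    return sorted(sorted(genes) for genes in clusters.values())

merges = parse_gtr(GTR_HEAD.splitlines())
clusters = cut_tree(merges, 0.8)
for genes in clusters:
    print(genes)
```

On the ten rows above, a cut at 0.8 keeps only the merges made by NODE0 to NODE3, so those five genes form one cluster and every other gene is left as a singleton.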
So the basic question I have is: does anyone have any experience with this file format?
More specifically:
- Can anyone see any particular problems with my .gtr file?
- Does anyone know how I can cut a tree in this format, either natively or by converting it to a format where I can use other tools to cut it (e.g., an R hclust object)?
(Note: I am happy to share the file and the associated .cdt file privately if that would help.)
Thanks