Question

Pvclust:Error In Clustering

2

10.1 years ago

Diana ▴ 910

Hi all,

I'm trying to cluster 128 genes from mRNA-seq data so as to see which genes group together based on their expression profiles across different samples. I'm using pvclust (which clusters columns) to do this. So my genes are in the columns and samples in the rows. My input file is a csv file and looks like this:

> Sample    GATA3    KMT2E    SOX10    CREB3L2    SOX5    ETV6    ATN1    ETS2
>     1    6.73609    7.59656    0.352607    17.8663    1.86339    19.2949    56.9042    11.8808
>     2    18.9784    11.8289    1.00279    34.0411    2.09856    22.2998    56.7117    16.2549
>     3    86.9037    9.12542    5.43191    9.04106    1.94622    28.1369    70.0857    43.5062
>     4    111.871    7.14345    39.377    6.45569    4.96795    58.6333    59.5696    16.3631
>     5    63.4973    13.3015    124.078    6.86142    10.1776    49.313    99.137    13.8555

and my code is:

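# Read the CSV; the first column holds the sample IDs
data <- read.table("pvclust_input.csv", sep=",", header=TRUE, fill=TRUE)
# Drop the Sample column and keep every gene column (2:128 would omit the last of 128 genes)
data_mod <- data[, -1]
data_matrix <- data.matrix(data_mod)
library(pvclust)
# pvclust clusters columns, so the genes must be in the columns
result <- pvclust(data_matrix, method.dist="cor", method.hclust="average", nboot=1000)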

When I execute this, it gives me the following error and warnings:

> result <- pvclust(test2_matrix, method.dist="cor", method.hclust="average", nboot=1000)
Bootstrap (r = 0.4)... Done.
Bootstrap (r = 0.6)... Done.
Bootstrap (r = 0.6)... Done.
Bootstrap (r = 0.8)... Done.
Bootstrap (r = 0.8)... Done.
Bootstrap (r = 1.0)... Done.
Bootstrap (r = 1.0)... Done.
Bootstrap (r = 1.2)... Done.
Bootstrap (r = 1.2)... Done.
Bootstrap (r = 1.4)... Done.
Error in solve.default(crossprod(X, X/vv)) : 
  Lapack routine dgesv: system is exactly singular: U[2,2] = 0
In addition: There were 11 warnings (use warnings() to see them)
> warnings()
Warning messages:
1: inappropriate distance matrices are omitted in computation: r =  0.4
2: inappropriate distance matrices are omitted in computation: r =  0.6
3: inappropriate distance matrices are omitted in computation: r =  0.6
4: inappropriate distance matrices are omitted in computation: r =  0.8
5: inappropriate distance matrices are omitted in computation: r =  0.8
6: inappropriate distance matrices are omitted in computation: r =  1
7: inappropriate distance matrices are omitted in computation: r =  1
8: inappropriate distance matrices are omitted in computation: r =  1.2
9: inappropriate distance matrices are omitted in computation: r =  1.2
10: inappropriate distance matrices are omitted in computation: r =  1.4
11: In lsfit(X, zz, 1/vv, intercept = FALSE) : 'X' matrix was collinear

I don't understand the error. Please help!

Thanks!!

clustering r • 9.3k views

updated 5.3 years ago by dmitry.pesko • written 10.1 years ago by Diana ▴ 910

1

Dear Diana,

Did you ever find a solution to these errors/warnings?

Please let me know.

8.3 years ago by panconchoclo ▴ 10

0

Diana, using r = 1.0 could solve the problem (ordinary bootstrap instead of the multiscale bootstrap option).

Did you try a smaller nboot value first? How much memory do you have?
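
A minimal sketch of that suggestion, assuming the pvclust package is installed. The toy matrix and its gene names are made up for illustration. As far as I can tell, with a single r = 1 pvclust runs only the ordinary bootstrap, so the BP values are estimated but the AU values are not.

```r
set.seed(1)

# Hypothetical toy data: 10 samples (rows) x 4 genes (columns).
toy <- matrix(rnorm(40), nrow = 10,
              dimnames = list(NULL, paste0("gene", 1:4)))

# Only run the clustering if pvclust is available on this machine.
if (requireNamespace("pvclust", quietly = TRUE)) {
  # r = 1 requests the ordinary bootstrap; nboot = 100 keeps the trial run short.
  res <- pvclust::pvclust(toy,
                          method.dist = "correlation",
                          method.hclust = "average",
                          nboot = 100,
                          r = 1)
  print(res)
}
```

If the small run finishes cleanly, nboot can be raised again for the real data.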

8.1 years ago by panconchoclo ▴ 10

score 2 · Answer 1 · 2014-03-18

Hi,

Have you tried looking at the result by just typing result? I just tried the same method on the data given above (it seems the actual data set is much bigger than the excerpt you posted). This is what I get when I type result:

> result

Cluster method: average
Distance      : correlation

Estimates on edges:

     au    bp se.au se.bp      v     c  pchi
1 1.000 0.594 0.000 0.006 -1.852 1.614 0.006
2 0.998 0.286 0.001 0.005 -1.138 1.702 0.243
3 0.991 0.347 0.002 0.005 -0.978 1.371 0.000
4 0.989 0.337 0.003 0.005 -0.941 1.363 0.009
5 0.635 0.305 0.027 0.005  0.082 0.428 0.000
6 0.909 0.415 0.012 0.005 -0.558 0.774 0.042
7 1.000 0.999 0.000 0.000 -4.069 1.084 0.910

As for the error, perhaps write to the authors of pvclust (contact information should be available on the web page: http://www.is.titech.ac.jp/~shimo/prog/pvclust/). Reading the messages, it seems that pvclust has discarded all the distance matrices calculated at each r value, so there is nothing left to use in the subsequent calculations.
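
One way to test this guess before contacting the authors is to check whether the correlation distance is defined for every gene. The sketch below assumes the omitted distance matrices are the ones containing non-finite values, which happens when a gene is constant across the samples. The helper names and the toy matrix (including the FLAT gene) are hypothetical and not part of pvclust.

```r
# Toy matrix: rows = samples, columns = genes (values are illustrative only).
toy <- cbind(
  GATA3 = c(6.7, 19.0, 86.9, 111.9, 63.5),
  SOX10 = c(0.35, 1.0, 5.4, 39.4, 124.1),
  FLAT  = c(0, 0, 0, 0, 0)  # made-up constant gene that triggers the problem
)

# Hypothetical helper: names of columns whose variance is zero or undefined.
constant_columns <- function(m) {
  v <- apply(m, 2, var, na.rm = TRUE)
  colnames(m)[is.na(v) | v == 0]
}

# Hypothetical helper: TRUE if 1 - cor() is finite for every pair of columns.
cor_dist_is_finite <- function(m) {
  d <- 1 - suppressWarnings(cor(m, use = "pairwise.complete.obs"))
  all(is.finite(d))
}

bad <- constant_columns(toy)
print(bad)
print(cor_dist_is_finite(toy))

# Dropping the constant columns leaves a matrix with finite distances.
clean <- toy[, !(colnames(toy) %in% bad), drop = FALSE]
print(cor_dist_is_finite(clean))
```

Note that a gene that is non-zero in only one or two samples can pass this check on the full data and still become constant in a bootstrap resample, so genes with very few distinct values are worth inspecting too.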

Another thing to try would be to treat the rows as genes and the columns as samples (i.e. t(data)), which gives you the transposed data matrix, as you may already know. Also, check the formatting of your table and make sure there is nothing like stray spaces between numbers or other minor oddities.
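
Both checks can be sketched in base R as follows. The inline CSV text is a hypothetical stand-in for pvclust_input.csv, and the variable names are made up. The formatting check matters because data.matrix() silently turns non-numeric columns into integer codes instead of raising an error. Keep in mind that pvclust clusters columns, so passing the transposed matrix would cluster the samples rather than the genes.

```r
# Hypothetical CSV content standing in for pvclust_input.csv.
csv_text <- "Sample,GATA3,KMT2E,SOX10
1,6.73609,7.59656,0.352607
2,18.9784,11.8289,1.00279
3,86.9037,9.12542,5.43191
4,111.871,7.14345,39.377"

dat <- read.csv(text = csv_text, header = TRUE, strip.white = TRUE)

# Drop the Sample column and list any gene column that was not read as numeric.
genes <- dat[, -1]
non_numeric <- names(genes)[!sapply(genes, is.numeric)]
print(non_numeric)

# Build the numeric matrix and count missing values.
expr <- as.matrix(genes)
rownames(expr) <- dat$Sample
print(sum(is.na(expr)))

# Transpose: rows become genes, columns become samples.
expr_t <- t(expr)
print(dim(expr))
print(dim(expr_t))
```

A non-empty non_numeric vector or a non-zero NA count would point to a formatting problem in the input file.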