Question: How to do unsupervised clustering using copy number variation data?
dr.chenway wrote (2.8 years ago):

Hi all, I want to do unsupervised clustering of segmented copy number variation (CNV) data (such as data derived from SNP arrays) and then visualize the result. The result should look like the following figure (Figure 1A), in which samples are clustered based on their CNV profiles.

Clustering of copy number (Figure 1A)

I know how to draw a heatmap with clustering in R when the data are in a matrix. However, segmented copy number data have a quite different structure. The only tool I know that can visualize this kind of data is IGV, but IGV doesn't provide any clustering options. Can anybody give me some instructions on how to do this? Any help will be greatly appreciated.


Isn't that described in the methods section of the paper? (If you gave a link to the paper, we could read it.) The key is to get a vector representation of the samples that captures the relevant information. From the figure, each sample appears to be represented by a vector in which each element corresponds to a section of a chromosome and its value is the copy gain/loss of that chromosomal section.
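Turning segmented data into such vectors can be sketched in R. This is a minimal sketch, assuming segments use the column names written by DNAcopy (`ID`, `chrom`, `loc.start`, `loc.end`, `seg.mean`; `num.mark` omitted) and that the chromosome "sections" are fixed-width bins. The toy segments, chromosome length, bin size and the `bin_value` helper are all hypothetical.

```r
# Toy segmented data (hypothetical values); seg.mean = log2 copy-number ratio
seg <- data.frame(
  ID        = c("T1", "T1", "T2", "T2"),
  chrom     = c("1", "1", "1", "1"),
  loc.start = c(1, 5000001, 1, 2000001),
  loc.end   = c(5000000, 10000000, 2000000, 10000000),
  seg.mean  = c(0.5, -0.4, 0, 0.6),
  stringsAsFactors = FALSE
)

# Fixed-width bins; bin size and chromosome length are illustrative assumptions
bin_size  <- 1e6
chrom_len <- c("1" = 1e7)
bins <- do.call(rbind, lapply(names(chrom_len), function(ch) {
  starts <- seq(1, chrom_len[[ch]], by = bin_size)
  data.frame(chrom = ch, start = starts,
             end = pmin(starts + bin_size - 1, chrom_len[[ch]]),
             stringsAsFactors = FALSE)
}))

# Length-weighted mean of the segments of one sample overlapping one bin
bin_value <- function(s, ch, b_start, b_end) {
  hit <- s[s$chrom == ch & s$loc.end >= b_start & s$loc.start <= b_end, ]
  if (nrow(hit) == 0) return(NA_real_)   # bin not covered by any segment
  ov <- pmin(hit$loc.end, b_end) - pmax(hit$loc.start, b_start) + 1
  sum(hit$seg.mean * ov) / sum(ov)
}

# One row per sample, one column per bin: each row is a sample's vector
samples <- unique(seg$ID)
mat <- t(sapply(samples, function(id) {
  s <- seg[seg$ID == id, ]
  mapply(function(ch, st, en) bin_value(s, ch, st, en),
         bins$chrom, bins$start, bins$end)
}))
rownames(mat) <- samples
colnames(mat) <- sprintf("chr%s:%d-%d", bins$chrom, bins$start, bins$end)
```

Every sample ends up described by the same set of bins, so `mat` can be passed directly to `dist()` and `hclust()`. Smaller bins give finer resolution at the cost of more, noisier features.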

Written 2.8 years ago by Jean-Karim Heriche

Thanks for your answer. This is the original paper: Comprehensive Molecular Characterization of Papillary Renal-Cell Carcinoma. The authors do mention how they performed the analysis in the supplementary material (page 11). However, the description is very brief and does not explain clearly how to do the clustering using copy number data. Thanks again.

Written 2.7 years ago by dr.chenway

As I read it, they represented each tumor by a vector over the regions that the GISTIC2.0 software identified as having copy number variations, where each value in the vector is the log2 copy number of the corresponding region. Then they did the clustering with:

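# data: numeric matrix, one row per tumor, one column per GISTIC region (log2 copy number)
d <- dist(data, method = "euclidean")  # pairwise Euclidean distances between tumors
tree <- hclust(d, method = "ward.D2")  # hierarchical clustering with Ward's linkage
plot(tree)                             # draw the resulting dendrogram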
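To get from the tree to something like Figure 1A, the tree can be cut into groups and the matrix drawn as a heatmap whose rows follow the same dendrogram. A minimal sketch using base R's `dist`, `hclust`, `cutree` and `heatmap` on a simulated toy matrix; the data, `k = 2` and the colour palette are arbitrary illustrative choices.

```r
# Hypothetical toy matrix: 6 tumors x 4 regions of log2 copy-number ratios
set.seed(1)
mat <- rbind(
  matrix(rnorm(12, mean =  0.6, sd = 0.05), nrow = 3),  # gain-like tumors
  matrix(rnorm(12, mean = -0.6, sd = 0.05), nrow = 3)   # loss-like tumors
)
rownames(mat) <- paste0("T", 1:6)
colnames(mat) <- paste0("region", 1:4)

d      <- dist(mat, method = "euclidean")  # distances between tumor vectors
tree   <- hclust(d, method = "ward.D2")    # same linkage as quoted above
groups <- cutree(tree, k = 2)              # assign each tumor to one of 2 clusters

# Rows ordered by the dendrogram; columns kept in their (genomic) order.
# Blue = lower, red = higher values; the colour scale is not centred on 0.
heatmap(mat, Rowv = as.dendrogram(tree), Colv = NA, scale = "none",
        col = colorRampPalette(c("blue", "white", "red"))(50))
```

Passing `Colv = NA` keeps regions in chromosome order instead of clustering them, which matches the usual layout of copy number heatmaps.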
Written 2.7 years ago by Jean-Karim Heriche

Could you please elaborate a little on "The key is to get a vector representation of the samples" and "they represented each tumor with a vector"? Thanks.

Written 20 months ago by apuhegde
  • Vector representation of the samples: each sample is represented by a series of numbers, each of which is meant to describe or capture some feature/property of the sample. This set of numbers is called a feature vector in machine learning and related fields. Note that for data mining purposes, all samples have to be described using the same set of features/properties.
  • They represented each tumor with a vector: In the case discussed here, each sample is represented by the number of copies it has of specific genomic regions.
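A small R illustration of both points, under stated assumptions: the region names and values are hypothetical, the values are log2 copy-number ratios, and the optional absolute copy number assumes a diploid reference with no purity/ploidy correction.

```r
# Hypothetical regions and log2 copy-number ratios for two tumors
regions <- c("chr1p", "chr7", "chr17q")
t1 <- c(chr7 = 0.6, chr1p = -0.5, chr17q = 0.0)  # names deliberately in another order
t2 <- c(chr1p = 0.0, chr7 = 0.6, chr17q = 0.4)

# Index by the shared region list so every row has the same features in the same order
X <- rbind(T1 = t1[regions], T2 = t2[regions])

# Euclidean distance between the two feature vectors, by hand and with dist()
manual <- sqrt(sum((X["T1", ] - X["T2", ])^2))
auto   <- as.numeric(dist(X))

# Optional: approximate number of copies per region (CN = 2 * 2^log2ratio)
cn <- 2 * 2^X
```

Aligning by region name is what makes the vectors comparable; comparing `t1` and `t2` position by position without it would pair chr7 with chr1p.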
Written 20 months ago by Jean-Karim Heriche

I wish to perform a clustering analysis on the CNV data from the long-insert whole genome sequencing assay in the Multiple Myeloma database. Their download only provides a .seg file, and I believe the GISTIC2.0 software requires a markers file.

1) Is the GISTIC2.0 tool appropriate for CNV data from a whole genome sequencing assay? If not, what tools could I use?
2) How should I account for samples that do not have a copy gain or copy loss, or that are copy neutral?
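For the second question, one simple option in a log2-ratio matrix is to treat regions with no segment in a sample as copy neutral (log2 ratio 0) and, optionally, to discretise values into loss/neutral/gain calls. This is a sketch under those assumptions; the matrix and the 0.2 cut-off are hypothetical. Copy-neutral samples need no special handling: their values sit near 0 and they tend to cluster with other flat profiles.

```r
# Hypothetical log2-ratio matrix (tumors x regions);
# NA marks a region not covered by any segment in that tumor
mat <- rbind(
  T1 = c(r1 = 0.45, r2 = -0.02, r3 = NA),
  T2 = c(r1 = 0.01, r2 =  0.03, r3 = 0.00)  # copy-neutral tumor
)

# Assumption: an uncovered region is treated as copy neutral (log2 ratio 0)
filled <- mat
filled[is.na(filled)] <- 0

# Optional discretisation: -1 = loss, 0 = neutral, 1 = gain
# (the 0.2 cut-off is illustrative, not a standard value)
thr <- 0.2
calls <- ifelse(filled > thr, 1, ifelse(filled < -thr, -1, 0))
```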

Written 2.2 years ago by manali.rupji

Please post this as a new question. Then come back and delete this post.

Written 2.2 years ago by genomax