Question: Performing Pathway Analysis On CNV Data
Asked 6.6 years ago by Robert Sicko (United States):

I have groups of samples with copy-number variation (CNV) calls made from microarray data. I am trying to determine whether specific pathways are enriched for CNVs in particular phenotypes. I've looked at "How To Test Whether Copy Number Aberrations Are Enriched In A Gene List" and at other posts that describe pathway analysis of expression data. My data are currently formatted for import into PathVisio: a tab-delimited file with genes as rows and samples as columns, where each value is the log-transformed fold change for that gene in that sample. If a gene was not overlapped by a CNV in a subject, I assumed normal expression.

I have a few normal controls run with each batch, and each batch is a different phenotype. I'm trying to figure out the best way to determine whether a pathway is enriched. Should I compare pathway X in sample 1 to pathway Y in sample 1? Should I compare pathway X in phenotype 1 (with all samples for that pathway averaged, or summed?) to pathway X in phenotype 2? Or should I do something similar to the link above: generate random groups of genes of the same size as pathway X and compare pathway X in sample 1 to a randomly generated group of genes in sample 1?

Statistics is not one of my strengths so any input is greatly appreciated.
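
The third option in the question (random gene groups of the same size as the pathway) amounts to a permutation test. Below is a minimal sketch of that idea for a single sample, assuming the CNV calls have already been reduced to a list of altered genes. The function name, the gene identifiers, and the toy data are all hypothetical and not taken from the linked post. Note that neighbouring genes are often covered by the same CNV, so drawing genes independently, as done here, may overstate significance.

```python
import random

def pathway_permutation_test(altered_genes, pathway_genes, all_genes,
                             n_permutations=10000, seed=0):
    """Return (observed overlap, empirical p-value) for one sample.

    The p-value is the share of random gene sets, drawn from all_genes and
    the same size as the pathway, that contain at least as many altered
    genes as the pathway itself.
    """
    rng = random.Random(seed)
    background = sorted(set(all_genes))
    altered = set(altered_genes) & set(background)
    pathway = set(pathway_genes) & set(background)
    observed = len(pathway & altered)

    hits = 0
    for _ in range(n_permutations):
        random_set = rng.sample(background, len(pathway))
        if sum(gene in altered for gene in random_set) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_permutations + 1)


# Hypothetical toy data: 200 genes on the array, 10 of them hit by a CNV.
all_genes = [f"G{i}" for i in range(200)]
altered_in_sample1 = [f"G{i}" for i in range(10)]
pathway_x = [f"G{i}" for i in range(8)] + ["G100", "G101"]  # 8 of 10 altered
pathway_y = [f"G{i}" for i in range(100, 110)]              # none altered

print(pathway_permutation_test(altered_in_sample1, pathway_x, all_genes))
print(pathway_permutation_test(altered_in_sample1, pathway_y, all_genes))
```

The same function could be run per sample and the results then compared between phenotypes, but how to combine samples is a separate choice that this sketch does not make.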

enrichment copynumber cnv • 4.0k views
Answered 6.6 years ago by B. Arman Aksoy (New York, NY):

If I understand your question correctly, the first thing to do is decide what a pathway alteration means, and what you will do when two genes in the same pathway have conflicting events (a homozygous deletion in one and an amplification in another). I say this because people define an alteration in a pathway in different ways. For expression data, I have seen people define a "pathway activity score" by simply averaging the expression values of all genes in the pathway for each sample. You can take a similar approach with CNV data, but be aware that CNV values are not equivalent to gene expression, so the score will be very noisy. People also convert these data into a binary matrix by setting thresholds that call each copy-number alteration (CNA) event as altered vs. non-altered, and then use the frequency of altered samples in each of their sample groups.
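
The two scoring schemes described above can be sketched as follows. This is only an illustration under stated assumptions: the gain/loss thresholds, the rule that a pathway counts as altered when any of its genes is altered, and all names and toy values are hypothetical choices, not something prescribed in this thread. Genes with no CNV call are treated as a log ratio of 0.

```python
from statistics import fmean

# Hypothetical log-ratio thresholds for calling a gene altered.
GAIN, LOSS = 0.3, -0.3

def pathway_mean_score(sample, pathway):
    """Average log ratio over the pathway genes (missing gene = 0.0)."""
    return fmean([sample.get(gene, 0.0) for gene in pathway])

def is_altered(value):
    """Binary call: 1-type event if the value passes either threshold."""
    return value >= GAIN or value <= LOSS

def pathway_is_altered(sample, pathway):
    """Assumed rule: a pathway is altered if any of its genes is altered."""
    return any(is_altered(sample.get(gene, 0.0)) for gene in pathway)

def altered_frequency_by_group(samples, phenotype_of, pathway):
    """Fraction of samples in each phenotype group with the pathway altered."""
    counts = {}
    for name, sample in samples.items():
        group = phenotype_of[name]
        hit, total = counts.get(group, (0, 0))
        counts[group] = (hit + pathway_is_altered(sample, pathway), total + 1)
    return {group: hit / total for group, (hit, total) in counts.items()}


# Toy data: sample -> {gene: log ratio}; genes without a CNV are omitted.
samples = {
    "s1": {"A": -1.0, "B": 1.0},  # deletion and amplification in one pathway
    "s2": {"A": 0.1},
    "s3": {"C": 0.8},             # altered gene outside the pathway
    "s4": {},
}
phenotype_of = {"s1": "pheno1", "s2": "pheno1", "s3": "pheno2", "s4": "pheno2"}
pathway = ["A", "B"]

print(pathway_mean_score(samples["s1"], pathway))   # conflicting events cancel
print(pathway_is_altered(samples["s1"], pathway))
print(altered_frequency_by_group(samples, phenotype_of, pathway))
```

Sample `s1` shows the conflicting-events problem: the averaged score is zero because the deletion and the amplification cancel, while the binary call still marks the pathway as altered.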

I think you can instead try unbiased hierarchical clustering on your gene-level data (removing the non-altered genes to reduce the visualization complexity) and see whether the clusters tend to capture your phenotype categories. If you want to apply this at the pathway level, you can collapse your data to pathways (group genes into pathways) and cluster on those. I would start with this exploratory investigation of the data and then decide how to choose the features (genes or pathways) that explain each of your phenotypes.
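
As a rough illustration of the pathway-level variant, the sketch below collapses binary gene calls to one value per pathway and then clusters the samples. The collapse rule (fraction of pathway genes altered), the Euclidean distance, and the average-linkage merge rule are assumptions made for this example, and the names and toy data are hypothetical. The clustering loop is a naive implementation meant only to show the idea; in practice a library routine would replace it.

```python
from itertools import combinations

def collapse_to_pathways(altered, pathways):
    """Map each sample to a vector: fraction of genes altered per pathway."""
    names = sorted(pathways)
    return {
        sample: [len(genes & set(pathways[p])) / len(pathways[p]) for p in names]
        for sample, genes in altered.items()
    }

def euclidean(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

def average_linkage(vectors, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[name] for name in sorted(vectors)]

    def dist(c1, c2):
        # Average pairwise distance between members of the two clusters.
        total = sum(euclidean(vectors[a], vectors[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]  # i < j, so j is safe to drop
        del clusters[j]
    return [sorted(c) for c in clusters]


# Toy data: sample -> set of altered genes (non-altered genes already removed).
pathways = {"P1": ["A", "B"], "P2": ["C", "D"]}
altered = {"s1": {"A", "B"}, "s2": {"A"}, "s3": {"C", "D"}, "s4": {"D"}}

vectors = collapse_to_pathways(altered, pathways)
print(average_linkage(vectors, n_clusters=2))
```

The resulting clusters can then be compared against the phenotype labels to see whether they tend to capture the phenotype categories.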


Thanks for your reply. I think you are right: I should probably convert my data to a binary matrix instead of trying to force it to be expression data. Do you use an R package for unbiased hierarchical clustering?

Reply written 6.6 years ago by Robert Sicko

I prefer R and use either heatmap, aheatmap or heatmap.2 to plot the data and label the rows/columns accordingly. If you don't feel comfortable with R, you can try GENE-E, which is GUI-based and helps with these types of operations:

Hope it helps,

Reply written 6.6 years ago by B. Arman Aksoy