Question: Dealing with High-Dimensional Gene Expression Data and a Small Number of Samples
Asked 7.7 years ago by mikewhity:

I have the common problem of high-dimensional gene expression data with very few samples. Specifically, I have drug responses for a set of cancer cell lines, plus gene expression measured in those cells before the drugs were applied. I want to relate the response to gene expression, i.e. explain the drug response in terms of gene expression. I have only about 18 samples, each with expression values for about 25,000 genes.

I tried correlation analysis: for each drug, I checked which genes are highly correlated with the drug response, selected the most highly correlated genes, and used a hypergeometric test to see whether any pathways are overrepresented among those genes.

However, I haven't found anything significant in the pathway analysis. Any suggestions on how I should proceed?

Tags: pathway, gene-expression • 1.9k views
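A minimal Python sketch of the procedure described in the question: rank genes by their correlation with the drug response, keep the top ones, then test a pathway for overrepresentation with a hypergeometric test. Everything here is an assumption for illustration: the expression matrix and response are synthetic, the top-50 cutoff is arbitrary, and the "pathway" is a hypothetical set of gene indices. Only the standard library is used.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def hypergeom_sf(k, M, n, N):
    """P(X >= k) where X counts pathway genes among N selected genes,
    drawn without replacement from M genes of which n are in the pathway."""
    tail = sum(math.comb(n, i) * math.comb(M - n, N - i)
               for i in range(k, min(n, N) + 1))
    return tail / math.comb(M, N)


rng = random.Random(0)
n_samples, n_genes = 18, 1000
response = [rng.gauss(0, 1) for _ in range(n_samples)]

# Synthetic expression matrix (genes x samples); genes 0-19 track the response.
expr = [[(1.0 if g < 20 else 0.0) * r + rng.gauss(0, 1) for r in response]
        for g in range(n_genes)]

# Keep the 50 genes with the largest |correlation| (arbitrary cutoff).
ranked = sorted(range(n_genes),
                key=lambda g: abs(pearson(expr[g], response)), reverse=True)
selected = set(ranked[:50])

# Hypothetical pathway: genes 0-29 (the 20 signal genes plus 10 others).
pathway = set(range(30))
k = len(selected & pathway)
p = hypergeom_sf(k, M=n_genes, n=len(pathway), N=len(selected))
print(f"{k} of {len(selected)} selected genes are in the pathway, p = {p:.2e}")
```

In a real analysis M would be the ~25,000 measured genes and pathway membership would come from an annotation source such as KEGG. Note that a fixed top-N list throws away how strongly each gene is correlated, and testing many pathways needs its own multiple-testing correction.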

In practice, enrichment analysis is sensitive to the number of input genes (especially KEGG pathway enrichment). If only a small number of genes are perturbed by the treatment, enrichment analysis may not pick anything up. In that case, you can pick out the most perturbed genes based on the fold changes in intensity between treatments (i.e. case vs. control) and take a closer look at those genes.

Reply by ewre, 7.7 years ago
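The fold-change ranking suggested in this reply can be sketched as follows. Assumptions: intensities are on the raw (non-log) scale, the gene names and values are hypothetical, and `pseudocount` is an illustrative offset to avoid taking the log of zero. In the question's setting there is no untreated control, so the two groups could be, for example, cell lines killed by the drug versus those that survive. A fold change on its own ignores within-group variability, which is why a hypothesis test is usually preferred for ranking genes.

```python
import math


def log2_fold_changes(case, control, pseudocount=1.0):
    """Per-gene log2 ratio of mean case intensity to mean control intensity.

    case, control: dicts mapping gene name -> list of raw intensities.
    pseudocount: illustrative offset that avoids log2(0).
    """
    lfc = {}
    for gene in case:
        mean_case = sum(case[gene]) / len(case[gene])
        mean_ctrl = sum(control[gene]) / len(control[gene])
        lfc[gene] = math.log2((mean_case + pseudocount) / (mean_ctrl + pseudocount))
    return lfc


def top_perturbed(lfc, n):
    """The n genes with the largest absolute log2 fold change."""
    return sorted(lfc, key=lambda g: abs(lfc[g]), reverse=True)[:n]


# Hypothetical intensities for three genes in two groups of two samples.
case = {"GENE_A": [790, 810], "GENE_B": [100, 100], "GENE_C": [20, 30]}
control = {"GENE_A": [95, 105], "GENE_B": [100, 100], "GENE_C": [90, 110]}

lfc = log2_fold_changes(case, control, pseudocount=0.0)
print(top_perturbed(lfc, 2))  # ['GENE_A', 'GENE_C']
```

Ranking by absolute log2 fold change treats a 4-fold decrease (-2) and a 4-fold increase (+2) as equally perturbed.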
Answer by Sean Davis (National Institutes of Health, Bethesda, MD), 7.7 years ago:

I'd suggest performing a hypothesis test. In particular, you may want to perform a regression of gene expression against drug response, one gene at a time; the limma Bioconductor package can do this. Also, depending on your drug response assay, you may want to stratify the samples into "responders" and "non-responders" and use limma to test for differences in gene expression between the two groups.

Contrary to popular belief (which you may or may not hold), finding nothing significant or interpretable after running pathway analysis is not uncommon, so I would not use it as the sole measure of success or failure of your methods.
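limma itself is an R/Bioconductor package, so this Python sketch only illustrates the underlying idea of fitting one linear model per gene, expression = a + b × response, and scoring each gene by the t-statistic of the slope. This is the ordinary (un-moderated) t-statistic; limma additionally borrows information across genes to stabilize the per-gene variance estimates (empirical Bayes moderation), so its statistics will differ. The drug-response values and both genes are synthetic.

```python
import math
import random


def slope_t_statistic(response, expression):
    """Ordinary least-squares fit of expression = a + b * response for one gene.

    Returns (b, t), where t = b / SE(b) on n - 2 degrees of freedom.
    """
    n = len(response)
    mx = sum(response) / n
    my = sum(expression) / n
    sxx = sum((x - mx) ** 2 for x in response)
    sxy = sum((x - mx) * (y - my) for x, y in zip(response, expression))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(response, expression))
    se_b = math.sqrt(rss / (n - 2) / sxx)
    return b, b / se_b


rng = random.Random(1)
# Hypothetical continuous drug response (e.g. viability) for 18 cell lines.
response = [rng.uniform(0, 1) for _ in range(18)]
# Two synthetic genes: one tracks the response, one is pure noise.
tracking = [2.0 * r + rng.gauss(0, 0.2) for r in response]
noise = [rng.gauss(0, 1) for _ in response]

for name, gene in [("tracking", tracking), ("noise", noise)]:
    b, t = slope_t_statistic(response, gene)
    print(f"{name}: slope = {b:.2f}, t = {t:.1f}")
```

For a single predictor, this slope t-statistic is equivalent to testing the Pearson correlation between the gene and the response, so it is the hypothesis-test counterpart of the correlation screen in the question.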


I have tried evaluating the drug response. Based on whether the drug kills the cells or not, I separated them into two classes: cells that are killed by the drug and cells that aren't. Then, using hypothesis tests such as the t-test and the Kruskal-Wallis test, I selected the genes that separate the two classes, but I didn't get anything significant there either. I then took the genes that differ between the two classes and ran a hypergeometric test to look for significant pathways. I am not sure what limma does or how it would help. I tried to install it before, but couldn't because of library issues, so I haven't been able to try it. Can you tell me what limma can do?

I am not sure how regression will help me. I don't want a model that predicts the drug response from gene expression. I tried fitting a linear regression model with lasso regularization, but it didn't give me anything significant. I don't want a predictor. Any suggestions?

Reply by mikewhity, 7.7 years ago
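For the two-class comparison described in this reply (cells killed vs. not killed), here is a sketch with two pieces. The first is a label-permutation test on the difference in group means, swapped in here as a distribution-free alternative to the t-test. The second is the Benjamini-Hochberg false-discovery-rate adjustment, which is needed once thousands of genes are tested at once. The 9 vs. 9 split and the expression values are hypothetical.

```python
import random


def perm_test_mean_diff(group_a, group_b, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        a, b = pooled[:n_a], pooled[n_a:]
        # Small tolerance so float rounding cannot hide an equally extreme split.
        if abs(sum(a) / len(a) - sum(b) / len(b)) >= observed - 1e-12:
            hits += 1
    # Adding 1 to both counts keeps the estimate from ever being exactly 0.
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity (capped at 1).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical expression of one gene in 9 killed and 9 surviving cell lines.
killed = [5.0 + 0.1 * i for i in range(9)]
survived = [1.0 + 0.1 * i for i in range(9)]
print(perm_test_mean_diff(killed, survived))

# Hypothetical p-values for four genes, adjusted for multiple testing.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```

With 2,000 random permutations, the smallest p-value this sketch can return is 1/2001 (about 5e-4). Even an exact test over all C(18, 9) = 48,620 ways of splitting 18 samples into 9 vs. 9 cannot go below 2/48,620 (about 4.1e-5). Benjamini-Hochberg at an FDR of 0.05 over 25,000 genes requires the top-ranked gene to reach 0.05/25,000 = 2e-6. So with a split like this, a permutation test can only yield FDR-significant genes when many genes are shifted together, which is one concrete reason "nothing significant" is a plausible outcome.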

The limma package has an extensive user guide. To install it, first install R and then follow the instructions on the Bioconductor website. If you have problems installing it, feel free to write to the Bioconductor mailing list with details.

As for significant genes, there is no guarantee that there will be any.

Finally, consider finding a local collaborator who has worked with gene expression data before; otherwise you can spend a lot of time reinventing wheels and troubleshooting issues that are not really problems.

Reply by Sean Davis, 7.7 years ago

@Sean Davis: Currently, I don't have any local collaborators. Since I have very few samples (around 18 cell lines), will that be enough to get significant results? Are there any references where something significant was found with such a small number of samples? Can you point me to some resources that would help?

Reply by mikewhity, 7.7 years ago

Are you saying that you have 18 samples (18 arrays)? As for whether that is enough, I cannot answer, since I do not know how large an effect the drug has on gene expression, but I would not consider 18 samples extremely small. In particular, 18 samples does not call for any special analysis approach beyond using microarray-specific tools.

Reply by Sean Davis, 7.7 years ago
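The point that "is 18 enough" depends on effect size can be made concrete with a rough calculation. Assuming the drug response is treated as a continuous variable and each gene is tested with a Pearson correlation, this sketch uses the Fisher z approximation to find the smallest |r| that reaches significance with 18 samples, both for a single test and after a Bonferroni correction over 25,000 genes. Bonferroni is used here only because it gives a simple, conservative threshold.

```python
import math
from statistics import NormalDist


def min_significant_r(n, alpha):
    """Smallest |r| reaching two-sided significance at level alpha with n samples,
    using the Fisher z approximation atanh(r) * sqrt(n - 3) ~ N(0, 1)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return math.tanh(z_crit / math.sqrt(n - 3))


n_samples, n_genes = 18, 25000
r_single = min_significant_r(n_samples, 0.05)           # one gene on its own
r_bonf = min_significant_r(n_samples, 0.05 / n_genes)   # Bonferroni over all genes
print(f"single test: |r| >= {r_single:.2f}; Bonferroni: |r| >= {r_bonf:.2f}")
```

This gives thresholds of about 0.47 and 0.84. These are detection thresholds, not power: a gene whose true correlation sits right at the threshold would be declared significant only about half the time.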