Question: How to cluster drugs by up/down gene lists?
Asked 18 months ago by Antony:

I'm hoping for some help on clustering drugs by gene list. The data I have has not come through any particular workflow; it's all held in text files.

I have 100 drugs, and for each drug I have a list of up- and down-regulated genes. Whether a gene is up- or down-regulated is indicated purely by its presence in a text file (in one of two columns, "up" and "down"). There is no numerical expression data at all.
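For the data layout described above, a minimal Python sketch of loading each drug's file into a set of up genes and a set of down genes might look like this. It assumes (this is not stated in the question) that each file is tab-delimited with a header line `up<TAB>down`, that the shorter column is padded with empty cells, and that there is one `<drug>.txt` file per drug in a single folder. `parse_up_down` and `load_drugs` are hypothetical helpers and the gene symbols are placeholders; in R, `read.delim` would play the same role.

```python
import csv
from pathlib import Path


def parse_up_down(lines):
    """Parse tab-delimited lines with an 'up' and a 'down' column.

    Returns (up, down) as sets of gene symbols. Empty or missing cells,
    which appear when one column is shorter than the other, are skipped.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    up, down = set(), set()
    for row in reader:
        gene_up = (row.get("up") or "").strip()
        gene_down = (row.get("down") or "").strip()
        if gene_up:
            up.add(gene_up)
        if gene_down:
            down.add(gene_down)
    return up, down


def load_drugs(folder):
    """Hypothetical layout: one '<drug>.txt' file per drug in `folder`."""
    drugs = {}
    for path in sorted(Path(folder).glob("*.txt")):
        with path.open(newline="") as handle:
            drugs[path.stem] = parse_up_down(handle)
    return drugs


# In-memory example: the 'down' column is one gene shorter than 'up'.
example = ["up\tdown", "TP53\tMYC", "CDKN1A\t"]
up, down = parse_up_down(example)
print(sorted(up), sorted(down))  # ['CDKN1A', 'TP53'] ['MYC']
```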

I have very little clustering experience. I was hoping to learn whether it would be possible to cluster the 100 drugs by the similarity of their up/down gene lists. As for a clustering cut-off, I'm not sure yet; I'm happy for that to be exploratory for the time being, provided it finds some separation.

Anything available in R would be very helpful.


Tags: clustering, drug, gene • 486 views
Written 18 months ago by Antony; modified 18 months ago by Jean-Karim Heriche.

It may be worth exploring the literature for some practical examples, if you have not done so already. Some random examples are below:

On the drug front, have a look at the Open Targets Platform and its batch search. It may be useful to try it with your list of genes to find which diseases are associated with them, any pathway (and GO) enrichment in that list, and an overview of possible protein interactions among them.

Reply written 18 months ago by Denise CS (modified 18 months ago).

Thanks for all of the information. I'm working to a tight deadline, but once I've looked into this wealth of information I'll reply.

Reply written 18 months ago by Antony.

Another idea may be to perform a simple gene enrichment analysis and then plot the results, as I do here: Clustering of DAVID gene enrichment results from gene expression studies

Not 100% what you want, though.

Reply written 18 months ago by Kevin Blighe.

Answer by Jean-Karim Heriche (EMBL Heidelberg, Germany), 18 months ago:

Represent each drug by a binary vector of genes, where 1 is up and 0 is down, then use a measure of similarity appropriate for binary vectors (for a selection, check the R package proxy). Start with hierarchical clustering with complete linkage to get an idea of the structure of the data. If there is any strong clustering structure, you should see it there.
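A minimal pure-Python sketch of this recipe, assuming every drug has already been encoded over the same ordered gene list (1 = up, 0 = down). Because 0 means "down" rather than "absent" in this encoding, the sketch uses the simple matching coefficient, which counts agreement on both 1s and 0s; Jaccard would ignore genes that are down in both drugs. The `cutoff` argument stands in for the exploratory threshold from the question, and the toy vectors and drug names are made up. In R, the same steps would be `proxy::dist(m, method = ...)` with a binary measure from the package's registry (listed by `summary(pr_DB)`), then `hclust(d, method = "complete")` and `cutree(h, h = cutoff)`.

```python
from itertools import combinations


def mismatch_distance(a, b):
    """1 - simple matching coefficient: fraction of genes where two binary vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must be indexed by the same genes")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def complete_linkage(vectors, cutoff):
    """Naive agglomerative clustering with complete linkage.

    Repeatedly merges the two clusters whose largest pairwise distance is
    smallest, and stops once that distance exceeds `cutoff`. Complete linkage
    never produces inversions, so this matches cutting the dendrogram at
    `cutoff`. Returns clusters as lists of row indices. O(n^3), which is
    acceptable for ~100 drugs.
    """
    n = len(vectors)
    dist = [[mismatch_distance(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        best = None
        for p, q in combinations(range(len(clusters)), 2):
            # Complete linkage: cluster distance is the largest member-to-member distance.
            d = max(dist[i][j] for i in clusters[p] for j in clusters[q])
            if best is None or d < best[0]:
                best = (d, p, q)
        d, p, q = best
        if d > cutoff:
            break
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]  # q > p, so index p is unaffected
    return clusters


vectors = [
    [1, 1, 0, 0],  # drugA
    [1, 1, 0, 1],  # drugB
    [0, 0, 1, 1],  # drugC
    [0, 0, 1, 0],  # drugD
]
names = ["drugA", "drugB", "drugC", "drugD"]
clusters = complete_linkage(vectors, cutoff=0.5)
print([[names[i] for i in c] for c in clusters])
# [['drugA', 'drugB'], ['drugC', 'drugD']]
```

Trying several cut-off values and watching how the groups split is one way to keep the threshold exploratory.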

Written 18 months ago by Jean-Karim Heriche.

Thanks, Jean-Karim. I'll give it a try. Just an added thought: I'm struggling to visualise how this will work if the regulated genes differ between drugs. For example, looking only at up genes, how would two vectors be comparable if they both had a "1" in the same element position, but for two completely different genes? What I do not have is the complete space of all genes, just those that are up/down-regulated in each drug.

Reply written 18 months ago by Antony (modified 18 months ago).

You could either decide to only use the genes that are common to all the drugs, or treat absent genes as missing values, or a combination of both. For example, genes that are missing in a large fraction of the drugs (e.g. 60-70%) could be dropped, and the remaining missing information could be treated as a third category (e.g. 1: up, -1: down, 0: missing; this is a form of imputation). In the latter case, the data is no longer binary and you would need to look for a suitable measure of similarity/distance.
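The drop-then-third-category idea can be sketched as follows, assuming the per-drug lists are held as Python sets of up and down genes. `ternary_matrix` and its `max_missing` parameter are hypothetical; 0.65 is simply one value inside the 60-70% range mentioned above. Building every row over the same sorted gene list is what makes element positions comparable across drugs. A gene listed as both up and down for the same drug is coded +1 here, which is an arbitrary choice.

```python
def ternary_matrix(drugs, max_missing=0.65):
    """Encode drugs as +1 (up), -1 (down), 0 (missing) over a shared gene list.

    `drugs` maps drug name -> (up_genes, down_genes) as sets. Genes absent
    from more than `max_missing` (a fraction) of the drugs are dropped.
    Returns (drug_names, genes, matrix) with one row per drug.
    """
    names = sorted(drugs)
    universe = set()
    for up, down in drugs.values():
        universe |= up | down
    kept = []
    for gene in sorted(universe):
        present = sum(gene in up or gene in down for up, down in drugs.values())
        if 1 - present / len(names) <= max_missing:
            kept.append(gene)
    matrix = []
    for name in names:
        up, down = drugs[name]
        # Up takes precedence if a gene is (inconsistently) listed in both columns.
        matrix.append([1 if g in up else -1 if g in down else 0 for g in kept])
    return names, kept, matrix


drugs = {
    "drugA": ({"TP53", "MYC"}, {"EGFR"}),
    "drugB": ({"TP53"}, {"MYC"}),
    "drugC": ({"GAPDH"}, set()),
}
names, genes, matrix = ternary_matrix(drugs)
print(genes)   # ['MYC', 'TP53']
print(matrix)  # [[1, 1], [-1, 1], [0, 0]]
```

EGFR and GAPDH are each seen in only one of three drugs (about 67% missing), so they fall above the 0.65 threshold and are dropped.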

Reply written 18 months ago by Jean-Karim Heriche.

I now have the complete gene list and I am taking the approach of +1 (up), 0 (missing), -1 (down). When I find a means of calculating a similarity/distance, I'll edit my post. Thanks.
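One possible distance for +1/0/-1 vectors is cosine distance (a swapped-in suggestion, not something proposed in the thread). Genes coded 0 add nothing to the dot product, shared direction of regulation raises similarity, and opposite regulation lowers it, so distances run from 0 (same profile) to 2 (exactly opposite). Returning 1.0 when a drug has no non-zero genes is an assumption; such drugs are probably better removed before clustering. `cosine_distance` and `distance_matrix` are hypothetical helper names.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity for vectors coded +1 (up), -1 (down), 0 (missing).

    Only genes non-zero in both drugs affect the dot product. Returns 1.0
    ("no evidence either way") if either vector is all zeros.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    # Clamp tiny floating-point overshoot so results stay within [0, 2].
    return min(2.0, max(0.0, 1 - dot / (norm_a * norm_b)))


def distance_matrix(matrix):
    """Symmetric matrix of pairwise cosine distances between rows (drugs)."""
    n = len(matrix)
    return [[cosine_distance(matrix[i], matrix[j]) for j in range(n)] for i in range(n)]


print(cosine_distance([1, 1, -1, 0], [1, 1, -1, 0]))   # 0.0
print(cosine_distance([1, 1, -1, 0], [-1, -1, 1, 0]))  # 2.0
print(cosine_distance([1, 1, -1, 0], [0, 0, 0, 1]))    # 1.0
```

The resulting matrix can go straight into complete-linkage clustering; in R, `hclust(as.dist(D), method = "complete")` would accept an equivalent matrix `D`.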

Reply written 18 months ago by Antony.

