Gene clustering from signaling pathway
8.2 years ago
kh.amine1 ▴ 10

Hello,

I'm a new member of the forum.

I have just started learning about gene clustering. I have a set of genes from a specific pathway, and I want to resample and cluster those genes, starting with hierarchical clustering on data from NCBI GEO.

Then I would like to compare the resulting clusters with expression data from experiments.

I collected my list of genes from the KEGG database.

Could you please help me understand what I need in order to start clustering?

Thank you

gene genome • 2.3k views
ADD COMMENT
8.2 years ago
Benn 8.3k

Hi,

Usually people cluster genes based on their expression values. So what you need is a matrix with one gene per row and one sample per column. That means you'll have to select which samples from GEO you want to include in your analysis.
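To make that layout concrete, here is a tiny sketch of such a matrix, written in Python purely for illustration (the same layout applies in whatever tool you end up using). The sample names and all the numbers are invented placeholders, not real GEO data, and `to_tsv` is just a hypothetical helper for this example.

```python
# Toy expression matrix: one gene per row, one sample per column.
# Sample names and values are made up for illustration only.
samples = ["normal_1", "normal_2", "tumor_1", "tumor_2"]
expression = {
    "TP53":  [7.1, 6.9, 5.2, 5.0],
    "BRCA1": [6.3, 6.5, 4.8, 4.6],
    "EGFR":  [5.0, 5.2, 8.1, 8.4],
}

def to_tsv(samples, expression):
    """Render the matrix as tab-separated text with a header row."""
    lines = ["gene\t" + "\t".join(samples)]
    for gene, values in expression.items():
        lines.append(gene + "\t" + "\t".join(f"{v:.1f}" for v in values))
    return "\n".join(lines)

print(to_tsv(samples, expression))
```

A tab-separated file like this, with gene symbols in the first column, is the kind of table most clustering tools can import.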

I would suggest using R for clustering, but that's my personal opinion.

Good luck!

ADD COMMENT

For bigger matrices, try Cluster 3.0.

ADD REPLY

Thank you, I appreciate your help!

As I'm a beginner with clustering, I may ask some obvious questions, but the answers will help me progress.

Actually, my question is exactly which samples I should put in the columns.

For example, I want to see the correlation between those genes in the normal pathway and in breast cancer. What kind of information should I put in my columns?

Could I load my list of genes from the KEGG database as a file that a clustering program can read?

Many thanks

ADD REPLY

I am not exactly sure what you mean by "the correlation between those genes in the natural pathway and in the breast cancer". Do you mean the correlation between normal and cancer samples, for each gene of your KEGG list? You can calculate that in R, but that does not have much to do with clustering.

Maybe you mean that you want a bunch of normal samples and cancer samples in one heatmap containing only your KEGG genes? If so, you'll first need to select your samples from GEO and get the expression values into a matrix. In R you can then keep only your genes of interest with, e.g., the subset function.
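The "keep only your genes of interest" step can be sketched roughly as follows. This is a Python illustration of the idea rather than the R subset call itself; the matrix values are invented, and `subset_matrix` is a hypothetical helper name.

```python
# Toy matrix: gene symbol -> expression values (made-up numbers).
expression = {
    "TP53":  [7.1, 6.9, 5.2, 5.0],
    "BRCA1": [6.3, 6.5, 4.8, 4.6],
    "EGFR":  [5.0, 5.2, 8.1, 8.4],
    "GAPDH": [9.9, 10.1, 10.0, 9.8],
}

# Pathway gene list, e.g. symbols copied from a KEGG pathway page.
pathway_genes = ["TP53", "EGFR", "AKT1"]

def subset_matrix(expression, genes_of_interest):
    """Keep only rows whose gene symbol is in the list.

    Also reports list entries that are absent from the matrix, which
    often points to a naming mismatch (alias vs. official symbol).
    """
    wanted = set(genes_of_interest)
    kept = {g: v for g, v in expression.items() if g in wanted}
    missing = sorted(wanted - set(expression))
    return kept, missing

kept, missing = subset_matrix(expression, pathway_genes)
print("kept:", list(kept))
print("not found in matrix:", missing)
```

Checking for missing genes is worth doing because KEGG and the expression platform may not use the same identifiers.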

ADD REPLY

How can I get the expression data for each gene from GEO? Can it be loaded as an external matrix?

Could you please give me an example of a simple matrix with genes and their expression data? This is the first time I will be using GEO.

Is there another simple clustering program I could use for testing?

I'm working on a Linux system.

ADD REPLY

Check the GEO tutorial on queries.

http://www.ncbi.nlm.nih.gov/geo/info/qqtutorial.html

ADD REPLY

Thank you for your help

Do you have any suggestions on how to cluster the list of genes I extracted from the pathway network and then visualize them on a heatmap?

The list of the genes is here.

My request is simple: I want to start with a basic hierarchical clustering to get a general picture of how the genes group together.

Thank you very much for your suggestions.

ADD REPLY

If you have your matrix with expression values, import it into R. It would be wise to use the official gene symbols as row names. Save the gene symbols of your pathway genes in a list and import that into R as well. Use the subset function to keep only the rows of the matrix whose names are in that list. Then use heatmap.2 to cluster that matrix.
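If it helps to see what the clustering step does under the hood, here is a minimal sketch in plain Python. It is not heatmap.2; it is a naive agglomerative clustering using Euclidean distance and complete linkage, which as far as I know are the defaults heatmap.2 relies on through dist and hclust. The gene names and values are invented, and `hierarchical_clustering` is a hypothetical helper.

```python
import math
from itertools import combinations

# Toy pathway matrix: gene -> expression across 4 samples (made-up values).
pathway_matrix = {
    "GENE_A": [1.0, 1.2, 5.0, 5.1],
    "GENE_B": [1.1, 1.0, 5.2, 4.9],
    "GENE_C": [6.0, 6.2, 1.0, 1.1],
    "GENE_D": [5.5, 6.0, 1.5, 0.8],
}

def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hierarchical_clustering(matrix):
    """Naive agglomerative clustering with complete linkage.

    Returns the merges in order as (cluster_a, cluster_b, distance),
    where each cluster is a tuple of gene symbols.
    """
    clusters = [(gene,) for gene in matrix]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            # Complete linkage: distance between the two farthest members.
            d = max(euclidean(matrix[g], matrix[h]) for g in a for h in b)
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(a + b)
        merges.append((a, b, d))
    return merges

for a, b, d in hierarchical_clustering(pathway_matrix):
    print(f"merge {a} + {b} at distance {d:.2f}")
```

The order of the merges is what a dendrogram draws: genes with similar profiles across samples are joined first, at small distances. For real data sets a dedicated tool will be much faster than this quadratic loop.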

Can you follow the steps?

ADD REPLY

Yes, now it's clear.

Just one more thing that is still not obvious to me: where can I find the expression data for each gene? Can I download it from GEO?

Are the expression data related to samples, mRNA, or something else? Please give more details on this point if you can.

I have an Ubuntu Linux distribution; which version of R should I install?

Grateful!

ADD REPLY

Hi, I don't know if you've noticed, but your text is getting messed up by stuff about Ginger software. I can hardly read it.

I thought you were past the GEO phase by now. Did you read the tutorial I linked? GEO is a database of expression data. It does not automatically select a nice data set for you; you'll have to do that yourself ;-)

ADD REPLY

It's a formatting problem: just copy and paste the text into a plain text editor and paste it back, or remove the formatting manually.

ADD REPLY
