Question: Co-expression network using a dataset from GEO
omer.k wrote (2.6 years ago):

Hi community, I have a final assignment for an introductory bioinformatics course. My overall idea is to use an existing gene expression dataset from GEO and use it to construct gene co-expression networks. Here is the dataset I'd like to work with: https://www.ncbi.nlm.nih.gov/sites/GDSbrowser?acc=GDS5232 In this study, gene expression was measured in "young" as well as "old" patients diagnosed with colorectal cancer. Therefore, the aim of my project is to identify co-expressed genes in the young population and compare these interactions with those in the old population, and vice versa. I hope to demonstrate visually, through the network, the change in network parameters of specific genes (closeness, betweenness, degree).

I considered proceeding in three general steps:
1. Process the table: remove null values and average the values of the same gene measured by different oligos (probes). Then normalize the values (mean 0, std 1).
2. Produce the respective Pearson correlation matrices for the two populations (have a look at the demo table I uploaded). From these, by setting a cut-off (e.g. |r| ≥ 0.75), I'll extract only the most highly correlated gene pairs.
3. Produce another table/file that can be loaded into Cytoscape to show the interactions I referred to earlier.
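Steps 1 and 2 can be sketched in a few lines. This is a minimal, stdlib-only illustration, not the poster's MATLAB code: it assumes the GDS table has already been parsed into `(gene_symbol, [values])` rows with missing values as `None`, and the helper names `collapse_probes`, `zscore`, `pearson` and `correlated_pairs` are hypothetical.

```python
import statistics
from collections import defaultdict
from itertools import combinations


def collapse_probes(rows):
    """Average probes that map to the same gene symbol.

    rows: iterable of (gene_symbol, [value per sample]); a missing value is None.
    Probes without a gene symbol or with any missing value are dropped.
    """
    groups = defaultdict(list)
    for gene, values in rows:
        if gene and all(v is not None for v in values):
            groups[gene].append(values)
    # zip(*probes) walks sample by sample across all probes of one gene
    return {gene: [statistics.fmean(col) for col in zip(*probes)]
            for gene, probes in groups.items()}


def zscore(values):
    """Rescale one gene's profile to mean 0 and standard deviation 1."""
    mu = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mu) / sd for v in values]


def pearson(x, y):
    """Pearson correlation of two profiles; None if either is constant."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    dx = [v - mx for v in x]
    dy = [v - my for v in y]
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        return None
    return sum(a * b for a, b in zip(dx, dy)) / (sxx * syy) ** 0.5


def correlated_pairs(expr, cutoff=0.75):
    """All gene pairs with |r| >= cutoff, returned as (gene_a, gene_b, r)."""
    pairs = []
    for a, b in combinations(sorted(expr), 2):
        r = pearson(expr[a], expr[b])
        if r is not None and abs(r) >= cutoff:
            pairs.append((a, b, r))
    return pairs
```

To get the two networks, one would call `collapse_probes` on the young-sample columns and on the old-sample columns separately, then `correlated_pairs` on each. Note that Pearson's r is unchanged by shifting and scaling either profile, so the z-scoring in step 1 does not alter the correlation matrix; it only matters for other downstream uses of the values.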

I have already completed the first and second steps (using MATLAB, which is all I know; I'd be happy to share the code, though I'm not graded on its efficiency).

What do you think of the workflow? Will it work?

I really need help moving the data from the second step into Cytoscape. If my suggestion of how to use the data is not realistic, please suggest an alternative approach.
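One simple way to get thresholded pairs into Cytoscape is a tab-separated edge table with one row per edge: a source column, a target column, and any extra columns (which Cytoscape can keep as edge attributes, e.g. for edge width or colour). The sketch below assumes pairs shaped as `(gene_a, gene_b, r)`; the column names and the helpers `edges_to_tsv` and `write_edge_table` are hypothetical choices, not a required format.

```python
import csv
import io


def edges_to_tsv(pairs, group):
    """Render (gene_a, gene_b, r) pairs as a tab-separated edge table."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["source", "target", "weight", "abs_weight", "group"])
    for a, b, r in pairs:
        # keep the signed r plus |r| so either can drive edge styling later
        writer.writerow([a, b, f"{r:.4f}", f"{abs(r):.4f}", group])
    return buf.getvalue()


def write_edge_table(pairs, group, path):
    """Write one group's edge table (e.g. young or old) to a file."""
    with open(path, "w", newline="") as fh:
        fh.write(edges_to_tsv(pairs, group))
```

Writing one file per population (say `young_edges.tsv` and `old_edges.tsv`, hypothetical names) and importing each as a network from a file in Cytoscape, with `source` and `target` marked as the interacting columns, should give two networks whose edge attributes can then be mapped to visual styles.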

Thanks a bunch.

Lars Juhl Jensen (Copenhagen, Denmark) wrote (2.6 years ago):

Whether it will work really depends on what you mean by "work".

1) Can it be done? Yes. You can calculate Pearson correlation coefficients between genes for each of two subsets of samples. You can apply an arbitrary cutoff to them and obtain two networks. You can calculate various network parameters for the nodes in the two networks and compare them. You can load everything into Cytoscape and visualize the nodes/edges, or even create an animation that morphs the young network into the old one if you want. There is no doubt that what you propose can be done.
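Comparing the two networks node by node means computing degree, closeness and betweenness for each gene in each network. The sketch below is a minimal stdlib-only version for unweighted, undirected graphs, using breadth-first search and Brandes' accumulation for betweenness; in practice a graph library or Cytoscape's own network analysis would normally be used instead. `build_graph` and `centralities` are hypothetical names, and closeness is computed only over the nodes reachable from each gene, which is one of several conventions for disconnected networks.

```python
from collections import deque


def build_graph(pairs):
    """Undirected adjacency sets from (gene_a, gene_b, r) edges."""
    adj = {}
    for a, b, _ in pairs:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def _bfs(adj, s):
    """Shortest-path distances, path counts and predecessors from s."""
    dist, sigma, preds, order = {s: 0}, {s: 1}, {s: []}, []
    queue = deque([s])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            if w not in dist:
                dist[w], sigma[w], preds[w] = dist[v] + 1, 0, []
                queue.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]
                preds[w].append(v)
    return dist, sigma, preds, order


def centralities(adj):
    """Degree, closeness (reachable nodes only) and unnormalized betweenness."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    closeness, betweenness = {}, dict.fromkeys(adj, 0.0)
    for s in adj:
        dist, sigma, preds, order = _bfs(adj, s)
        total = sum(dist.values())
        closeness[s] = (len(dist) - 1) / total if total > 0 else 0.0
        delta = dict.fromkeys(order, 0.0)
        for w in reversed(order):  # accumulate dependencies, farthest first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                betweenness[w] += delta[w]
    # every undirected shortest path was counted once from each endpoint
    return degree, closeness, {v: b / 2 for v, b in betweenness.items()}
```

Running `centralities` on the young and the old network and lining the per-gene values up side by side gives the kind of comparison the question describes.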

2) Would it work as a final assignment? Probably. It obviously depends on the course and what the professor wants to evaluate. It shows that you are able to actually perform some hands-on analyses. It gives you an opportunity to show that you understand what a co-expression network is, what the network parameters mean, and how to interpret them. It will also allow you to show that you can critically appraise your results. As a final assignment, it would thus work, assuming that it is within the scope of the course.

3) Would it work from a scientific standpoint? No. I honestly do not think this will be able to give useful insights into development of colorectal cancer in young and old. There are numerous reasons for this: the number of samples in each category is quite low for making a co-expression network, these networks are quite messy even if based on many samples, applying an arbitrary cutoff to make co-expression binary is problematic, using the same cutoff for two networks based on different numbers of samples is problematic, differences between the two networks will likely be dominated by "noise", and many of the network metrics may not be meaningful for this type of network. I would thus be highly sceptical of any results coming out of it.
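The sample-size problem can be made concrete with Fisher's z-transformation: atanh(r) is approximately normal with standard error 1/sqrt(n - 3), which gives an approximate confidence interval for an observed correlation. The sketch below relies on that approximation; `pearson_ci` is a hypothetical helper, and the sample sizes used with it here are illustrative, not the actual group sizes in GDS5232.

```python
import math


def pearson_ci(r, n, z_crit=1.96):
    """Approximate 95% CI for a Pearson r from n samples (Fisher z)."""
    if n <= 3:
        raise ValueError("need more than 3 samples")
    z = math.atanh(r)              # transform to an approximately normal scale
    se = 1.0 / math.sqrt(n - 3)    # standard error on that scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)
```

For an observed r = 0.75, `pearson_ci(0.75, 10)` gives roughly (0.23, 0.94), while `pearson_ci(0.75, 100)` gives roughly (0.65, 0.82). With about ten samples per group, a pair observed at 0.75 is compatible with anything from a weak to a very strong correlation, so a fixed cutoff largely sorts edges by noise, and the same cutoff means different things for groups of different size.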

Point #3 is obviously deeply problematic if the goal was to make a scientific paper. However, it is also very much what allows you to show that you can think critically, so from an assignment perspective it is not all bad.

If I were to make a biological network analysis of the dataset in question, my approach would be very different. Briefly, I would analyze the expression data to identify differences at the level of individual genes between classes of samples. I would map this onto an external network (i.e. the edges would not be derived from the expression data) and identify clusters or modules within the network where you see interacting genes showing a similar behavior. Precisely because the edge information is external, it can serve as independent evidence and thus allow me to find groups of genes that are likely to be more relevant in the disease context than what was found in the initial gene-level analysis. I would then use text mining to help identify literature relevant to each of the identified modules and, based on that, put each module in the context of what is known. That, however, is probably way too much work for an assignment and almost certainly involves methodologies outside the scope of the course.
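A deliberately simplified stand-in for the first two stages of that approach (not the answerer's actual pipeline) could score each gene with Welch's t-statistic between young and old samples, keep genes whose |t| passes a threshold, and report connected groups of those genes in an external interaction network. Here `external_edges` is assumed to be a list of gene-symbol pairs, for instance from a protein interaction database such as STRING, and an edge is kept only when both genes change in the same direction, as a crude reading of "interacting genes showing a similar behavior". The helper names and threshold are hypothetical; a real analysis would use proper statistics (moderated tests, multiple-testing correction) and a module-finding method rather than a hard cutoff.

```python
import math
import statistics


def welch_t(x, y):
    """Welch's t-statistic for one gene: group x versus group y."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    vx, vy = statistics.variance(x), statistics.variance(y)
    return (mx - my) / math.sqrt(vx / len(x) + vy / len(y))


def modules(external_edges, scores, threshold):
    """Connected groups of high-|t| genes linked by same-direction external edges."""
    hits = {g for g, t in scores.items() if abs(t) >= threshold}
    adj = {g: set() for g in hits}
    for a, b in external_edges:
        # keep an external edge only if both genes pass and move the same way
        if a in hits and b in hits and (scores[a] > 0) == (scores[b] > 0):
            adj[a].add(b)
            adj[b].add(a)
    seen, found = set(), []
    for start in sorted(hits):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:  # depth-first walk of one connected component
            v = stack.pop()
            comp.append(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        if len(comp) > 1:  # single genes are not treated as modules
            found.append(sorted(comp))
    return found
```

Here `scores` would map each gene symbol to `welch_t(young_values, old_values)` for that gene.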


Thanks for the detailed response!

I'd like to clarify that I don't have any aspirations to develop this into a paper; it's just a final assignment. My proposal has already been approved.

I'll have to reread your final paragraph carefully, as it may point me in the right direction (perhaps I'm off course now).

Also, what I've described in my post is not the whole scope of the work. After producing the network, my colleagues and I will move on to focus on selected genes using various tools (TargetScanHuman, MEME motif finding, etc.). Here's a link to the "Abstract" of our plan, including the supervisor's comments:

https://www.dropbox.com/s/y4yn23c3ktk5sbb/Gene%20Expression%20Project%20-%20036867190%20015938046%20034649491%20%28feedback%29.docx?dl=0

Reply by omer.k (2.6 years ago)

Sorry for not having answered sooner. If your plan is to look for shared transcription factor motifs etc., co-expression makes good sense. However, for that purpose I would not convert it to a binary network; in my opinion you would be much better off not applying a cutoff and thus having a weighted network. Then apply clustering to that, e.g. MCL, to identify co-expressed modules and look for shared motifs within each of those.
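A minimal pure-Python sketch of Markov clustering (MCL) on a weighted co-expression network: add self-loops, normalize columns, then alternate expansion (squaring the matrix) and inflation (raising entries to a power and renormalizing), pruning tiny entries, until the matrix stops changing; clusters are read off as connected groups in the final matrix. Assumptions: `weights` maps gene pairs to correlations, negative correlations are simply set to zero (one choice among several), and inflation 2.0 is a common default. The dense O(n³) matrix product only suits small gene sets; for a whole array one would use the standalone MCL tool or a sparse implementation.

```python
def _normalize_columns(m):
    """Make every column sum to 1 (in place)."""
    n = len(m)
    for j in range(n):
        s = sum(m[i][j] for i in range(n))
        if s > 0:
            for i in range(n):
                m[i][j] /= s
    return m


def _matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(nodes, weights, inflation=2.0, max_iter=100, prune=1e-5, tol=1e-9):
    """Markov clustering of a weighted, undirected network; returns gene lists."""
    idx = {g: i for i, g in enumerate(nodes)}
    n = len(nodes)
    m = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # self-loops
    for (a, b), r in weights.items():
        w = max(r, 0.0)  # assumption: negative correlations are dropped
        m[idx[a]][idx[b]] = m[idx[b]][idx[a]] = w
    _normalize_columns(m)
    for _ in range(max_iter):
        expanded = _matmul(m, m)  # expansion: flow spreads along paths
        inflated = [[x ** inflation for x in row] for row in expanded]
        _normalize_columns(inflated)  # inflation: strong flow wins, weak fades
        inflated = [[x if x >= prune else 0.0 for x in row] for row in inflated]
        _normalize_columns(inflated)
        converged = all(abs(inflated[i][j] - m[i][j]) < tol
                        for i in range(n) for j in range(n))
        m = inflated
        if converged:
            break
    # clusters = connected components of the non-zero pattern of the final matrix
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(n):
            if m[i][j] > 0:
                parent[find(i)] = find(j)
    groups = {}
    for i, g in enumerate(nodes):
        groups.setdefault(find(i), []).append(g)
    return list(groups.values())
```

Raising `inflation` tends to give smaller, tighter modules; the genes in each module could then be passed to a motif finder such as MEME, as the project plan proposes.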

Reply by Lars Juhl Jensen (2.5 years ago)
Powered by Biostar version 2.3.0