Hi community, I have a final assignment for an introductory bioinformatics course. My overall idea is to take an existing gene expression dataset from GEO and use it to construct gene expression networks. Here is the dataset I'd like to work with: https://www.ncbi.nlm.nih.gov/sites/GDSbrowser?acc=GDS5232 In this study, gene expression was measured in "young" as well as "old" patients diagnosed with colorectal cancer. Therefore, the aim of my project is to identify co-expressed genes in the young population and compare these interactions with those in the old population, and vice versa. I hope to demonstrate visually, through the networks, the change in linkage parameters (closeness, betweenness, degree) of specific genes.
I am considering three general steps:
1. Process the table: remove null values and average the values of any gene measured by several different oligos. Then normalize the values (mean 0, std 1).
2. Compute the respective Pearson correlation matrix for each of the two populations (have a look at the demo table I uploaded). From each matrix, by setting a cut-off on the absolute correlation (e.g. 0.75), I'll extract only the gene pairs with the highest correlation.
3. Produce another table/file that Cytoscape can handle, to show the interactions I referred to earlier.
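As a rough illustration of the normalization and correlation cut-off in steps 1 and 2, here is a minimal sketch in Python rather than MATLAB. It assumes each gene is normalized across the samples of one population, and that the cut-off is applied as abs(r) >= 0.75; the function names, the dictionary layout, and the toy values are all hypothetical and not taken from GDS5232.

```python
from itertools import combinations
from math import sqrt


def zscore(values):
    """Scale one gene's expression values to mean 0 and (population) std 1."""
    n = len(values)
    mean = sum(values) / n
    std = sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        raise ValueError("constant expression profile; correlation is undefined")
    return [(v - mean) / std for v in values]


def correlated_pairs(expression, cutoff=0.75):
    """Return (gene_a, gene_b, r) for every gene pair with |r| >= cutoff.

    `expression` maps a gene name to its values across the samples of ONE
    population (assumed layout: one value per patient, same patient order).
    """
    z = {gene: zscore(vals) for gene, vals in expression.items()}
    edges = []
    for a, b in combinations(sorted(z), 2):
        # For z-scored vectors, Pearson r is the mean of the pairwise products.
        r = sum(x * y for x, y in zip(z[a], z[b])) / len(z[a])
        if abs(r) >= cutoff:
            edges.append((a, b, r))
    return edges


# Made-up toy values for illustration only.
young = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.0],    # rises with GENE_A
    "GENE_C": [4.0, 3.0, 2.0, 1.0],    # falls as GENE_A rises
    "GENE_D": [1.0, -1.0, -1.0, 1.0],  # no linear relation to GENE_A
}
young_edges = correlated_pairs(young, cutoff=0.75)
```

Running the same function separately on the "old" samples would give a second edge list, so the two networks can be compared gene by gene. Note that with thousands of genes the pairwise loop becomes slow, and a matrix-based correlation routine would be the more practical choice.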
I have already completed the first and second steps (using MATLAB, which is all I know; I'd be happy to share the code, though I'm not graded on its efficiency etc.).
What do you think of the workflow? Will it work?
I really need help moving the data from the second step into Cytoscape. If my plan for using the data is not realistic, please suggest an alternative approach.
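One possible way to bridge the gap, sketched here in Python as an illustration only: write each population's retained gene pairs as a plain edge table (one row per pair), which Cytoscape can import as a network from a delimited text file. The column names, file names, and example edges below are hypothetical; the real edges would come from the thresholded correlation matrices.

```python
import csv


def write_edge_table(edges, path):
    """Write (gene_a, gene_b, r) tuples as a CSV edge table with a header row."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["source", "target", "pearson_r", "abs_r", "sign"])
        for gene_a, gene_b, r in edges:
            sign = "positive" if r > 0 else "negative"
            writer.writerow([gene_a, gene_b, f"{r:.4f}", f"{abs(r):.4f}", sign])


# Made-up example edges; not real results from the dataset.
young_edges = [("GENE_A", "GENE_B", 0.91), ("GENE_A", "GENE_C", -0.82)]
old_edges = [("GENE_A", "GENE_B", 0.78)]

# One file per population, so each can be loaded as its own network.
write_edge_table(young_edges, "young_edges.csv")
write_edge_table(old_edges, "old_edges.csv")
```

On import, the "source" and "target" columns would be mapped to the two interacting nodes and the remaining columns kept as edge attributes, so the correlation strength and sign can drive edge width or colour. Keeping the two populations in separate files makes it easier to compare degree, closeness, and betweenness for the same gene across the two networks.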
Thanks a bunch.