Hi,
I want to build a network from gene expression data of a cancer cell line. The main goal is to analyze the topology of a network that represents that cell line. I used R to retrieve 3 samples from GEO for a specific cell line, then used the collapseRows() function to collapse probe IDs to protein identifiers. After averaging the 3 samples, I picked the top 20 percent of the most highly expressed genes and used them to query iRefIndex. I then created a network containing these genes/proteins and the interactions between them collected from iRefIndex.
Is this a sensible way to create a network representation of a specific cancer cell line? Or am I approaching this the wrong way?
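For reference, the steps described above can be sketched roughly as follows. This is only a minimal illustration in plain Python (not the R code that was actually used): the gene names `G1`..`G10`, the expression values, and the interaction list are made-up toy data, and `top_expressed` and `induced_network` are hypothetical helper names. The interaction list stands in for whatever a query to iRefIndex would return.

```python
from statistics import mean


def top_expressed(expr, fraction=0.2):
    """Average each gene over its samples and return the top fraction by mean.

    expr maps a gene identifier to a list of expression values, one per sample.
    """
    avg = {gene: mean(values) for gene, values in expr.items()}
    n_keep = max(1, round(len(avg) * fraction))
    ranked = sorted(avg, key=avg.get, reverse=True)
    return set(ranked[:n_keep])


def induced_network(interactions, genes):
    """Keep only interactions whose two endpoints are both in the selected genes.

    Edges are stored as sorted tuples so that (A, B) and (B, A) count once.
    """
    return {
        tuple(sorted((a, b)))
        for a, b in interactions
        if a in genes and b in genes and a != b
    }


# Toy data (assumption): 10 genes, 3 samples each, already collapsed to one row per gene.
expr = {
    "G1": [12.0, 12.4, 12.2],
    "G2": [11.5, 11.7, 11.6],
    "G3": [9.0, 9.2, 9.1],
    "G4": [8.0, 8.1, 7.9],
    "G5": [7.5, 7.4, 7.6],
    "G6": [6.0, 6.2, 6.1],
    "G7": [5.0, 5.1, 4.9],
    "G8": [4.0, 4.2, 4.1],
    "G9": [3.0, 3.1, 2.9],
    "G10": [2.0, 2.1, 1.9],
}

# Toy interaction list (assumption), standing in for interactions from a database.
interactions = [("G1", "G2"), ("G2", "G3"), ("G1", "G10"), ("G2", "G1")]

selected = top_expressed(expr, fraction=0.2)
network = induced_network(interactions, selected)
print(sorted(selected))
print(sorted(network))
```

Note that in this approach the edges come entirely from the interaction database; the expression data only decides which nodes are kept.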
Do you think a sample size of n=3 is sufficient to create a network?
No. I was hoping to find more samples, and I have, but they are from different GeneChips (Affy HG-U133 Plus 2.0, Affy HG-U95, ...). Some more info: the cell lines I want to create networks for are the NCI-60 cancer cell lines. I know CellMiner has data on these, but I come from a coding background and don't know a lot of biology, which makes it hard to fully utilize CellMiner when I don't understand all the tools there. How many samples would be sufficient to create a network per cell line?
As many samples as you can find, I believe. Determining 'power' in expression studies is difficult, even more so where network analysis is concerned. Just be aware that most network analyses are based on correlation, i.e. Pearson, Spearman, or Kendall correlation coefficients.
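To illustrate what "based on correlation" means in practice, here is a minimal sketch in plain Python of a co-expression network built from Pearson correlation. The gene names, expression values, the threshold of 0.9, and the helper names `pearson` and `coexpression_edges` are all assumptions made up for the example. Spearman correlation would be the same calculation applied to the ranks of the values.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        # A gene with constant expression has no defined correlation; treat as 0.
        return 0.0
    return cov / (sx * sy)


def coexpression_edges(expr, threshold=0.9):
    """Return {(gene_a, gene_b): r} for every pair with |r| >= threshold."""
    edges = {}
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if abs(r) >= threshold:
            edges[(g1, g2)] = r
    return edges


# Toy data (assumption): 4 genes measured in 5 samples.
expr = {
    "G1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "G2": [2.0, 4.0, 6.0, 8.0, 10.0],  # rises with G1
    "G3": [5.0, 4.0, 3.0, 2.0, 1.0],   # falls as G1 rises
    "G4": [2.0, 1.0, 3.0, 1.0, 2.0],   # unrelated to G1
}

edges = coexpression_edges(expr, threshold=0.9)
for pair, r in sorted(edges.items()):
    print(pair, round(r, 3))
```

Each gene pair is correlated across samples, which is why the number of samples matters so much here: with only 3 samples per gene, the correlation estimates are very unstable.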