Hi
I have a gene expression dataset of 102 patients and 9 healthy controls. I downloaded it from GEO, applied several preprocessing steps (normalization, batch correction based on date, etc.), and was finally able to generate a table containing:
- the individuals on the rows
- the genes on the columns
- each entry (i, j) containing a real value that indicates the expression of gene j in individual i
This first preprocessing phase was a lot of effort. Now I would like to perform a differential gene expression analysis to see how gene expression differs between the patients and the healthy controls.
I checked some packages online (such as DESeq2), and noticed they all require input files containing raw counts. Unfortunately, I don't have raw counts.
I would like to perform the differential gene expression analysis myself, using biostatistics R functions on my preprocessed tables.
How can I do it? Any suggestion?
Thanks!
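As a baseline that needs only base R, a per-gene two-sample test followed by multiple-testing correction can be run directly on a table like the one described above. This is a minimal sketch under stated assumptions, not a substitute for limma: it uses Welch's t-test (which does not assume equal variances, relevant with only 9 controls) and Benjamini-Hochberg correction via p.adjust. The helper name per_gene_ttest and the group vector are hypothetical, and the values are assumed to already be on a log2 scale.

```r
# Hypothetical helper: per-gene Welch t-test with Benjamini-Hochberg correction.
# Assumes `expr` has individuals in rows and genes in columns (log2 scale) and
# `group` labels each row as "control" or "patient".
per_gene_ttest <- function(expr, group) {
  group <- factor(group, levels = c("control", "patient"))
  stopifnot(nrow(expr) == length(group), !anyNA(group))
  is_pat <- group == "patient"
  stats <- apply(expr, 2, function(y) {
    tt <- t.test(y[is_pat], y[!is_pat])                 # Welch test (unequal variances)
    c(logFC   = mean(y[is_pat]) - mean(y[!is_pat]),     # log2 fold change if data are log2
      t       = unname(tt$statistic),
      p.value = tt$p.value)
  })
  res <- as.data.frame(t(stats))                        # one row per gene
  res$adj.p.value <- p.adjust(res$p.value, method = "BH")
  res[order(res$p.value), ]
}

# Toy usage on simulated data: 9 controls, 21 patients, 20 genes;
# gene1 is shifted up by 3 units in patients.
set.seed(1)
group <- rep(c("control", "patient"), times = c(9, 21))
expr <- matrix(rnorm(30 * 20, mean = 8), nrow = 30,
               dimnames = list(paste0("ind", 1:30), paste0("gene", 1:20)))
expr[group == "patient", "gene1"] <- expr[group == "patient", "gene1"] + 3
head(per_gene_ttest(expr, group))
```

With thousands of genes, the adjusted p-values (not the raw ones) are what should be thresholded.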
Adding to this: what you have is array data, which gives you intensity values rather than counts, i.e. a relative measure of gene expression rather than the read counts you get from RNA-seq.
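Linear-model methods for array data such as limma expect log-scale intensities, so it can be worth checking whether the preprocessed values are already log2-transformed. The sketch below uses a quantile heuristic in the spirit of the auto-log check in GEO2R's generated scripts; the thresholds (100 and 50) are rules of thumb, not guarantees, and the function names are hypothetical.

```r
# Hypothetical helpers: heuristic check for intensities that are not yet logged.
# Thresholds are rules of thumb (assumption), not a formal test.
looks_unlogged <- function(x) {
  qx <- as.numeric(quantile(x, c(0, 0.25, 0.5, 0.75, 0.99, 1), na.rm = TRUE))
  # Raw intensities typically reach the thousands; log2 array values
  # usually stay below about 16.
  (qx[5] > 100) || (qx[6] - qx[1] > 50 && qx[2] > 0)
}

to_log2_if_needed <- function(x) {
  if (looks_unlogged(x)) {
    x[which(x <= 0)] <- NA  # log2 is undefined for non-positive values
    x <- log2(x)
  }
  x
}

# Toy usage: simulated raw-scale intensities between 2^4 and 2^14
set.seed(1)
raw <- matrix(2^runif(50, 4, 14), nrow = 5)
looks_unlogged(raw)        # TRUE: raw-scale values
looks_unlogged(log2(raw))  # FALSE: already log2
```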
limma seems to be pretty much the standard, and following its workflow should get you the intended results. Be sure to read the manual thoroughly, and also look at this end-to-end workflow for Affymetrix microarrays.

Thank you guys for your replies. There's a lot of material online and I feel like I am drowning in it. I found this interesting question and answer here on BioStars.org, which I tried to implement for my case. I used lmFit(table) and eBayes(fit), as explained, without design. I was able to generate a table with the values of the fitted model for the patients, and a table for the healthy controls. This is the head of the topTable of the patients fit:
head(topTable(fit, n=Inf, sort="p", p.value=0.05))

Some questions:
1) What is the meaning of these p-values associated with each gene that I found this way?
2) Was it a good/useful idea to split the patients and healthy controls into two different tables and perform the analysis separately? Or should I keep them together and insert this information into the design parameter? If the latter, how?

Thanks!
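For reference, a common way to set this up in limma is to keep all samples in one matrix and encode the grouping in a design matrix. The sketch below assumes the Bioconductor package limma is installed, that the table has individuals in rows (so it is transposed, since limma wants genes in rows), and that a hypothetical group vector labels each row; the helper names run_limma and run_intercept_only are also hypothetical. For contrast, it includes lmFit without a design: limma then fits an intercept-only model, so each gene's p-value tests whether its mean log-expression is zero, which says nothing about patients versus controls.

```r
library(limma)  # Bioconductor package; install with BiocManager::install("limma")

# Hypothetical helper: two-group DE with all samples in one fit.
# Assumes `expr` has individuals in rows and genes in columns (log2 scale)
# and `group` labels each row as "control" or "patient".
run_limma <- function(expr, group) {
  group  <- factor(group, levels = c("control", "patient"))
  design <- model.matrix(~ group)   # columns: (Intercept), grouppatient
  fit    <- lmFit(t(expr), design)  # limma expects genes in rows
  fit    <- eBayes(fit)             # moderated t-statistics
  # "grouppatient" = patient mean minus control mean for each gene;
  # adj.P.Val is Benjamini-Hochberg adjusted by default
  topTable(fit, coef = "grouppatient", number = Inf, sort.by = "P")
}

# Hypothetical helper: what lmFit does without a design. It fits an
# intercept-only model, so each p-value tests "mean expression == 0".
run_intercept_only <- function(expr) {
  fit0 <- eBayes(lmFit(t(expr)))
  topTable(fit0, coef = 1, number = Inf, sort.by = "P")
}

# Toy usage on simulated data: 9 controls, 21 patients, 100 genes;
# gene1 is shifted up by 3 units in patients.
set.seed(42)
group <- rep(c("control", "patient"), times = c(9, 21))
expr <- matrix(rnorm(30 * 100, mean = 8), nrow = 30,
               dimnames = list(paste0("ind", 1:30), paste0("gene", 1:100)))
expr[group == "patient", "gene1"] <- expr[group == "patient", "gene1"] + 3
head(run_limma(expr, group))
head(run_intercept_only(expr))  # means are ~8, not 0, so every gene looks "significant"
```

With the ~ group formula, control is the reference level, so a positive logFC means higher expression in patients. Fitting each group separately without a design can only reproduce the intercept-only test within that group; the two groups are never compared.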