Hi friends,
I have HTSeq raw count data for 19,000 coding genes and 300 samples in two groups (treatment and control). I want to do gene set enrichment analysis with GSEA from the Broad Institute. Do I need to normalize my data before GSEA? How should I normalize for this purpose?
Thanks
Hi, I am trying to normalize my count data for GSEA, but an error comes up. What is the solution?
Assuming that your data, held in `data`, is raw counts, you should first normalise it with the DESeq2 workflow, which produces the `dds` object.
`data` does not have to be a data matrix. `coldata` should be a data frame that represents the metadata for `data`, and its rows should be perfectly aligned with the columns of `data`. You should have at least one column, in this case 'group', that represents treatment and control.
Then we can transform these normalised counts:
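The two steps described above could be sketched as follows. This is a minimal sketch, assuming the DESeq2 package is installed. The toy counts from `makeExampleDESeqDataSet()`, the sample labels, and the `blind = FALSE` setting are illustrative assumptions and do not come from the thread; `data`, `coldata`, `group`, `dds` and `varStabilised` follow the names used in the discussion.

```r
library(DESeq2)

set.seed(1)
# Toy raw counts standing in for the real HTSeq matrix (assumption):
# 5000 genes in rows, 12 samples in columns.
data <- counts(makeExampleDESeqDataSet(n = 5000, m = 12))

# Metadata: one row per sample, in the same order as the columns of `data`.
coldata <- data.frame(
  group = factor(rep(c("control", "treatment"), each = 6),
                 levels = c("control", "treatment")),
  row.names = colnames(data)
)
# Check the alignment between metadata rows and count columns.
stopifnot(identical(rownames(coldata), colnames(data)))

# Build the DESeq2 object and run the standard workflow,
# which estimates size factors and dispersions.
dds <- DESeqDataSetFromMatrix(countData = data,
                              colData = coldata,
                              design = ~ group)
dds <- DESeq(dds)

# Variance-stabilising transformation of the normalised counts;
# rlog(dds, blind = FALSE) would be the alternative transformation.
varStabilised <- vst(dds, blind = FALSE)
```

With real data, `data` would be the HTSeq count table and `coldata` the actual sample sheet; only the alignment check and the `design` column name need to match.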
The normalised + transformed expression levels will then be accessible via:
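A minimal sketch of that accessor, assuming `varStabilised` is the object returned by `vst()` or `rlog()`. The toy dataset and the name `exprMatrix` are hypothetical and only make the example self-contained.

```r
library(DESeq2)

set.seed(1)
# Toy dataset (assumption) so that the example runs on its own.
dds <- makeExampleDESeqDataSet(n = 5000, m = 12)
dds <- DESeq(dds)
varStabilised <- vst(dds, blind = FALSE)

# assay() extracts the transformed values as a plain numeric matrix,
# genes in rows and samples in columns.
exprMatrix <- assay(varStabilised)
exprMatrix[1:3, 1:3]
```

This matrix is what would then be exported in whatever expression format GSEA is given.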
Thanks Kevin. What I understand from your answer is: first I run the DESeq workflow on the raw count data, then the result (`dds`), holding the normalized counts, goes to `vst()` or `rlog()`. Then this `varStabilised` will be my normalized data for GSEA. Am I getting this correctly?
Yep, but you can also use the normalised counts, accessible via:
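A minimal sketch of retrieving the normalised counts, assuming `dds` has already been through `DESeq()`. The toy dataset and the name `normCounts` are hypothetical.

```r
library(DESeq2)

set.seed(1)
# Toy dataset (assumption) so that the example runs on its own.
dds <- makeExampleDESeqDataSet(n = 5000, m = 12)
dds <- DESeq(dds)

# Raw counts divided by the per-sample size factors.
normCounts <- counts(dds, normalized = TRUE)
normCounts[1:3, 1:3]
```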
The distributions of the normalised counts and of the variance-stabilised expression levels differ, but this is not a problem because GSEA is based on ranking.
Thanks for the explanation. When I use this, I get an error.
How can I solve this?
You have evidently not yet run `DESeq(dds)`.
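A minimal sketch of the fix, assuming the error came from asking for normalised counts before any size factors had been estimated (the error text itself is not shown in the thread). The toy dataset and the name `normCounts` are hypothetical.

```r
library(DESeq2)

set.seed(1)
# Toy dataset (assumption) so that the example runs on its own.
dds <- makeExampleDESeqDataSet(n = 5000, m = 12)

# A freshly built object carries no size factors yet, so normalised
# counts cannot be computed from it at this point.
is.null(sizeFactors(dds))

# Running the workflow estimates size factors (and dispersions).
dds <- DESeq(dds)

# Now the normalised counts are available.
normCounts <- counts(dds, normalized = TRUE)
```

If only normalised counts are needed and no differential expression testing, `estimateSizeFactors(dds)` alone is also enough for `counts(dds, normalized = TRUE)` to work.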
Thanks Kevin, it was helpful.