GSEA from Broad Institute normalization method
3.3 years ago
Rob ▴ 170

Hi friends

I have HT-Seq raw count data for 19000 coding genes across 300 samples in two groups (treatment and control). I want to do gene set enrichment analysis with GSEA from the Broad Institute. Do I need to normalize my data before GSEA? How should I normalize it for this purpose?

Thanks

RNA-Seq • 3.4k views
3.3 years ago

Hi,

As input to the Broad Institute's GSEA program, you should use any type of expression data that is [properly] normalised such that cross-sample differences can be faithfully gauged. This can mean using any of these:

  • normalised RNA-seq counts via DESeq2's 'geometric' normalisation, edgeR's TMM method, et cetera
  • normalised + transformed RNA-seq expression levels, such as variance-stabilised (vst) or regularised log (rlog) expression levels from DESeq2, or log2 CPMs from edgeR
  • normalised microarray data via RMA, GC-RMA, MAS5, neqc, et cetera

Do not use raw counts or any of these types of expression levels: FPKM, RPKM, TPM, et cetera.

More information here:

Kevin


Hi, I am trying to normalize my count data for GSEA, but these errors come up. What is the solution?

data <- read.csv("myData.csv")
data <- matrix(data)
vst(data)

Error in vst(data) : less than 'nsub' rows,
  it is recommended to use varianceStabilizingTransformation directly


rlog(data)
Error in DESeqDataSet(se, design = design, ignoreRank) : 
  'list' object cannot be coerced to type 'double'

Assuming that the object data holds raw counts, you should first normalise it via:

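library(DESeq2)

# build the DESeq2 object from raw counts and sample metadata
dds <- DESeqDataSetFromMatrix(
  countData = data,
  colData = coldata,
  design = ~ Group)
# estimate size factors and dispersions, and fit the model
dds <- DESeq(dds)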

data does not have to be a data matrix.

coldata should be a data frame that holds the metadata for data, and its rows should be perfectly aligned with the columns of data. You should have at least one column, in this case 'Group', that distinguishes treatment from control.
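As a rough sketch of how data and coldata could be prepared: the two errors reported above are consistent with matrix() having been applied to a data frame, which returns a list-matrix with one cell per column rather than a numeric count matrix. The toy data frame, its column names, and the assumption that the first column of the CSV holds gene IDs are all hypothetical, so adjust them to your own file.

```r
# Hypothetical stand-in for read.csv("myData.csv"): the first column holds
# gene IDs and the remaining columns hold raw counts per sample (assumption).
raw <- data.frame(
  gene  = c("geneA", "geneB", "geneC"),
  ctrl1 = c(10L, 0L, 250L),
  ctrl2 = c(12L, 3L, 240L),
  trt1  = c(30L, 1L, 100L),
  trt2  = c(28L, 0L, 120L)
)

# matrix() on a data frame gives a list-matrix with one cell per column,
# which is not what vst() or rlog() expect.
bad <- matrix(raw)

# Move the gene IDs to the row names and convert only the count columns.
data <- as.matrix(raw[, -1])
rownames(data) <- raw$gene
storage.mode(data) <- "integer"

# Sample metadata: one row per column of data, in the same order.
coldata <- data.frame(
  Group = factor(c("control", "control", "treatment", "treatment"),
                 levels = c("control", "treatment")),
  row.names = colnames(data)
)

# Stop early if the metadata rows and the count columns are misaligned.
stopifnot(identical(rownames(coldata), colnames(data)))
```

The final stopifnot() is a cheap guard: a misordered coldata would otherwise silently assign samples to the wrong group.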

Then we can transform these normalised counts:

varStabilised <- vst(dds, blind = FALSE)
regularisedLog <- rlog(dds, blind = FALSE)

The normalised + transformed expression levels will then be accessible via:

assay(varStabilised)
assay(regularisedLog)
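
To see the whole sequence run end to end, here is a minimal sketch on simulated counts. The simulated matrix, the gene and sample names, and the 3 vs 3 design are made up purely for illustration; with real data you would pass your own count matrix and coldata instead.

```r
library(DESeq2)

set.seed(1)
n_genes <- 2000
n_samples <- 6

# Simulated negative-binomial counts (illustrative only, not real data).
# Each gene gets its own mean, recycled across the sample columns.
mu <- 2^runif(n_genes, min = 6, max = 12)
data <- matrix(
  rnbinom(n_genes * n_samples, mu = mu, size = 10),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("sample", seq_len(n_samples)))
)

# Metadata rows are named after, and ordered like, the columns of data.
coldata <- data.frame(
  Group = factor(rep(c("control", "treatment"), each = 3)),
  row.names = colnames(data)
)

dds <- DESeqDataSetFromMatrix(
  countData = data,
  colData = coldata,
  design = ~ Group)
dds <- DESeq(dds)

# vst() takes the DESeqDataSet, not the raw matrix.
varStabilised <- vst(dds, blind = FALSE)
head(assay(varStabilised))
```

Note that vst() subsamples 1000 genes by default (nsub), so an input with fewer rows than that raises the "less than 'nsub' rows" error; in that case varianceStabilizingTransformation() can be called directly, as the error message suggests.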

Thanks Kevin. What I understand from your answer is: first I run the DESeq workflow on the raw count data, then the result (dds), which holds the normalized counts, goes into vst() or rlog():

varStabilised <- vst(dds, blind = FALSE)

Then this varStabilised will be my normalized data for GSEA. Am I getting this correctly?


Yep, but you can also use the normalised counts, accessible via:

counts(dds, normalized = TRUE)

The distributions of the normalised counts and of the variance-stabilised expression levels differ, but this is not a problem because GSEA is based on ranking.
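
If it helps, whichever matrix you choose can be written out as a tab-delimited text file for the GSEA desktop program, which to my knowledge expects a NAME column, a DESCRIPTION column, and then one column per sample. The sketch below uses a tiny made-up matrix called expr in place of counts(dds, normalized = TRUE) or assay(varStabilised), and the output file name is arbitrary.

```r
# Hypothetical normalised expression matrix (genes in rows, samples in columns),
# standing in for counts(dds, normalized = TRUE) or assay(varStabilised).
expr <- matrix(
  c(10.5, 11.2, 30.1, 28.7,
     0.0,  2.9,  1.1,  0.0),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("geneA", "geneB"),
                  c("ctrl1", "ctrl2", "trt1", "trt2"))
)

# First two columns are NAME and DESCRIPTION; "na" is a placeholder description.
gsea_table <- data.frame(
  NAME = rownames(expr),
  DESCRIPTION = "na",
  expr,
  check.names = FALSE
)

# Written to a temporary directory here; choose your own path in practice.
out_file <- file.path(tempdir(), "expression_for_gsea.txt")
write.table(gsea_table, out_file, sep = "\t", quote = FALSE, row.names = FALSE)
```

The phenotype labels (treatment vs control) go in a separate file, so the sample order in this table needs to match the order used there.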


Thanks for the explanation. When I used this, I got an error:

counts(dds, normalized = TRUE)
Error in .local(object, ...) : 
  first calculate size factors, add normalizationFactors, or set normalized=FALSE

How can I solve this?


You have evidently not yet run DESeq(dds).
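
As the error message itself hints, the normalised counts only need size factors, so estimateSizeFactors() on its own is enough if you do not need the full model fit. A small sketch, using DESeq2's built-in simulated dataset in place of your own dds:

```r
library(DESeq2)

set.seed(1)
# Simulated DESeqDataSet standing in for your own dds (illustrative only).
dds <- makeExampleDESeqDataSet(n = 1000, m = 6)

# At this point counts(dds, normalized = TRUE) would raise the
# "first calculate size factors" error, because none are stored yet.
dds <- estimateSizeFactors(dds)

# One size factor per sample; normalised counts are raw counts divided by these.
sizeFactors(dds)
normalisedCounts <- counts(dds, normalized = TRUE)
head(normalisedCounts)
```

Running DESeq(dds) does this step for you as part of the full workflow, which is why the error disappears after it.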


Thanks Kevin. It was helpful.
