RNA Seq data with GSEA
2
0
Entering edit mode
4.2 years ago
pennakiza ▴ 60

Hi all,

I am a bit confused with how I should be transforming my data to use it with GSEA. I have tried two different approaches and I am not sure which one is the best.

First, I have created a DGElist, filtered my low count reads and transformed it with voom. These data I fed then to GSEA.

y <- DGEList(counts = fc_mydata$counts, genes=fc_mydata$annotation[,c("GeneID","Length")])
keep<-filterByExpr(y)
y<-y[keep,]
y<-calcNormFactors(y, method="TMM")
y_voom<-voom(y) 
GSEA_table<-y_voom$E
colnames(GSEA_table)<-pastemydata$Treatment[match(colnames(GSEA_table),mydata$Sample)],mydata$DaysToRebound,mydata$SampleID,sep="_")
 GSEA_table<-GSEA_table[,order(colnames(GSEA_table))]
 symb<-annotation$Symbol[match(rownames(GSEA_table),annotation$GeneID)]
 write.table(file="GSEA_table_final.txt",cbind("NAME"=symb,GSEA_table),row.names = F,quote = F,sep = "\t")

Second approach, I ran DESeq2 on raw counts, transformed it with vsn and fed that to GSEA.

dds<-DESeqDataSetFromMatrix(countData=fc_mydata$counts, colData=mydata, ~Treatment)
dds<-DESeq(dds)
res<-results(dds)
res_ordered<-res[order(res$padj),]
res_ordered<- res_ordered[order(res_ordered$padj, decreasing = F),]
rownames(res_ordered)<-make.names(annotation$Symbol[match(rownames(res_ordered), annotation$GeneID)], unique=TRUE)
head(res_ordered)
write.table(res_ordered, "res_ordered",sep="\t")

vst<-vst(dds, blind=F)
normalised_vst<-assay(vst)
rownames(normalised_vst)<-make.names(annotation$Symbol[match(rownames(normalised_vst), annotation$GeneID)], unique=TRUE)
normalised_vst<-normalised_vst[,order(colnames(normalised_vst))]
write.table(normalised_vst, "normalised_vst.txt", cbind("NAME" = rownames(normalised_vst), normalised_vst),row.names = F,quote = F,sep = "\t")

What do you think, which is the most correct one? What can I do to improve?

Thanks a lot!

GSEA RNA-Seq voom vsn • 3.6k views
ADD COMMENT
2
Entering edit mode
4.2 years ago
Gordon Smyth ★ 7.0k

I am the voom author, but I do not recommend voom for this purpose because GSEA is unable to use the voom precision weights. Just use:

y <- DGEList(counts ...)
y <- calcNormFactors(y)
GSEA_table <- cpm(y, log=TRUE)

You do not need filterByExpr here but, if you do use it, it should be used in conjunction with your treatment factor or design matrix.

As an alternative to GSEA, you might also consider the pathway analysis functions such as camera that come as part of limma.

ADD COMMENT
0
Entering edit mode

Thank you Gordon! I will try camera too!

ADD REPLY
0
Entering edit mode
4.2 years ago
dsull ★ 5.8k

Look at the instructions here:

https://software.broadinstitute.org/cancer/software/gsea/wiki/index.php/Using_RNA-seq_Datasets_with_GSEA

Using DESeq2 is fine (and other types of acceptable RNA-seq normalizations should also be fine).

You also may not need to vst-transform your normalized counts, per the following FAQ: https://software.broadinstitute.org/cancer/software/gsea/wiki/index.php/FAQ#Should_I_use_natural_or_log_scale_data_for_GSEA.3F

DESeq2 also gives you the log fold changes (treatment compared to control) of each gene. You can just use the log fold changes to rank your gene list and then feed that ranked gene list into GSEAPreranked algorithm. This is what I generally do. If you do this, I'd recommend using lfcShrink to get more accurate log fold change values.

ADD COMMENT
0
Entering edit mode

Thank you very much!

I thought that you needed to transform the data before but I will certainly try it without it. I have tried to used preranked but in some cases I have only a few DE genes (p-adj <0.05) and if I only use those I don't get any gene sets enriched.

ADD REPLY
0
Entering edit mode

Ah, for the prerank algorithm, you still use ALL your genes, not just your DE ones. So if 200 out of 40000 genes are DE, you still use all 40000 genes. Basically, you just supply a list of ALL genes ranked by their log fold change.

That's the power of GSEA -- you use every single gene and it's not based off of cutoffs/thresholds. Contrast this to gene ontology overrepresentation analysis where you only input a list of genes that are considered DE.

ADD REPLY
0
Entering edit mode

That's cool, in this case I am getting the same results, which is reassuring!

Thank you very much for your help!

ADD REPLY

Login before adding your answer.

Traffic: 2708 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6