Deseq2 Differentially Expressed Genes between conditions
2
0
Entering edit mode
3.8 years ago

Hello friends!

I have been working with my RNAseq output after running it thru the Deseq package in R. I have a list "genes.txt" of the genes in my 9 samples, the pvals, log2 fold change etc. I would like to compare lets say: Control vs. Knockdown and then use that input to use for GSEA.

Here is my phenotype file:

name    type
sample1 Control
sample2 KD
sample3 OE
sample4 Control
sample5 KD
sample6 OE
sample7 Control
sample8 KD
sample9 OE

Samples are grouped in 3s. So 1,2,3 are from the same cell line and so on for the rest.

This is the current code I am working with that provides the "global" values of all these samples, and Id like the output to take into account which values belong with which sample/condition(type). Is there a simplistic way to do this ? Id like to end up comparing the 3 controls vs the 3 KDs in GSEA as the next step.

Here is the code I currently have:

source("http://bioconductor.org/biocLite.R")
biocLite("ballgown")
biocLite
library("DESeq2")

countData <- as.matrix(read.csv("transcript_count_matrix.csv", row.names = "transcript_id"))
colData <- read.csv("pheno_data", sep="\t", row.names = 1)

all(rownames(colData) %in% colnames(countData))

countData<-countData[, rownames(colData)]
all(rownames(colData) == colnames(countData))

#create deseq DataSet from count matrix and labels
dds <- DESeqDataSetFromMatrix(countData = countData, colData = colData, design = ~ type)

#Run the default analysis for DESeq2 and generate results table

dds <- DESeq(dds)
res <- results(dds)

#Sort by adjusted p-value and display 

(resOrdered <- res[order(res$padj), ])

write.table(resOrdered, file = "genes.txt", sep ="\t")

#SORT BY CONDITION TYPE



# READ INTO GSEA SECTION

Any guidance on this I would greatly appreciate!

Thanks so much,

Bryce

RNA-Seq Deseq2 Differential Expression R GSEA • 2.1k views
ADD COMMENT
2
Entering edit mode
3.8 years ago
Buffo ★ 1.9k

You need to perform DESeq2 contrasts to compare samples, then filter results by pvalue, padjusted, etc and finally to GSEA.

ADD COMMENT
0
Entering edit mode

Hi thanks for the information! I believe I have made good progress using your help. I have the results from my run using that code, but I'd still like to add which values correspond to which sample. So basically, what would be the best method to add another column that categorizes which sample goes with which value? Also I'm not sure why its comparing OE3 with Control1 as the default? Any suggestions figuring this out I would really appreciate! Thank you!

> (resOrdered <- res[order(res$padj), ])
log2 fold change (MLE): group OE3 vs Control1 
Wald test p-value: group OE3 vs Control1 
DataFrame with 229551 rows and 6 columns
                 baseMean log2FoldChange     lfcSE         stat       pvalue       padj
                <numeric>      <numeric> <numeric>    <numeric>    <numeric>  <numeric>
ENST00000462898  801.1637     -13.050030  3.351083    -3.894273 9.849388e-05 0.09158196
ENST00000397492 1374.4390       8.500578  2.211868     3.843166 1.214570e-04 0.09158196
ENST00000482918  655.4429     -12.720482  3.310872    -3.842033 1.220194e-04 0.09158196
MSTRG.2890.8     691.0694     -12.466736  3.322664    -3.752030 1.754083e-04 0.09158196
ENST00000543146  877.2129     -12.163227  3.207941    -3.791599 1.496802e-04 0.09158196


#create deseq DataSet from count matrix and labels
dds <- DESeqDataSetFromMatrix(countData = countData, colData = colData, design = ~  type)

#----- Comparisons #Run the default analysis for DESeq2 and generate results table
dds$group <- factor(paste0(dds$type))
design(dds) <- ~ group

dds <- DESeq(dds)
resultsNames(dds)

#______



#dds <- DESeq(dds)
res <- results(dds)

#Sort by adjusted p-value and display 

(resOrdered <- res[order(res$padj), ])

write.table(resOrdered, file = "genes.txt", sep ="\t")

ADD REPLY
0
Entering edit mode

I usually do like below.

> conds <- c("Control","KD","OE","Control","KD","OE","Control","KD","OE")

> coldat=DataFrame(conds=factor(conds))

> dds <- DESeqDataSetFromMatrix(countData = countData, colData = colData, design = ~  type)


> res <- results(dds, contrast = c("conds","OE","Control"))

> res_sig <- subset(res, pvalue<=.05 & abs(log2FoldChange)>=1)
ADD REPLY
1
Entering edit mode
3.8 years ago
mbk0asis ▴ 630

To generate a nomalized count table, run

cnts_Norm <-  counts(dds, normalized = T).

Then, use the table for GSEA. And you may want to look for 'gage', a R package for GSEA.

ADD COMMENT

Login before adding your answer.

Traffic: 1983 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6