Question

gct output in DESeq2

2.8 years ago

jabbari.parnian ▴ 30

Hi everyone,

I'm trying to analyze my count data with DESeq2. According to the GSEA tutorial, DESeq2 has an output format that can be used directly in GSEA (here). However, I've read the DESeq2 workflow and can't find how to produce this GCT output format. Any help is appreciated.

Tags: GSEA, gct, DESeq2

Written 2.8 years ago by jabbari.parnian; updated 21 months ago by arsenal's

Hello everyone, I am facing a problem. I finished running DESeq2 and wanted to generate a GCT file for GSEA. However, when I check my GCT file, the gene ID column is missing. I followed what was given here: first the first solution, then the second solution on its own, and now both together, but I still have the same problem. Please let me know how I can fix this, and also how I can generate a CLS file for my data. Here is my code:

library(DESeq2)
library(ggplot2)
countData <- read.csv('/home/keshav/Downloads/gene_count_matrix.csv', header = TRUE,sep = ",")
rownames(countData) <- countData[ , 1]
countData = as.matrix(countData[ , -1])
head(countData)
(condition <- factor(c("Normal","Tumor","Normal","Tumor")))
(coldata <- data.frame(row.names=colnames(countData), condition))
dds <- DESeqDataSetFromMatrix(countData=countData, 
                              colData=coldata, 
                              design=~condition)
dds
dds <- DESeq(dds)
res <- results(dds)
head(results(dds, tidy=TRUE))
summary(res)
norm_counts <- counts(dds, normalized = T)
norm_counts <- as.data.frame(norm_counts)
norm_counts$description <- norm_counts$gene_id
fid <- "norm_counts.gct" 
writeLines(c("#1.2", paste(nrow(norm_counts), ncol(norm_counts) - 2, collapse="\t")), fid, sep="\n")
write.table(norm_counts, file=fid, quote=FALSE, row.names=TRUE, col.names=TRUE, sep="\t", append = TRUE)
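A likely reason the gene ID column goes missing in the code above: after rownames(countData) <- countData[ , 1], the IDs live in the row names, and counts(dds, normalized = T) keeps them there. So norm_counts$gene_id is NULL, the description assignment silently does nothing, and ncol(norm_counts) - 2 undercounts the samples. With row.names = TRUE, write.table writes the IDs without a matching header cell, so the header row is one field short. Also, paste(..., collapse = "\t") joins its two arguments with a space, because collapse only applies when collapsing a vector; the dimension line needs sep = "\t". Below is a minimal base-R sketch of a helper, write_gct() (a hypothetical name, not part of DESeq2 or GSEA), that builds the NAME and Description columns from the row names of a numeric matrix. It assumes the GCT 1.2 layout: a "#1.2" line, a line with the numbers of genes and samples, then a tab-separated table starting with NAME and Description.

```r
# Hypothetical helper: write a numeric matrix (genes x samples, gene IDs as row names)
# to a GCT 1.2 file for GSEA.
write_gct <- function(mat, file, description = rownames(mat)) {
  mat <- as.matrix(mat)
  if (is.null(rownames(mat))) stop("the matrix needs gene IDs as row names")
  # Data table: NAME and Description first, then one column per sample
  gct <- data.frame(NAME = rownames(mat),
                    Description = description,
                    mat,
                    check.names = FALSE,
                    stringsAsFactors = FALSE)
  # Write header lines and table through one open connection (avoids append warnings)
  con <- file(file, open = "w")
  on.exit(close(con))
  # Line 1: version; line 2: number of genes and samples, tab-separated
  writeLines(c("#1.2", paste(nrow(mat), ncol(mat), sep = "\t")), con)
  # No row names, so the header and data rows have the same number of fields
  write.table(gct, file = con, sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = TRUE)
  invisible(file)
}
```

With the objects from the code above, write_gct(counts(dds, normalized = TRUE), "norm_counts.gct") would replace the last four lines.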

Reply by arsenal's, 21 months ago

2.6 years ago

jv ★ 1.8k

I would like to offer an alternative answer. Neither the CLS nor the GCT file needs to be created by hand, and no manual editing in a plain text editor is required: both files can be generated from code in R. For example, the GCT file can be written as follows (this assumes the first two columns of norm_counts are the gene name and description):

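fid <- "norm_counts.gct"
# GCT 1.2 header: version line, then "<number of genes>\t<number of samples>"
writeLines(c("#1.2", paste(nrow(norm_counts), ncol(norm_counts) - 2, sep = "\t")), fid, sep = "\n")
# Append the table; row names are skipped because the gene IDs are already in column 1
write.table(norm_counts, file = fid, quote = FALSE, row.names = FALSE, col.names = TRUE, sep = "\t", append = TRUE)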
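The CLS file can be written from R in the same way. This sketch assumes the categorical CLS layout from the GSEA documentation: line 1 holds the number of samples, the number of classes and the number 1; line 2 starts with "#" followed by the class names; line 3 lists one class label per sample, in the same order as the sample columns of the GCT file. write_cls() is a hypothetical helper, not a GSEA or DESeq2 function. Class names are listed in order of first appearance, since GSEA matches the first label used to the first class named.

```r
# Hypothetical helper: write a categorical phenotype (one label per sample) as a CLS file.
write_cls <- function(labels, file) {
  labels <- as.character(labels)
  # CLS fields are whitespace-separated, so labels must not contain spaces
  if (any(grepl("[[:space:]]", labels))) stop("class labels must not contain whitespace")
  # Class names in order of first appearance in the sample labels
  classes <- unique(labels)
  writeLines(c(
    paste(length(labels), length(classes), 1, sep = " "),  # samples, classes, 1
    paste("#", paste(classes, collapse = " ")),             # class names
    paste(labels, collapse = " ")                           # one label per sample
  ), file)
  invisible(file)
}
```

For the data in the reply above, write_cls(condition, "condition.cls") would produce the file, provided condition is in the same order as colnames(countData).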


Very helpful alternative when you have lots of files. Thank you!

Reply by a_confused_biologist, 2.1 years ago

score 5 · Accepted Answer · 2021-06-15

Hi!

In this case you have to prepare the GCT file "by hand" to analyze your data with GSEA. Once you have created your DESeq object (dds), retrieve the normalized counts (the raw counts divided by the DESeq2 size factors):

norm_counts <- counts(dds, normalized = TRUE)
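To see what the normalized option does, here is a base-R sketch of the default median-of-ratios size factors (what estimateSizeFactors() computes when no custom size or normalization factors are set): each sample's factor is the median ratio of its counts to the gene-wise geometric means, using only genes with no zero counts. median_ratio_normalize() is a hypothetical helper for illustration; in practice keep using counts(dds, normalized = TRUE).

```r
# Hypothetical helper: median-of-ratios size factors and normalized counts
median_ratio_normalize <- function(count_mat) {
  count_mat <- as.matrix(count_mat)
  # Log geometric mean of each gene across samples (-Inf if any count is zero)
  log_geo_means <- rowMeans(log(count_mat))
  keep <- is.finite(log_geo_means)
  if (!any(keep)) stop("every gene has at least one zero count")
  # Size factor per sample: median ratio of its counts to the geometric means
  size_factors <- apply(count_mat[keep, , drop = FALSE], 2, function(col) {
    exp(median(log(col) - log_geo_means[keep]))
  })
  # Divide each sample (column) by its size factor
  normalized <- sweep(count_mat, 2, size_factors, "/")
  list(size_factors = size_factors, normalized = normalized)
}
```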

Then, arrange this normalized count matrix to meet the minimum requirements of a GCT file. I suggest converting norm_counts to a data.frame and adding an extra column called description (this column could contain your gene IDs):

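norm_counts <- as.data.frame(norm_counts)
# counts() keeps the gene IDs as row names, not as a column, so copy them into explicit columns
norm_counts$NAME <- rownames(norm_counts)
norm_counts$description <- rownames(norm_counts)

Move "NAME" to the first column and "description" to the second, then save the table with write.table:

norm_counts <- norm_counts[, c("NAME", "description", setdiff(colnames(norm_counts), c("NAME", "description")))]
write.table(norm_counts, "path_to_save_file.gct", sep = "\t", quote = FALSE, row.names = FALSE)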

Finally, use a plain text editor to add the remaining header lines (the "#1.2" version line and the line giving the numbers of genes and samples, following the directions in the link) so that GSEA accepts the file as a GCT file. For a detailed tutorial on GCT and CLS files, watch this video, but please follow the directions in the link.

Hope this helps!

Best regards!

Rodo