DESeq2 on GEO dataset
4.0 years ago

Hi, I am looking to perform differential gene expression analysis in R using GEO datasets. However, I couldn't find count matrices in the datasets; what is there is raw data that needs preprocessing. Can anyone help me use the DESeq2 library to preprocess a GEO dataset for differential gene expression analysis?

rna-seq GEO R deseq2 preprocessing

Which dataset?


GSE3821_series_matrix


Okay, that is a microarray dataset, so you cannot use DESeq2. You should use limma instead.

To retrieve the already-normalised data, you can simply use:

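library(Biobase)
library(GEOquery)

# Download the series matrix for GSE3821 as a list of ExpressionSet objects
gset <- getGEO("GSE3821", GSEMatrix = TRUE, getGPL = FALSE)
# If the series spans several platforms, pick the GPL90 entry; otherwise take the first
if (length(gset) > 1) idx <- grep("GPL90", attr(gset, "names")) else idx <- 1
gset <- gset[[idx]]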

To see the metadata:

pData(gset)
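
For the limma step itself, here is a minimal sketch of a two-group comparison. It runs on a simulated log2 expression matrix standing in for exprs(gset), and the control/treated grouping is purely hypothetical; in practice the groups would have to be derived from a column of pData(gset). Treat it as an illustration of the usual lmFit / eBayes workflow, not as the analysis for this particular series.

```r
library(limma)

set.seed(1)
# Simulated log2 intensities: 100 probes x 6 arrays (stand-in for exprs(gset))
expr <- matrix(rnorm(600, mean = 8, sd = 1), nrow = 100,
               dimnames = list(paste0("probe", 1:100), paste0("GSM", 1:6)))

# Hypothetical grouping; with real data, build this from pData(gset)
group <- factor(rep(c("control", "treated"), each = 3))

# One column per group, so contrasts can be written by group name
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Fit the per-probe linear model, then test treated versus control
fit <- lmFit(expr, design)
contrast <- makeContrasts(treated - control, levels = design)
fit2 <- eBayes(contrasts.fit(fit, contrast))

# Top 10 probes ranked by evidence for differential expression
top <- topTable(fit2, number = 10)
```

The data here are pure noise, so no probe should be expected to pass a sensible adjusted p-value cutoff.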

Hi Kevin,

Just to jump on this thread rather than make a new one: I am having a bit of an issue actually using the eset in DESeq2. I downloaded an expression set and transformed it into a SummarizedExperiment object, but converting that into a DESeqDataSet object gives me errors about the counts. Here is the input code:

library(Biobase)
library(GEOquery)
library(DESeq2)

gsm <- getGEO('GSE116904') # RNA-seq data

gse <- gsm[[1]]

summ_exp <- makeSummarizedExperimentFromExpressionSet(gse)
ddse <- DESeqDataSet(summ_exp, countData = as.matrix(countData),design = ~ title)

And here is the error:

> ddse <- DESeqDataSet(summ_exp, design = ~ title)
renaming the first element in assays to 'counts'
Error in DESeqDataSet(summ_exp, design = ~title) : 
  some values in assay are not integers

When I specify countData:

ddse <- DESeqDataSet(summ_exp, countData = as.matrix(assay(summ_exp)),design = ~ title)

I get the following error:

Error in DESeqDataSet(summ_exp, countData = as.matrix(assay(summ_exp)),  : 
  unused argument (countData = as.matrix(assay(summ_exp)))

I'm not really sure how to get around this.


The raw counts are at the bottom of https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE116904.

Use these and feed them into DESeq2 with DESeqDataSetFromMatrix; follow the manual for this. You are still trying to use a microarray-oriented function (getGEO) for RNA-seq data, and the RNA-seq counts are not available through this function. Check the content of gse: it has zero rows, so no genes and therefore no counts.
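
As a rough sketch of what that looks like, the example below builds a DESeqDataSet from a raw count matrix plus a sample table. The counts are simulated so the code runs as-is; the file name in the comment, the sample names, and the control/treated condition column are all hypothetical and would need to be replaced with the real count table from the GEO record and the real sample annotation.

```r
library(DESeq2)

set.seed(1)
# Simulated raw counts: 200 genes x 6 samples. With real data, read the
# downloaded count table instead, for example:
# counts <- as.matrix(read.delim("counts.txt", row.names = 1))  # hypothetical file name
gene_mu <- runif(200, min = 20, max = 500)
counts <- matrix(rnbinom(1200, mu = rep(gene_mu, times = 6), size = 2),
                 nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("sample", 1:6)))

# Sample table: row names must match the column names of the count matrix
coldata <- data.frame(condition = factor(rep(c("control", "treated"), each = 3)),
                      row.names = colnames(counts))

# Build the dataset from the matrix, then run the standard DESeq2 pipeline
dds <- DESeqDataSetFromMatrix(countData = counts,
                              colData = coldata,
                              design = ~ condition)
dds <- DESeq(dds)
res <- results(dds)
```

Note that the design formula refers to a column of the sample table (condition here); a column such as title, which is typically unique per sample, would not define groups to compare.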


Yes, you need raw counts. The data obtained via getGEO may be empty or represent the normalised / transformed data.
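
A quick sanity check along these lines can be done in base R before handing anything to DESeq2. The helper below, looks_like_raw_counts, is a hypothetical name introduced only for this sketch; it simply tests that a matrix is non-empty and holds non-negative whole numbers.

```r
# Returns TRUE when the matrix has at least one row, has no missing values,
# and every value is a non-negative whole number (what DESeq2 expects as input).
looks_like_raw_counts <- function(m) {
  m <- as.matrix(m)
  nrow(m) > 0 && !anyNA(m) && all(m >= 0) && all(m == round(m))
}

looks_like_raw_counts(matrix(c(0, 12, 7, 301), nrow = 2))        # TRUE
looks_like_raw_counts(matrix(c(0.5, 12.3, 7.1, 3.2), nrow = 2))  # FALSE
looks_like_raw_counts(matrix(numeric(0), nrow = 0, ncol = 3))    # FALSE
```

Applied to the matrix extracted from an object returned by getGEO, a FALSE result would suggest the values are either missing or already normalised / transformed.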
