Question: DESeq2 on GEO dataset
9 weeks ago, hariprasadp.iitkgp wrote:

Hi, I am looking to perform differential gene expression analysis using R and GEO datasets. However, I couldn't find count matrices in the datasets; they contain raw data that needs preprocessing. Can anyone help me use the DESeq2 library to preprocess a GEO dataset for differential gene expression analysis?

Tags: rna-seq, deseq2, preprocessing, R, geo

You have to download the FASTQ files (see e.g. "Fast download of FASTQ files from the European Nucleotide Archive (ENA)") and then follow e.g. this Bioconductor RNA-seq workflow: https://www.bioconductor.org/packages/devel/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html

Reply written 9 weeks ago by ATpoint

Which dataset?

Reply written 9 weeks ago by Kevin Blighe

GSE3821_series_matrix

Reply written 9 weeks ago by hariprasadp.iitkgp
9 weeks ago, Kevin Blighe (University College London) wrote:

Okay, that is a microarray dataset, so you cannot use DESeq2. You should use limma.

To retrieve the already-normalised data, you can simply use:

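library(Biobase)
library(GEOquery)
# download the series matrix; getGEO returns a list of ExpressionSet objects
gset <- getGEO("GSE3821", GSEMatrix = TRUE, getGPL = FALSE)
# if the series spans several platforms, pick the GPL90 entry
if (length(gset) > 1) idx <- grep("GPL90", attr(gset, "names")) else idx <- 1
gset <- gset[[idx]]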

To see the metadata:

pData(gset)
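
A minimal sketch of what the limma step could look like once the data are retrieved. This is an illustration under assumptions, not part of the original answer: `run_limma` and `group` are hypothetical names, `group` stands for a two-level factor you would build yourself from pData(gset), and the expression values are assumed to be on a log2 scale already.

```r
library(Biobase)
library(limma)

# Hypothetical helper: 'eset' is an ExpressionSet such as gset,
# 'group' is a two-level factor with one entry per sample (an assumption).
run_limma <- function(eset, group) {
  design <- model.matrix(~ group)        # intercept + group effect
  fit <- lmFit(exprs(eset), design)      # gene-wise linear models
  fit <- eBayes(fit)                     # moderated t-statistics
  topTable(fit, coef = 2, number = Inf)  # all genes, ranked by evidence
}
```

The result is a data frame with one row per probe, including log fold change (logFC) and adjusted p-value (adj.P.Val) columns.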

Hi Kevin,

Just to jump on this thread rather than start a new one: I am having trouble using the ExpressionSet in DESeq2. I downloaded an expression set and converted it into a SummarizedExperiment object, but turning that into a DESeqDataSet gives me errors about the counts. Here is the input code:

library(Biobase)
library(GEOquery)
library(DESeq2)

gsm <- getGEO('GSE116904') # RNA-seq data

gse <- gsm[[1]]

summ_exp <- makeSummarizedExperimentFromExpressionSet(gse)
ddse <- DESeqDataSet(summ_exp, design = ~ title)

And here is the error:

> ddse <- DESeqDataSet(summ_exp, design = ~ title)
renaming the first element in assays to 'counts'
Error in DESeqDataSet(summ_exp, design = ~title) : 
  some values in assay are not integers

When I specify countData:

ddse <- DESeqDataSet(summ_exp, countData = as.matrix(assay(summ_exp)),design = ~ title)

I get the following error:

Error in DESeqDataSet(summ_exp, countData = as.matrix(assay(summ_exp)),  : 
  unused argument (countData = as.matrix(assay(summ_exp)))

I'm not really sure how to get around this.

Reply written 12 days ago (modified 12 days ago) by ogola8910

The raw counts are at the bottom of https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE116904.

Use these and feed them into DESeq2 with DESeqDataSetFromMatrix; follow the manual for this. You are still trying to use a microarray function (getGEO) for RNA-seq data, and the RNA-seq counts are not available through that function. Check the content of gse: it has zero rows, so no genes and therefore no counts.
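
A rough sketch of the DESeqDataSetFromMatrix route, offered as an illustration rather than the exact recipe for this dataset. The helper `build_dds`, the `condition` column and the toy data are all hypothetical; for real data you would read the downloaded supplementary file instead (for example with read.delim and row.names = 1) and build the sample table from the GEO metadata.

```r
library(DESeq2)

# Hypothetical helper: 'counts' is a genes x samples matrix of raw integer
# counts, 'coldata' has one row per sample and a 'condition' column.
build_dds <- function(counts, coldata) {
  # samples must be in the same order in both objects
  stopifnot(all(colnames(counts) == rownames(coldata)))
  DESeqDataSetFromMatrix(countData = counts,
                         colData = coldata,
                         design = ~ condition)
}

# Toy stand-in for a real count table (simulated, not GSE116904 data)
set.seed(1)
toy_counts <- matrix(rnbinom(600, mu = 100, size = 2), nrow = 100,
                     dimnames = list(paste0("gene", 1:100), paste0("s", 1:6)))
toy_coldata <- data.frame(condition = factor(rep(c("ctrl", "trt"), each = 3)),
                          row.names = colnames(toy_counts))
dds <- build_dds(toy_counts, toy_coldata)
```

From there the usual next steps would be DESeq(dds) followed by results(), as described in the DESeq2 vignette.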

Reply written 12 days ago by ATpoint

Yes, you need raw counts. The data obtained via getGEO may be empty or represent the normalised / transformed data.
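
One quick way to tell which case you are in is to test whether the matrix is non-empty and holds only non-negative whole numbers. The sketch below uses base R only; `looks_like_raw_counts` is a hypothetical helper name, and passing the check is a necessary sign of raw counts, not proof.

```r
# Hypothetical helper: TRUE only if 'mat' is a non-empty numeric matrix
# of non-negative whole numbers with no missing values.
looks_like_raw_counts <- function(mat) {
  mat <- as.matrix(mat)
  is.numeric(mat) && nrow(mat) > 0 && !anyNA(mat) &&
    all(mat >= 0) && all(mat == round(mat))
}
```

For an ExpressionSet such as gse, this could be called as looks_like_raw_counts(exprs(gse)), with exprs coming from Biobase.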

Reply written 12 days ago by Kevin Blighe