Question: Pipeline for analyzing Microarray & RNA-seq GSE files from NCBI GEO
16 months ago, junsionglow (20) wrote:

I have very limited experience in R, but would like to know if anyone can share their pipeline for analyzing GSE files from GEO, for microarray and/or RNA-seq data. The eventual goal is to identify the differentially expressed genes.

For example, I would like to analyze GSE113590, which is an RNA-seq dataset, and GSE47045, which is a microarray dataset.

The general consensus seems to be that you download the data using this:

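# biocLite() has been retired; current Bioconductor releases install via BiocManager
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("GEOquery")
library("GEOquery")
gset <- getGEO("GSE113590", GSEMatrix = TRUE)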

But I'm not sure how to move forward from here, and there seems to be a different pipeline depending on whether the data are microarray or RNA-seq.

Thanks for your help.


Hello junsionglow!

It appears that your post has been cross-posted to another site: https://stackoverflow.com/questions/51689293/

This is typically not recommended as it runs the risk of annoying people in both communities.

written 16 months ago by h.mon (28k)

Understood, my apologies. I have taken down the post on Stack Overflow.

written 16 months ago by junsionglow (20)
16 months ago, h.mon (28k, Brazil) wrote:

The distributions of RNA-seq counts and array intensities are very different, hence the need for different packages. limma is the go-to package for microarray analysis. For RNA-seq counts, the main options are edgeR, DESeq2, and limma (after applying the voom transformation to the counts).

written 16 months ago by h.mon (28k)
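A minimal sketch of the two routes mentioned above, assuming a simple two-group comparison. The matrices are simulated stand-ins (nothing here is taken from GSE47045 or GSE113590), and the helper names de_microarray and de_rnaseq are made up for illustration. It needs the limma and edgeR packages from Bioconductor; edgeR is used here only for filtering and normalisation before voom.

```r
library(limma)
library(edgeR)

set.seed(1)

# ASSUMPTION: six samples, three per group; real designs come from the
# sample annotation of the series.
group  <- factor(rep(c("control", "treated"), each = 3))
design <- model.matrix(~ group)  # column 2 is the treated-vs-control effect

# Microarray route: log2 intensities go straight into limma.
de_microarray <- function(log_expr, design) {
  fit <- eBayes(lmFit(log_expr, design))
  topTable(fit, coef = 2, number = Inf)
}

# RNA-seq route: raw counts are filtered, normalised, passed through
# voom, and only then fitted with limma.
de_rnaseq <- function(counts, design) {
  dge  <- DGEList(counts = counts)
  keep <- filterByExpr(dge, design)          # drop very low-count genes
  dge  <- dge[keep, , keep.lib.sizes = FALSE]
  dge  <- calcNormFactors(dge)               # TMM normalisation factors
  v    <- voom(dge, design)                  # log-CPM plus precision weights
  fit  <- eBayes(lmFit(v, design))
  topTable(fit, coef = 2, number = Inf)
}

# Simulated inputs: 200 features by 6 samples.
samples  <- paste0("s", 1:6)
log_expr <- matrix(rnorm(200 * 6, mean = 8), nrow = 200,
                   dimnames = list(paste0("probe", 1:200), samples))
counts   <- matrix(rnbinom(200 * 6, mu = 100, size = 5), nrow = 200,
                   dimnames = list(paste0("gene", 1:200), samples))

tt_array  <- de_microarray(log_expr, design)
tt_rnaseq <- de_rnaseq(counts, design)
head(tt_array)
head(tt_rnaseq)
```

With real data, log_expr would be something like exprs(gset) for the microarray series, and counts would be a gene-by-sample matrix of raw integer counts. Because the simulated data contain no true group effect, the resulting tables should not be read as meaningful.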

I see. But I can't seem to extract the counts from the gset object. I know that for microarray data, one could use

gset <- getGEO("GSE47045", GSEMatrix =TRUE)
if (length(gset) > 1) idx <- grep("GPL6246", attr(gset, "names")) else idx <- 1
gset <- gset[[idx]]

str(exprs(gset))

I get

 num [1:34760, 1:24] 12.85 11.2 7.58 13.42 6.72 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:34760] "10338001" "10338003" "10338004" "10338017" ...
  ..$ : chr [1:24] "GSM1143711" "GSM1143712" "GSM1143713" "GSM1143714" ..

But when it comes to the RNA-seq (Illumina) data, the same "if" line yields NULL counts.

modified 16 months ago • written 16 months ago by junsionglow (20)

Can you provide an example of an RNA-seq accession which causes trouble?

written 16 months ago by h.mon (28k)

This is the one I'm trying to analyze: GSE113590

written 16 months ago by junsionglow (20)
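For RNA-seq series, the series matrix that getGEO() returns often carries only the sample annotation, with an empty expression matrix, because submitters usually deposit counts as supplementary files instead. A hedged sketch of fetching those files with GEOquery's getGEOSuppFiles() follows. The helper names fetch_supp_files and read_count_table are made up for illustration, and the file layout (tab-delimited text, sample names in the header, gene identifiers in the first column) is an assumption that has to be checked against whatever GSE113590 actually provides.

```r
# Hypothetical helper: download all supplementary files for a series
# (needs network access and the GEOquery package) and return local paths.
fetch_supp_files <- function(gse, dest = tempdir()) {
  info <- GEOquery::getGEOSuppFiles(gse, baseDir = dest)
  rownames(info)  # row names of the returned data frame are the file paths
}

# Hypothetical helper: read a count table into a numeric matrix.
# ASSUMES tab-delimited text (optionally gzipped), a header row of sample
# names, and gene identifiers in the first column.
read_count_table <- function(path) {
  tab <- read.delim(path, row.names = 1, check.names = FALSE)
  as.matrix(tab)
}

# Usage (not run here because it downloads data):
# paths  <- fetch_supp_files("GSE113590")
# print(basename(paths))                 # inspect what was uploaded
# counts <- read_count_table(paths[1])   # pick the file that holds counts
# dim(counts)
```

If the series ships a single archive of per-sample files rather than one combined table, the archive would need unpacking and the per-sample files merging before this reader applies. The resulting count matrix is what edgeR, DESeq2, or limma with voom expect as input.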
16 months ago, ewre (220, United States) wrote:

Not sure if this one is helpful for you, because it deals with ArrayExpress, which is hosted by EBI rather than NCBI. As far as I know, most of the datasets in GEO can also be found in ArrayExpress.

written 16 months ago by ewre (220)