Question: Parsing an ExpressionSet for all SRA addresses
13 months ago by
University of Wisconsin-Madison
kmyers2 wrote:

I am using GEOquery to download the soft files for a number of experiments from NCBI GEO. For example, here is one individual experiment:

> soft <- getGEO('GSE104278', GSEMatrix=T)
> soft
ExpressionSet (storageMode: lockedEnvironment)
assayData: 0 features, 12 samples 
  element names: exprs 
protocolData: none
  sampleNames: GSM2793963 GSM2793964 ... GSM2793974 (12 total)
  varLabels: title geo_accession ... strain:ch1 (48 total)
  varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
  pubMedIds: 30456366 
Annotation: GPL24048

I want to extract the NCBI addresses for all SRA entries present in this expression set, so I am using the following command:

> sra <- data.frame(soft$GSE104278_series_matrix.txt.gz$relation.1)
> sra
1  SRA:
2  SRA:
3  SRA:
4  SRA:
5  SRA:
6  SRA:
7  SRA:
8  SRA:
9  SRA:
10 SRA:
11 SRA:
12 SRA:

However, because I will be doing this on several dozen to hundreds of files, I need a way to automate the step that builds the sra data frame. I'd like to figure out how to automatically call the first "entry" (or column, or whatever the ExpressionSet calls it), or pull out that entry's name so I can use it in the command. Each experiment has its own entry name, so typing them in one at a time is not practical.

I've tried something like this:

> sra <- data.frame(soft[1]$relation.1)

But that yields a data frame with 0 columns and 0 rows.

I've also tried these:

> data <- soft[1,]$relation.1

> data <- soft[,1]$relation.1

But both yield an error of "incorrect number of dimensions".

I'm sure there's an easy solution, but I'm just not seeing it. Any help would be greatly appreciated. Thanks!

13 months ago by
Mark wrote:

I'm not sure exactly what you're trying to do, but I think you want to refer to the element inside the getGEO result automatically. In this case, soft is a list with named elements, one per series matrix file.

tmp <- names(soft)
tmp
[1] "GSE104278_series_matrix.txt.gz"

If the list contains multiple items, you will get a character vector that you can subset with square brackets, e.g. tmp[1]. You can then use that character string to pull the matching element out of the list, like this:

data <- soft[[tmp]]$relation.1

This is equivalent to this:

data <- soft[["GSE104278_series_matrix.txt.gz"]]$relation.1

The double square bracket notation extracts a single element from a list or vector, whereas single square brackets return a sub-list that still wraps the element.
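
To illustrate the difference, here is a minimal sketch that runs without GEOquery. It assumes a plain named list holding an ordinary data frame as a stand-in for the getGEO result; soft_demo and the placeholder values are made up for illustration and are not real GEO data. It also shows why data.frame(soft[1]$relation.1) came back with 0 columns and 0 rows.

```r
# Stand-in for a getGEO() result: a named list with one element.
# A plain data frame replaces the ExpressionSet; values are placeholders.
soft_demo <- list(
  "GSE104278_series_matrix.txt.gz" = data.frame(
    geo_accession = c("GSM2793963", "GSM2793964"),
    relation.1    = c("SRA: placeholder-1", "SRA: placeholder-2"),
    stringsAsFactors = FALSE
  )
)

# Single brackets keep the list wrapper, so the list itself has no
# element called relation.1 and `$` returns NULL.
by_single <- soft_demo[1]
class(by_single)
is.null(by_single$relation.1)

# data.frame(NULL) is an empty data frame, which matches the
# "0 columns and 0 rows" result in the question.
dim(data.frame(by_single$relation.1))

# Double brackets (by position or by name) return the element itself.
by_double <- soft_demo[[1]]
by_double$relation.1
identical(soft_demo[[1]], soft_demo[[names(soft_demo)[1]]])
```

With a real getGEO result the same indexing applies; only the element is an ExpressionSet rather than a data frame.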

Putting it all together, given a getGEO object (whatever it's called):

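softGet <- function(soft) {
  tmp <- names(soft)
  # One data frame of relation.1 values per series matrix in the list
  lapply(tmp, function(x) data.frame(soft[[x]]$relation.1))
}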

This should return a list of data frames, one per series matrix, containing (I think) the biosamples of each GSE object.
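
For the dozens-to-hundreds case in the question, the same idea can be applied to one getGEO result per accession. The following is only a rough sketch: collect_relation, all_results, and the GSE_A / GSE_B names are hypothetical, plain data frames again stand in for ExpressionSets so it runs offline, and it assumes the SRA link sits in relation.1 as in the question. With real data, each element of all_results would come from a getGEO call like the one in the question.

```r
# Hypothetical helper: pull one metadata column out of every series matrix
# in a single getGEO()-style list, without typing the file name.
collect_relation <- function(soft, column = "relation.1") {
  pieces <- lapply(names(soft), function(x) {
    # For a real ExpressionSet, Biobase::pData(soft[[x]])[[column]]
    # is the explicit way to reach the sample metadata.
    values <- soft[[x]][[column]]
    if (is.null(values)) return(NULL)  # column absent in this series matrix
    data.frame(series = x, value = as.character(values),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, pieces)
}

# Placeholder stand-ins for two downloaded experiments (not real GEO data).
# The second one mimics a series that is split across two platform files.
all_results <- list(
  GSE_A = list(
    "GSE_A_series_matrix.txt.gz" = data.frame(
      relation.1 = c("SRA: placeholder-a1", "SRA: placeholder-a2"),
      stringsAsFactors = FALSE
    )
  ),
  GSE_B = list(
    "GSE_B-GPL1_series_matrix.txt.gz" = data.frame(
      relation.1 = "SRA: placeholder-b1", stringsAsFactors = FALSE
    ),
    "GSE_B-GPL2_series_matrix.txt.gz" = data.frame(
      relation.1 = "SRA: placeholder-b2", stringsAsFactors = FALSE
    )
  )
)

# One data frame per experiment, then a single combined table
sra_by_experiment <- lapply(all_results, collect_relation)
sra_all <- do.call(rbind, sra_by_experiment)
sra_all
```

Keeping the series file name next to each value makes it easier to trace an entry back to its experiment once everything is combined.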


Wow! Thanks so much. That's exactly what I wanted. It makes sense and I understand better how the ExpressionSet is organized.

written 13 months ago by kmyers2

Glad I could help. Good luck.

written 13 months ago by Mark