Question: Bioconductor, how to select a subset of samples in an ExpressionSet?
Davide Chicco (Canada) wrote 19 months ago:

I'm working on an R script that downloads gene expression data from GEO, through Bioconductor and the getGEO() function.

The commands below download all 436 samples in the series, but I'm only interested in 157 of them: those whose "samples collection:ch1" column has the value "on the 1st day of MI (admission)" or "N/A". How can I select this subset?

I tried downloading the complete dataset into the gset variable and then applying the following command:

gset_reduced@phenoData@data <- gset@phenoData@data[gset@phenoData@data$"samples collection:ch1"=="on the 1st day of MI (admission)" | gset@phenoData@data$"samples collection:ch1"=="N/A", ]

but this way only the phenoData component of the gset_reduced variable correctly contains the 157 samples. The other components of gset_reduced still contain all 436 samples.
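
The mismatch can be reproduced on a small toy object. This is only a minimal sketch, assuming the Biobase package is installed; the names toy, toy_bad and collection are made up for illustration and are not part of the GEO dataset:

```r
library(Biobase)

# Toy data: 5 features x 4 samples (hypothetical values, not GSE59867)
expr_mat <- matrix(as.numeric(1:20), nrow = 5,
                   dimnames = list(paste0("gene", 1:5), paste0("S", 1:4)))
pheno <- data.frame(collection = c("admission", "N/A", "discharge", "admission"),
                    row.names = colnames(expr_mat))
toy <- ExpressionSet(assayData = expr_mat,
                     phenoData = AnnotatedDataFrame(pheno))

# Overwrite only the phenotype table through the slot, as in the command above
toy_bad <- toy
keep <- pheno$collection %in% c("admission", "N/A")
toy_bad@phenoData@data <- toy@phenoData@data[keep, , drop = FALSE]

n_pheno <- nrow(pData(toy_bad))   # samples left in the phenotype table
n_expr  <- ncol(exprs(toy_bad))   # samples still in the expression matrix

# Direct slot assignment skips validity checking, so the object is now inconsistent
is_valid <- tryCatch({ validObject(toy_bad); TRUE }, error = function(e) FALSE)
```

Here n_pheno and n_expr disagree, which is the same symptom as 157 versus 436 samples in gset_reduced.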

How can I select my subset of samples right from the beginning?

Here's my working R code:

setwd(".")
options(stringsAsFactors = FALSE)

# biocLite() has been replaced by the BiocManager package in current Bioconductor releases
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

listOfBiocPackages <- c("oligo", "GEOquery", "affyio", "biomaRt", "sva", "pamr", "limma", "BiocParallel", "genefilter", "GO.db")

bioCpackagesNotInstalled <- which( !listOfBiocPackages %in% rownames(installed.packages()) )
cat("package missing listOfBiocPackages[", bioCpackagesNotInstalled, "]: ", listOfBiocPackages[bioCpackagesNotInstalled], "\n", sep="")

# install only the packages that are still missing
if( length(bioCpackagesNotInstalled) ) {
    BiocManager::install(listOfBiocPackages[bioCpackagesNotInstalled])
}


library("oligo")
library("GEOquery")
library("affyio")
library("biomaRt")
library("sva")
library("pamr")
library("limma")
library("BiocParallel")

GSE_code <- "GSE59867"
getGEOSuppFiles(GSE_code) 
gset <- getGEO(GSE_code, GSEMatrix =TRUE, getGPL=FALSE)

if (length(gset) > 1) idx <- grep("GPL570", attr(gset, "names")) else idx <- 1
gset <- gset[[idx]]

gset_reduced <- gset

gset_reduced@phenoData@data <- gset@phenoData@data[gset@phenoData@data$"samples collection:ch1"=="on the 1st day of MI (admission)" | gset@phenoData@data$"samples collection:ch1"=="N/A", ]

Can anyone help me?

Thanks!

Kevin Blighe wrote 19 months ago:

ExpressionSet objects are designed so that you subset the main object directly; the subsetting then carries through to all of its components (assayData, phenoData, and so on):

filter <- colnames(gset)[gset@phenoData@data$"samples collection:ch1"=="on the 1st day of MI (admission)" | gset@phenoData@data$"samples collection:ch1"=="N/A"]

length(filter)
[1] 157

gset.filt <- gset[,filter]

gset.filt

ExpressionSet (storageMode: lockedEnvironment)
assayData: 33297 features, 157 samples 
  element names: exprs 
protocolData: none
phenoData
  sampleNames: GSM1448335 GSM1448338 ... GSM1620804 (157 total)
  varLabels: title geo_accession ... samples collection:ch1 (34 total)
  varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
Annotation: GPL6244

dim(exprs(gset.filt))
[1] 33297   157

dim(pData(gset.filt))
[1] 157  34

nrow(gset.filt@phenoData@data)
[1] 157
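
The same subsetting can also be written with the Biobase accessor functions pData() and exprs() and with %in%, rather than with direct @ slot access. What follows is a minimal sketch on a toy ExpressionSet, assuming Biobase is installed; toy, toy_filt and collection are hypothetical stand-ins for gset, gset.filt and "samples collection:ch1":

```r
library(Biobase)

# Toy data: 5 features x 4 samples (hypothetical values, not GSE59867)
expr_mat <- matrix(as.numeric(1:20), nrow = 5,
                   dimnames = list(paste0("gene", 1:5), paste0("S", 1:4)))
pheno <- data.frame(collection = c("admission", "N/A", "discharge", "admission"),
                    row.names = colnames(expr_mat))
toy <- ExpressionSet(assayData = expr_mat,
                     phenoData = AnnotatedDataFrame(pheno))

# Logical index over samples; %in% returns FALSE rather than NA for missing values
keep <- pData(toy)$collection %in% c("admission", "N/A")

# Subsetting the columns of the ExpressionSet subsets every component together
toy_filt <- toy[, keep]

dim(exprs(toy_filt))    # expression matrix restricted to the kept samples
dim(pData(toy_filt))    # phenotype table restricted to the same samples
sampleNames(toy_filt)
```

Applied to the real object, the index would presumably be built as pData(gset)[["samples collection:ch1"]] %in% c("on the 1st day of MI (admission)", "N/A"); that line has not been run against GSE59867 here.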

Kevin
