Question: How to subset an ExpressionSet based on a vector of sample names
nkabo wrote:

I have an ExpressionSet object with 37 samples covering two conditions, HCC and cirrhosis, and I would like to subset it by the sample names I have specified in two vectors. The result should be two expression sets, one for HCC and one for cirrhosis. I have tried several methods to subset the ExpressionSet, but I could not keep the samples and the features at the same time.

expset_forall is the ExpressionSet, and names_HCC and names_cirr are character vectors containing the sample names.

This is expset_forall:

ExpressionSet (storageMode: lockedEnvironment)

assayData: 20962 features, 37 samples 

protocolData
  sampleNames: GSM437457.CEL.gz GSM437458.CEL.gz ... GSM437493.CEL.gz (37 total)...

I have tried:

eset_forHCC = expset_forall[, sampleNames(expset_forall) %in% names_HCC]

This gives the error "incorrect number of dimensions".

Then I tried:

eset_forHCC = exprs(expset_forall[expset_forall@protocolData$sampleNames == names_HCC, ])
dim(eset_forHCC)
[1]  0 37

Finally, I tried to reach the sample names via pData:

levels(pData(expset_forall)$sampleNames)

This returns NULL.

For eset_forHCC, I expect this output:

ExpressionSet (storageMode: lockedEnvironment)

assayData: 20962 features, 17 samples

element names: exprs, se.exprs

protocolData

sampleNames: GSM437458.CEL.gz GSM437459.CEL.gz ... GSM437493.CEL.gz (17 total)

For eset_forcirr, I expect this output:

ExpressionSet (storageMode: lockedEnvironment)

assayData: 20962 features, 17 samples

element names: exprs, se.exprs

protocolData

sampleNames: GSM437460.CEL.gz GSM437459.CEL.gz ... GSM437491.CEL.gz (17 total)
modified 6 months ago • written 6 months ago by nkabo

Once you have the ExpressionSet data in a data frame, you can subset it with the base function subset() or with filter() from the dplyr package.

written 6 months ago by sangram_keshari
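A minimal sketch of the data frame route suggested above, run on toy data rather than the real object. The names expset_toy, names_toy and expr_df are hypothetical stand-ins, and the example assumes the Biobase package is installed. Note that the result is a data frame, not an ExpressionSet.

```r
library(Biobase)

# Toy data (hypothetical): 4 features x 5 samples, standing in for expset_forall.
m <- matrix(as.numeric(seq_len(20)), nrow = 4,
            dimnames = list(paste0("probe", 1:4),
                            paste0("S", 1:5, ".CEL.gz")))
expset_toy <- ExpressionSet(assayData = m)

# Hypothetical stand-in for names_HCC.
names_toy <- c("S2.CEL.gz", "S4.CEL.gz")

# Transpose the expression matrix so that each sample becomes a row,
# then convert it to a data frame.
expr_df <- as.data.frame(t(exprs(expset_toy)))

# Keep only the rows whose sample name appears in the vector.
expr_sub <- subset(expr_df, rownames(expr_df) %in% names_toy)

rownames(expr_sub)
dim(expr_sub)
```

Because the phenotype and feature annotations are left behind in this conversion, this route fits best when only the expression values are needed.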

Does:

eset_forHCC= exprs(expset_forall)[,sampleNames(expset_forall) %in% names_HCC]

not work?

modified 6 months ago • written 6 months ago by benformatics

Thank you for your reply. It works, but it gives a matrix, and I need an ExpressionSet.

written 6 months ago by nkabo
eset_forHCC = expset_forall[, sampleNames(expset_forall) %in% names_HCC]

This is the correct and recommended way to get the subset. Could you recheck it?

Also, check that the output of sampleNames(expset_forall) %in% names_HCC is as intended.

written 6 months ago by Santosh Anand

Thank you for your reply. I checked it again and it works fine, but it gives an expression matrix, and I would like to have it as an ExpressionSet.

written 6 months ago by nkabo
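The checks discussed in this thread can be sketched on a toy object. This is only an illustration, assuming Biobase is installed; expset_toy and names_toy are hypothetical stand-ins for expset_forall and names_HCC. It shows that indexing the ExpressionSet itself with [ , ] keeps the ExpressionSet class, whereas indexing exprs() of it yields a plain matrix.

```r
library(Biobase)

# Toy ExpressionSet (hypothetical): 4 features x 5 samples.
m <- matrix(as.numeric(seq_len(20)), nrow = 4,
            dimnames = list(paste0("probe", 1:4),
                            paste0("S", 1:5, ".CEL.gz")))
expset_toy <- ExpressionSet(assayData = m)

# Hypothetical name vector; "S9.CEL.gz" is deliberately absent from the object.
names_toy <- c("S2.CEL.gz", "S4.CEL.gz", "S9.CEL.gz")

# 1. Confirm that the object being subset really is an ExpressionSet.
is(expset_toy, "ExpressionSet")

# 2. Inspect the logical index before using it.
keep <- sampleNames(expset_toy) %in% names_toy
table(keep)

# 3. List requested names that match no sample (typos, missing suffixes).
setdiff(names_toy, sampleNames(expset_toy))

# 4. Subset the columns (samples) of the ExpressionSet itself.
eset_sub <- expset_toy[, keep]
class(eset_sub)
sampleNames(eset_sub)

# For contrast: subsetting the extracted matrix gives a matrix.
mat_sub <- exprs(expset_toy)[, keep]
class(mat_sub)
```

If step 1 returns FALSE on the real object, the variable may no longer hold an ExpressionSet at that point in the session, which would be worth ruling out before trying other subsetting approaches.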