How to subset ExpressionSet based on vector of sample names
3.2 years ago
nkabo ▴ 60

I have an ExpressionSet object with 37 samples covering HCC and cirrhosis, and I would like to subset it by the sample names I have specified in two vectors. The result should be two expression sets, one for HCC and one for cirrhosis. I have tried several methods, but I could not keep the samples and the features together.

expset_forall is the ExpressionSet, and names_HCC and names_cirr are the character vectors containing the sample names.

This is expset_forall:

ExpressionSet (storageMode: lockedEnvironment)

assayData: 20962 features, 37 samples

protocolData
sampleNames: GSM437457.CEL.gz GSM437458.CEL.gz ... GSM437493.CEL.gz (37 total)...


I have tried:

eset_forHCC = expset_forall[, sampleNames(expset_forall) %in% names_HCC]


It gives the error "incorrect number of dimensions".

Then I tried:

eset_forHCC = exprs(expset_forall[expset_forall@protocolData$sampleNames == names_HCC, ])
dim(eset_forHCC)
[1] 0 37

Finally, I tried to subset it via pData:

levels(pData(expset_forall)$sampleNames)


It gives "NULL".

For eset_forHCC, I expect the output:

ExpressionSet (storageMode: lockedEnvironment)

assayData: 20962 features, 17 samples

element names: exprs, se.exprs

protocolData

sampleNames: GSM437458.CEL.gz GSM437459.CEL.gz ... GSM437493.CEL.gz (17 total)


For eset_forcirr, I expect the output:

ExpressionSet (storageMode: lockedEnvironment)

assayData: 20962 features, 17 samples

element names: exprs, se.exprs

protocolData

sampleNames: GSM437460.CEL.gz GSM437459.CEL.gz ... GSM437491.CEL.gz (17 total)

R ExpressionSet subset Bioconductor • 1.5k views
1

Once you have the ExpressionSet data in a data frame, you can subset it with the base function subset() or with filter() from the dplyr package.
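
A minimal sketch of that data-frame route, assuming Biobase is installed. The toy object `expset_toy`, its made-up values, and the two sample names are hypothetical stand-ins for expset_forall and names_HCC. Note that the result is a plain data frame, not an ExpressionSet.

```r
library(Biobase)  # Bioconductor package that defines ExpressionSet

# Toy stand-in for expset_forall: 5 features x 4 samples (made-up values).
set.seed(1)
mat <- matrix(rnorm(20), nrow = 5,
              dimnames = list(paste0("probe", 1:5),
                              paste0("GSM", 1:4, ".CEL.gz")))
expset_toy <- ExpressionSet(assayData = mat)

# Hypothetical vector of wanted sample names.
names_HCC <- c("GSM2.CEL.gz", "GSM4.CEL.gz")

# One row per sample, one column per feature, plus a sample-name column.
df <- data.frame(sample = sampleNames(expset_toy),
                 t(exprs(expset_toy)),
                 check.names = FALSE)

# Base R subset(); keeps only the rows whose sample name is in names_HCC.
df_HCC <- subset(df, sample %in% names_HCC)
class(df_HCC)
dim(df_HCC)
```

With dplyr loaded, `filter(df, sample %in% names_HCC)` should select the same rows.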

0

Does:

eset_forHCC= exprs(expset_forall)[,sampleNames(expset_forall) %in% names_HCC]


not work?

0

Thank you for your reply. It works, but it gives a matrix, and I need an ExpressionSet.
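
For comparison, a small sketch of why the two forms differ, assuming Biobase is installed. `expset_toy` and the sample names are hypothetical stand-ins for the real data. Calling exprs() first extracts the expression matrix, so the subset is a matrix. Indexing the object itself keeps the ExpressionSet class.

```r
library(Biobase)  # Bioconductor package that defines ExpressionSet

# Toy stand-in for expset_forall: 5 features x 4 samples (made-up values).
set.seed(1)
mat <- matrix(rnorm(20), nrow = 5,
              dimnames = list(paste0("probe", 1:5),
                              paste0("GSM", 1:4, ".CEL.gz")))
expset_toy <- ExpressionSet(assayData = mat)

# Hypothetical vector of wanted sample names.
names_HCC <- c("GSM2.CEL.gz", "GSM4.CEL.gz")
keep <- sampleNames(expset_toy) %in% names_HCC

# exprs() is applied first, so the column subset is taken on a plain matrix.
as_matrix <- exprs(expset_toy)[, keep]

# The subset is taken on the ExpressionSet itself, so the class is preserved.
as_eset <- expset_toy[, keep]

is.matrix(as_matrix)
is(as_eset, "ExpressionSet")
sampleNames(as_eset)
```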

0
eset_forHCC = expset_forall[, sampleNames(expset_forall) %in% names_HCC]


This is the correct and recommended way to take the subset. Could you recheck it?

Also, check that the output of sampleNames(expset_forall) %in% names_HCC is as intended.
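
One way to run that check, sketched in base R only. Here `sample_names` is a hypothetical character vector standing in for sampleNames(expset_forall), and the values of `names_HCC` are made up. The sketch also shows why `==` is not a substitute for `%in%`: `==` compares element by element and recycles the shorter vector.

```r
# Hypothetical stand-in for sampleNames(expset_forall).
sample_names <- paste0("GSM", 1:4, ".CEL.gz")

# Hypothetical vector of wanted sample names.
names_HCC <- c("GSM2.CEL.gz", "GSM4.CEL.gz")

# %in% asks, for each sample name, whether it appears anywhere in names_HCC.
keep <- sample_names %in% names_HCC
keep                 # FALSE TRUE FALSE TRUE
table(keep)          # how many samples would be kept or dropped

# Requested names that do not occur among the sample names (typos, suffixes).
missing_names <- setdiff(names_HCC, sample_names)
missing_names        # character(0) when every requested name was found

# == compares position by position, recycling names_HCC, so it misses GSM2.
wrong <- sample_names == names_HCC
wrong                # FALSE FALSE FALSE TRUE
```

If `sum(keep)` does not equal the number of names expected, the names in the vector probably differ from the sample names, for example by a missing ".CEL.gz" suffix.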

0

Thank you for your reply. I checked it again and it works fine, but it gives an expression matrix, and I would like to have an ExpressionSet.