Subsetting a RangedSummarizedExperiment in R
4.5 years ago
aluesley1 ▴ 60

Hi, sorry if this is a bit basic but I'm very new to R.

I am working in RStudio and have gene expression data downloaded using TCGAbiolinks. I have prepared it into a RangedSummarizedExperiment object, so it includes the clinical data.

I want to subset the data based on this clinical data.

This is the information I found from the SummarizedExperiment package manual -

Subsetting

In the code snippets below, x is a RangedSummarizedExperiment object.

subset(x, subset, select): Create a subset of x using an expression subset referring to columns of rowRanges(x) (including ‘seqnames’, ‘start’, ‘end’, ‘width’, ‘strand’, and names(rowData(x))) and / or select referring to column names of colData(x).

My object is called prep.BRCA.tumour, so I will have subset(prep.BRCA.tumour, ... but then I don't know how to write the next bit.

The data I want is in listData within the colData dataframe.

If anyone could shed some light as to how I would write this it would be much appreciated.

rangedsummarizedexperiment tcgabiolinks R • 5.8k views
4.5 years ago
svlachavas ▴ 770

Dear Aluesley,

First, you need to be more specific about your type of analysis and your goals, as your description is rather confusing.

1) As you mentioned that you are new to R, you should first familiarize yourself with "data containers" and basic workflows, which are nicely covered at the following link:

http://bioconductor.org/packages/release/BiocViews.html#___GeneExpressionWorkflow

2) You should also post the exact code you have used so far, so that other users can inspect your approach and give more useful advice.

3) If I have understood correctly, you want to subset your cancer dataset to only those samples that have a tumor subtype reported in the relevant TCGA publications? If so, the link below is to the SummarizedExperiment vignette, and the code after it uses TCGAquery_subtype from TCGAbiolinks to fetch molecular subtype information and then subsets on it:

http://bioconductor.org/packages/release/bioc/vignettes/SummarizedExperiment/inst/doc/SummarizedExperiment.html

dataSubt <- TCGAquery_subtype(tumor = "YourTCGAproject")

se.subset <- subset(se, select = colData(se)$patient %in% dataSubt$patient) # subset only to the patients with available subtype information


Hope that helps,

Efstathios

4.5 years ago

I second everything svlachavas wrote (i.e. following some basic tutorials about R), but in the meantime -- will this do the job?

my_subset <- colData(x)$listData
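
If the aim is to keep only the samples with a particular value of a clinical variable, a minimal sketch with a toy object might look like the following. The gender column and its values are made up for illustration. In your case you would use prep.BRCA.tumour in place of se, and one of the names returned by colnames(colData(prep.BRCA.tumour)) in place of gender.

```r
library(SummarizedExperiment)

# Toy object standing in for prep.BRCA.tumour (all values are invented)
counts <- matrix(1:12, nrow = 3,
                 dimnames = list(paste0("gene", 1:3), paste0("sample", 1:4)))
clinical <- DataFrame(gender = c("female", "male", "female", "female"),
                      row.names = colnames(counts))
se <- SummarizedExperiment(assays = list(counts = counts), colData = clinical)

# List the clinical variables that are available for subsetting
colnames(colData(se))

# Samples are columns, so the logical filter goes after the comma;
# se$gender is shorthand for colData(se)$gender
se.female <- se[, se$gender == "female"]

dim(se.female)
```

Bracket indexing is used here rather than subset() simply because it is the form shown in the SummarizedExperiment vignette. The assay data and colData are subset together, so they stay in sync.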