Subsetting a RangedSummarizedExperiment in R
3.3 years ago
aluesley1

Hi, sorry if this is a bit basic, but I'm very new to R.

I am working in RStudio with gene expression data downloaded using TCGAbiolinks. I have prepared it into a RangedSummarizedExperiment object so that it includes the clinical data.

I want to subset the data based on this clinical data.

This is the information I found in the SummarizedExperiment package manual:

Subsetting: In the code snippets below, x is a RangedSummarizedExperiment object.
subset(x, subset, select): Create a subset of x using an expression subset referring to columns of rowRanges(x) (including ‘seqnames’, ‘start’, ‘end’, ‘width’, ‘strand’, and names(rowData(x))) and/or select referring to column names of colData(x).

My object is called prep.BRCA.tumour, so I will start with subset(prep.BRCA.tumour, ... but I don't know how to write the rest of the call.

The clinical data I want to filter on is in listData within the colData DataFrame.

If anyone could shed some light on how to write this, it would be much appreciated.
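As a minimal sketch of the manual entry quoted above, the toy RangedSummarizedExperiment below shows that subset is evaluated against rowRanges(x) and select against the columns of colData(x), so column names are written directly without colData(x)$. All gene, sample and column names here (including tumour_stage) are hypothetical stand-ins, not taken from the TCGA data.

```r
library(SummarizedExperiment)  # also attaches GenomicRanges, IRanges and S4Vectors

# Hypothetical toy data: 4 genes x 3 samples
counts <- matrix(1:12, nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("sample", 1:3)))
genes <- GRanges(seqnames = c("chr1", "chr1", "chr2", "chr2"),
                 ranges = IRanges(start = c(100, 500, 100, 900), width = 50))
names(genes) <- rownames(counts)

# Hypothetical clinical column, standing in for the TCGA colData
clinical <- DataFrame(tumour_stage = c("stage i", "stage ii", "stage i"),
                      row.names = colnames(counts))

# Supplying rowRanges makes this a RangedSummarizedExperiment
x <- SummarizedExperiment(assays = list(counts = counts),
                          rowRanges = genes, colData = clinical)

# 'subset' is evaluated in rowRanges(x) (start, seqnames, ...);
# 'select' is evaluated in colData(x), so tumour_stage is used by name
x.sub <- subset(x, subset = start < 200, select = tumour_stage == "stage i")
dim(x.sub)
```

Applied to the object in the question, the same pattern would be subset(prep.BRCA.tumour, select = some_column == "some_value"), where some_column and "some_value" are placeholders for one of names(colData(prep.BRCA.tumour)) and a value in it; the subset argument can simply be left out to keep all genes.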

3.3 years ago
svlachavas

Dear Aluesley,

First, you need to be more specific about your type of analysis and your goals, as your description is rather confusing.

1) Since you mentioned that you are new to R, you should first familiarize yourself a bit with "data containers" and basic workflows, which are nicely covered in the following link:

http://bioconductor.org/packages/release/BiocViews.html#___GeneExpressionWorkflow

2) You should also post the exact code you have used so far, so that other users can inspect your approach and give more useful advice.

3) If I understand correctly, you want to subset your cancer dataset to only those samples that have a tumor subtype assigned in the corresponding TCGA publications? If so, here is the TCGAbiolinks vignette section on molecular subtype information and subsetting:

http://bioinformaticsfmrp.github.io/TCGAbiolinks/extension.html#tcga_molecularsubtype:_query_subtypes_for_cancer_data:

4) The following link will also help you get started:

http://bioconductor.org/packages/release/bioc/vignettes/SummarizedExperiment/inst/doc/SummarizedExperiment.html

Overall, here is a quick example with TCGAbiolinks, assuming you have downloaded an arbitrary RangedSummarizedExperiment called se:

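library(TCGAbiolinks)
library(SummarizedExperiment)

# replace "YourTCGAproject" with your tumour type, e.g. "brca"
dataSubt <- TCGAquery_subtype(tumor = "YourTCGAproject")

# subset only to the patients with available subtype information
se.subset <- subset(se, select = colData(se)$patient %in% dataSubt$patient)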
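Without the real TCGA data at hand, the same %in% filter can be tried offline. This is a minimal sketch on hypothetical toy objects: the patient IDs and subtype labels are made up, and the data.frame called dataSubt only stands in for the result of TCGAquery_subtype(). Because select is evaluated against colData, patient can be written without the colData(se)$ prefix.

```r
library(SummarizedExperiment)

# Hypothetical toy object: 3 genes x 4 samples with a 'patient' column in colData
counts <- matrix(1:12, nrow = 3,
                 dimnames = list(paste0("gene", 1:3), paste0("sample", 1:4)))
clinical <- DataFrame(patient = c("TCGA-A1", "TCGA-A2", "TCGA-A3", "TCGA-A4"),
                      row.names = colnames(counts))
se <- SummarizedExperiment(assays = list(counts = counts), colData = clinical)

# Hypothetical stand-in for the subtype table; "TCGA-Z9" has no matching sample
dataSubt <- data.frame(patient = c("TCGA-A2", "TCGA-A4", "TCGA-Z9"),
                       subtype = c("LumA", "Basal", "Her2"))

# Keep only samples whose patient ID appears in the subtype table
se.subset <- subset(se, select = patient %in% dataSubt$patient)
colnames(se.subset)
```

Patients present only in the subtype table are simply ignored, since %in% is tested from the sample side.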

Hope that helps,

Efstathios

3.3 years ago

I second everything svlachavas wrote (i.e. follow some basic R tutorials), but in the meantime, will this do the job?

# listData is an internal slot, not a column; convert colData(x) to see its columns
my_subset <- as.data.frame(colData(x))
head(my_subset)
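A minimal sketch of reaching the clinical columns on a hypothetical toy object (not the TCGA data). listData is the internal slot in which a DataFrame stores its columns, so the columns are normally reached through colData(x)$column or the shortcut x$column; this works the same way for a plain SummarizedExperiment, as here, and for a RangedSummarizedExperiment. A bracket subset wrapped in which() is an alternative to subset() that drops samples whose clinical value is NA.

```r
library(SummarizedExperiment)

# Hypothetical toy object: 3 genes x 4 samples, one clinical column with a missing value
counts <- matrix(1:12, nrow = 3,
                 dimnames = list(paste0("gene", 1:3), paste0("sample", 1:4)))
clinical <- DataFrame(tumour_stage = c("stage i", "stage ii", NA, "stage i"),
                      row.names = colnames(counts))
x <- SummarizedExperiment(assays = list(counts = counts), colData = clinical)

names(colData(x))                      # available clinical columns
table(x$tumour_stage, useNA = "ifany")  # x$col is shorthand for colData(x)$col

# Keep stage i samples; which() turns the logical test into indices, dropping the NA
stage1 <- x[, which(x$tumour_stage == "stage i")]
colnames(stage1)
```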