Question: Subsetting a RangedSummarizedExperiment in R
aluesley1 wrote (10 days ago):

Hi, sorry if this is a bit basic but I'm very new to R.

I am working in RStudio and have gene expression data that I downloaded with TCGAbiolinks. I have prepared it into a RangedSummarizedExperiment object so that it includes the clinical data.

I want to subset the data based on this clinical data.

This is the information I found from the SummarizedExperiment package manual -

Subsetting

In the code snippets below, x is a RangedSummarizedExperiment object.

subset(x, subset, select): Create a subset of x using an expression subset referring to columns of rowRanges(x) (including ‘seqnames’, ‘start’, ‘end’, ‘width’, ‘strand’, and names(rowData(x))) and / or select referring to column names of colData(x)

My object is called prep.BRCA.tumour, so I will have subset(prep.BRCA.tumour, ... but then I don't know how to write the next bit.

The data I want is in listData within the colData dataframe.

If anyone could shed some light as to how I would write this it would be much appreciated.

svlachavas (Greece) wrote (10 days ago):

Dear Aluesley,

First, please be more specific about the type of analysis you are doing and what your goals are, as your description is rather confusing.

1) As you mentioned that you are a new R user, you should first familiarize yourself with "data containers" and basic workflows, which are nicely covered at the following link:

http://bioconductor.org/packages/release/BiocViews.html#___GeneExpressionWorkflow

2) You should also post the exact code you have used so far, so that other users can inspect your approach and give more useful advice.

3) If I understood correctly, you want to subset your cancer dataset to only those samples that have a tumor subtype from the relevant TCGA publications? If so, this is the TCGAbiolinks vignette section on molecular subtype information and subsetting:

http://bioinformaticsfmrp.github.io/TCGAbiolinks/extension.html#tcga_molecularsubtype:_query_subtypes_for_cancer_data:

4) The following link will also help you get started:

http://bioconductor.org/packages/release/bioc/vignettes/SummarizedExperiment/inst/doc/SummarizedExperiment.html

Overall, here is a quick example for TCGAbiolinks, assuming you have downloaded an arbitrary RangedSummarizedExperiment called se:

dataSubt <- TCGAquery_subtype(tumor = "YourTCGAproject")

# subset only to the patients with available subtype information
se.subset <- subset(se, select = colData(se)$patient %in% dataSubt$patient)
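
To make the column-wise subsetting concrete without downloading anything, here is a minimal sketch on a toy RangedSummarizedExperiment. The object, the clinical columns (patient, gender) and all values are invented for illustration; a real TCGAbiolinks object will have different column names, which names(colData(se)) should list. Because samples are the columns of the object, the condition goes after the comma in bracket subsetting, or into the select argument of subset().

```r
library(SummarizedExperiment)

# Toy data: 4 genes x 6 samples (all values and names are made up)
set.seed(1)
counts <- matrix(rpois(24, lambda = 10), nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("sample", 1:6)))

# Made-up genomic ranges, one per gene, so that the object is "Ranged"
gr <- GRanges("chr1", IRanges(start = c(1, 101, 201, 301), width = 50))
names(gr) <- rownames(counts)

# Made-up clinical data, one row per sample
clinical <- DataFrame(
    patient = paste0("P", 1:6),
    gender  = c("female", "male", "female", "female", "male", "female"),
    row.names = colnames(counts)
)

se <- SummarizedExperiment(assays = list(counts = counts),
                           rowRanges = gr, colData = clinical)

# Option 1: bracket subsetting, condition on a colData column after the comma
se.female <- se[, colData(se)$gender == "female"]

# Option 2: subset(); 'select' is evaluated against colData(se),
# so the column name can be used directly
se.female2 <- subset(se, select = gender == "female")

dim(se.female)
```

Both options should keep the same samples; which one to use is mostly a matter of taste, although the bracket form tends to be easier to debug because the logical vector can be inspected on its own first.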

Hope that helps,

Efstathios

Friederike wrote (10 days ago):

I second everything svlachavas wrote (i.e. following some basic tutorials about R), but in the meantime -- will this do the job?

my_subset <-  colData(x)$listData
head(my_subset)
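
One caveat, offered as a hedged note: as far as I can tell, listData is the internal slot in which the colData DataFrame stores its columns (which is probably why it shows up in the RStudio viewer), not a clinical column itself, so the clinical variables are normally reached by their own names. A minimal sketch on a toy object, with invented column names and values, of how one might look at what is available:

```r
library(SummarizedExperiment)

# Toy object: 2 genes x 4 samples; clinical columns are invented
clinical <- DataFrame(
    patient = paste0("P", 1:4),
    gender  = c("female", "male", "female", "male"),
    row.names = paste0("sample", 1:4)
)
se <- SummarizedExperiment(assays = list(counts = matrix(1:8, nrow = 2)),
                           colData = clinical)

# Which clinical variables are available?
clin.names <- names(colData(se))
clin.names

# "listData" is not among the column names
"listData" %in% clin.names

# Look at one clinical variable by its own name
table(colData(se)$gender)

# Convert to an ordinary data.frame for easier browsing
clin.df <- as.data.frame(colData(se))
head(clin.df)
```

Once the column of interest is identified this way, it can be used as the condition for subsetting the samples.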