Question: Subsetting a RangedSummarizedExperiment in R
aluesley1 wrote, 6 months ago:

Hi, sorry if this is a bit basic, but I'm very new to R.

I am working in RStudio with gene expression data downloaded using TCGAbiolinks. I have prepared it into a RangedSummarizedExperiment object, so it also includes the clinical data.

I want to subset the data based on this clinical data.

This is the information I found from the SummarizedExperiment package manual -

Subsetting
In the code snippets below, x is a RangedSummarizedExperiment object.
subset(x, subset, select): Create a subset of x using an expression subset referring to columns of rowRanges(x) (including ‘seqnames’, ‘start’, ‘end’, ‘width’, ‘strand’, and names(rowData(x))) and/or select referring to column names of colData(x).
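To make the manual entry concrete, here is a minimal sketch on a small made-up RangedSummarizedExperiment. The gene names, coordinates, and the vital_status column are all invented for illustration, and it assumes the Bioconductor package SummarizedExperiment is installed. The subset expression is evaluated against rowRanges(x) to pick genes (rows), and the select expression against colData(x) to pick samples (columns).

```r
library(SummarizedExperiment)  # also attaches GenomicRanges, IRanges and S4Vectors

# Toy object: 4 genes x 3 samples; all values and names are made up
counts <- matrix(1:12, nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("sample", 1:3)))
gr <- GRanges(seqnames = c("chr1", "chr1", "chr2", "chr3"),
              ranges = IRanges(start = c(100, 500, 200, 300), width = 50))
names(gr) <- rownames(counts)
# Hypothetical clinical column, one row per sample
clin <- DataFrame(vital_status = c("Alive", "Dead", "Alive"),
                  row.names = colnames(counts))
x <- SummarizedExperiment(assays = list(counts = counts),
                          rowRanges = gr, colData = clin)

# 'subset' refers to rowRanges(x) columns (here 'start') -> keeps genes
# 'select' refers to colData(x) columns (here 'vital_status') -> keeps samples
x.sub <- subset(x, subset = start >= 200, select = vital_status == "Alive")
dim(x.sub)
```

Either argument can be left out; omitting subset keeps all genes, which is the usual case when filtering on clinical data only.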

My object is called prep.BRCA.tumour, so I would start with subset(prep.BRCA.tumour, ... but I don't know how to write the rest.

The data I want is in listData within the colData DataFrame.
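For the object in the question, the call would take the form subset(prep.BRCA.tumour, select = <clinical column> == <value>). listData is only the internal slot where a DataFrame stores its columns, so the clinical variables are reached by name instead. The sketch below assumes a toy object with a hypothetical tumour_stage column; the real column names of your object can be listed with names(colData(prep.BRCA.tumour)).

```r
library(SummarizedExperiment)

# Toy stand-in for prep.BRCA.tumour: 2 genes x 4 samples (made-up values)
counts <- matrix(0L, nrow = 2, ncol = 4,
                 dimnames = list(c("geneA", "geneB"), paste0("s", 1:4)))
# 'tumour_stage' is a hypothetical clinical column
clin <- DataFrame(tumour_stage = c("stage i", "stage ii", "stage i", "stage iii"),
                  row.names = colnames(counts))
se <- SummarizedExperiment(assays = list(counts = counts), colData = clin)

names(colData(se))       # the clinical variables available for subsetting
table(se$tumour_stage)   # se$name is shorthand for colData(se)$name

# Two equivalent ways to keep only the "stage i" samples
a <- subset(se, select = tumour_stage == "stage i")
b <- se[, se$tumour_stage == "stage i"]
identical(colnames(a), colnames(b))
```

The bracket form se[, condition] is often easier to read once the condition gets longer, but both return a smaller object of the same class.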

If anyone could shed some light as to how I would write this it would be much appreciated.

svlachavas (Greece) wrote, 6 months ago:

Dear Aluesley,

First, you need to be more specific about the type of analysis and your goals, as your description is rather confusing.

1) Since you mentioned that you are new to R, you should first familiarize yourself a bit with "data containers" and basic workflows, which are nicely covered in the following link:

http://bioconductor.org/packages/release/BiocViews.html#___GeneExpressionWorkflow

2) You should also post the exact code you have used so far, so that other users can inspect your approach and give more useful advice.

3) If I understand correctly, you want to subset your cancer dataset to only those samples that have a tumor subtype assigned in the relevant TCGA publications? If so, here is the section of the TCGAbiolinks vignette on molecular subtype information and subsetting:

http://bioinformaticsfmrp.github.io/TCGAbiolinks/extension.html#tcga_molecularsubtype:_query_subtypes_for_cancer_data:

4) The following link will also help you get started:

http://bioconductor.org/packages/release/bioc/vignettes/SummarizedExperiment/inst/doc/SummarizedExperiment.html

Overall, here is a quick example for TCGAbiolinks, assuming you have downloaded an arbitrary RangedSummarizedExperiment called se:

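library(TCGAbiolinks)
library(SummarizedExperiment)

# subtype table for your project, e.g. tumor = "brca" for TCGA-BRCA
dataSubt <- TCGAquery_subtype(tumor = "YourTCGAproject")

# subset only to the patients with available subtype information
se.subset <- subset(se, select = colData(se)$patient %in% dataSubt$patient)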
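Because TCGAquery_subtype() needs to download data, here is an offline sketch of the same %in% logic on a toy object. The patient IDs and the dataSubt table are made up and only mimic the patient column that the example above relies on; the final line is an optional extra step (not part of the original answer) that copies the subtype label into colData using match().

```r
library(SummarizedExperiment)

# Toy 'se' with a 'patient' column; IDs are invented, TCGA-style barcodes
counts <- matrix(1:8, nrow = 2,
                 dimnames = list(c("g1", "g2"), paste0("aliquot", 1:4)))
clin <- DataFrame(patient = c("TCGA-AA-0001", "TCGA-AA-0002",
                              "TCGA-AA-0003", "TCGA-AA-0004"),
                  row.names = colnames(counts))
se <- SummarizedExperiment(assays = list(counts = counts), colData = clin)

# Hypothetical stand-in for the data frame returned by TCGAquery_subtype()
dataSubt <- data.frame(patient = c("TCGA-AA-0002", "TCGA-AA-0004"),
                       subtype = c("LumA", "Basal"))

# Keep only samples whose patient appears in the subtype table
se.subset <- subset(se, select = patient %in% dataSubt$patient)
colnames(se.subset)

# Optional: attach the subtype label to each remaining sample, matched by patient ID
se.subset$subtype <- dataSubt$subtype[match(se.subset$patient, dataSubt$patient)]
```

Matching by ID with match() rather than assuming the two tables share a row order avoids silently mislabelling samples.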

Hope that helps,

Efstathios

Friederike (United States) wrote, 6 months ago:

I second everything svlachavas wrote (i.e., work through some basic R tutorials), but in the meantime, will this do the job?

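library(SummarizedExperiment)
# colData(x)$listData returns NULL: listData is an internal slot, not a column
my_subset <- as.data.frame(colData(x))
head(my_subset)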
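Extracting colData gives you the clinical table, but it does not subset the expression object itself. A minimal sketch, on a toy object with a made-up gender column, of filtering that table with base R and then using the kept sample names to index the columns of x:

```r
library(SummarizedExperiment)

# Toy object standing in for 'x' (values and the 'gender' column are made up)
counts <- matrix(1:6, nrow = 2,
                 dimnames = list(c("g1", "g2"), c("s1", "s2", "s3")))
x <- SummarizedExperiment(assays = list(counts = counts),
                          colData = DataFrame(gender = c("female", "male", "female"),
                                              row.names = c("s1", "s2", "s3")))

clin <- as.data.frame(colData(x))                # ordinary data.frame, one row per sample
keep <- rownames(clin)[clin$gender == "female"]  # filter the table with base R
x.female <- x[, keep]                            # sample names index the columns of x
colnames(x.female)
```

This route is handy when the filtering logic is easier to express on a plain data.frame; the assays and colData of x.female stay aligned because the object is subset as a whole.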