Matching transcriptomic data to clinical data
12 weeks ago
Khadija ▴ 10

Hello everyone,

I'm new to the R language and I need some help.

I have downloaded the transcriptomic data for the "TCGA-GBM" project using the TCGAbiolinks package, and I've also downloaded the clinical data (XML files). I'm mainly interested in clinical_radiation.

What I want to do is, for example, combine the transcriptomic data with the clinical radiation data.

Can anyone help me please? I'm really stuck.

Thanks in advance for your replies.

What do you mean by "combine"? Please show us the code you used to get your datasets.

library(TCGAbiolinks)
library(SummarizedExperiment)  # provides assay()
library(dplyr)                 # provides %>%

# Download and prepare the expression data
query_GBM2 <- GDCquery(project = "TCGA-GBM",
  data.category = "Transcriptome Profiling",
  data.type = "Gene Expression Quantification",
  workflow.type = "STAR - Counts")
GDCdownload(query_GBM2)
data1 <- GDCprepare(query_GBM2, save = TRUE, save.filename = "data1.rda")
GBMMatrice <- assay(data1, "tpm_unstrand")

# Preprocess, normalize, filter, then test normal (NT) vs. primary tumor (TP)
Rnaseq_CorOutliers <- TCGAanalyze_Preprocessing(data1)
dataNorm1 <- TCGAanalyze_Normalization(tabDF = Rnaseq_CorOutliers, geneInfo = geneInfoHT)  # preprocessed matrix, not data1
dataFilt1 <- TCGAanalyze_Filtering(tabDF = dataNorm1, method = "quantile", qnt.cut = 0.25)
samplesNT1 <- TCGAquery_SampleTypes(barcode = colnames(dataFilt1), c("NT"))
samplesTP1 <- TCGAquery_SampleTypes(barcode = colnames(dataFilt1), c("TP"))
dataDEGs1 <- TCGAanalyze_DEA(mat1 = dataFilt1[, samplesNT1], mat2 = dataFilt1[, samplesTP1], Cond1type = "Normal", Cond2type = "Tumor", fdr.cut = 0.01, logFC.cut = 1, method = "glmLRT")
dataDEGsFiltLevel1 <- TCGAanalyze_LevelTab(dataDEGs1, "Tumor", "Normal", dataFilt1[, samplesTP1], dataFilt1[, samplesNT1])

# Download the clinical data (BCR Biotab) and preview the radiation table
query_GBM <- GDCquery(project = "TCGA-GBM", data.category = "Clinical", data.type = "Clinical Supplement", data.format = "BCR Biotab")
GDCdownload(query_GBM)
clinical.BCRtab.all <- GDCprepare(query_GBM)
names(clinical.BCRtab.all)
clinical.BCRtab.all$clinical_radiation_gbm %>% head %>% DT::datatable(options = list(scrollX = TRUE, keys = TRUE))

This is the code that I used. What I am actually trying to do is run a transcriptomic analysis and retrieve the DEGs. After that, I tried to combine the transcriptomic data (patient samples and genes) with the clinical data (especially the clinical radiation table), and this is where I got stuck. As I said, I'm not familiar with R, so I don't know whether my procedure is right. Thank you for your reply.

Where did you get this code from? You'll probably just need a join of some sort to merge the clinical and transcriptomic data.

I got it from the script in "TCGA Workflow: Analyze cancer genomics and epigenomics data using Bioconductor packages", by Tiago Chedraoui Silva et al. The problem is that I don't know how to merge the data. I think that what the two datasets have in common is the BCR_patient_barcode, but I don't know which function I could use...
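
One way the join suggested above could look, as a minimal sketch in base R. It rests on assumptions not confirmed in the thread: that the column names of the expression matrix are full TCGA sample barcodes whose first 12 characters are the patient barcode, and that the clinical table has a bcr_patient_barcode column. The toy values and the radiation_type column are hypothetical.

```r
# Toy stand-ins for the real objects (hypothetical values, for illustration only)
expr <- matrix(c(1.2, 0.0, 3.4, 5.6, 7.8, 9.0), nrow = 2,
               dimnames = list(c("GENE_A", "GENE_B"),
                               c("TCGA-02-0001-01A-01R-0001-01",
                                 "TCGA-02-0002-01A-01R-0002-01",
                                 "TCGA-02-0003-01A-01R-0003-01")))
radiation <- data.frame(
  bcr_patient_barcode = c("TCGA-02-0001", "TCGA-02-0002", "TCGA-02-0004"),
  radiation_type = c("External", "External", "Internal"),  # hypothetical column
  stringsAsFactors = FALSE)

# Transpose so that samples are rows, and derive the 12-character
# patient barcode from each sample barcode (assumed barcode layout)
expr_df <- data.frame(sample_barcode = colnames(expr),
                      bcr_patient_barcode = substr(colnames(expr), 1, 12),
                      t(expr),
                      row.names = NULL, stringsAsFactors = FALSE)

# Inner join on the shared key: only patients present in both tables remain
merged <- merge(expr_df, radiation, by = "bcr_patient_barcode")
print(merged)
```

merge() performs an inner join by default, so samples without a radiation record are dropped; passing all.x = TRUE would keep them, with NA in the clinical columns.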

That helps, thanks. What is the output to:

head(GBMMatrice)
head(clinical.BCRtab.all$clinical_radiation_gbm)

[Screenshot of the requested output]

Please do not paste screenshots of plain text content, it is counterproductive. You can copy paste the content directly here (using the code formatting option shown below), or use a GitHub Gist if the content volume exceeds allowed length here.

[Image: code formatting option]

Ah, thank you, I didn't know. Here is the link for the GitHub Gist:

Nicely done. However, given that you have a ton of columns (which I did not know), a better option might have been GBMMatrice[1:10,1:10]. In any case, please use gists or direct copy-paste in the future; it makes things a lot easier than screenshots.

Thanks for the advice! I'll keep that in mind for next time.

[Screenshot]

Thank you for your help. For head(GBMMatrice), there are many more samples than the ones shown in the screenshot, but the rows are always the same genes.

Use a SummarizedExperiment object - that will work best for you. With some digging, you'll be able to figure out how to place both pieces of information in the same SummarizedExperiment object.

Do you mean using this function like below?

SummarizedExperiment(GBMMatrice)
SummarizedExperiment(clinical.radiation)

No, please do some digging. The RNA data will be the assay and the clinical metadata will become colData (I think, I'm not sure I got the keyword right). There is very minimal work required; I want you to put the effort in instead of asking around.
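
For readers following along, the assay/colData layout described above might be sketched as follows. This is an illustration under assumptions, not the poster's actual solution: the clinical table is assumed to have already been reduced to one row per patient, the patient barcode is assumed to be the first 12 characters of each column name, and the toy values and the received_radiation column are hypothetical. The key step is reordering the clinical rows with match() so that row i of the metadata describes column i of the matrix.

```r
# Toy stand-ins (hypothetical values); in the thread these would be GBMMatrice
# and a clinical table reduced to one row per patient
expr <- matrix(1:6, nrow = 2,
               dimnames = list(c("GENE_A", "GENE_B"),
                               c("TCGA-02-0001-01A", "TCGA-02-0002-01A",
                                 "TCGA-02-0003-01A")))
clinical <- data.frame(
  bcr_patient_barcode = c("TCGA-02-0002", "TCGA-02-0001"),
  received_radiation = c("YES", "NO"),  # hypothetical column
  stringsAsFactors = FALSE)

# Build one metadata row per expression column, in the same order as the columns
patient <- substr(colnames(expr), 1, 12)
col_data <- clinical[match(patient, clinical$bcr_patient_barcode), , drop = FALSE]
col_data$bcr_patient_barcode <- patient  # keep the key even without a clinical match
rownames(col_data) <- colnames(expr)

# Only attempt the construction if the Bioconductor package is installed
if (requireNamespace("SummarizedExperiment", quietly = TRUE)) {
  se <- SummarizedExperiment::SummarizedExperiment(
    assays = list(tpm = expr),
    colData = S4Vectors::DataFrame(col_data, row.names = rownames(col_data)))
  print(SummarizedExperiment::colData(se))
}
```

Samples with no clinical record (the third column here) end up with NA in the clinical columns rather than being dropped, which keeps the metadata aligned with the matrix.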

Thank you for your help. I'll keep digging.

I tried something else. I used the TCGAutils package to convert the file IDs to the corresponding TCGA barcodes of the patients. Now, I'll try to use a function to merge the two datasets.
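
Before merging, it may help to check how well the two sets of barcodes actually line up. The following base R sketch uses hypothetical barcodes and assumes, without confirmation from the thread, that the radiation table can contain non-data rows (such as descriptive header lines) and more than one row per patient (one per radiation course).

```r
# Hypothetical barcodes standing in for the two real tables
expr_patients <- c("TCGA-02-0001", "TCGA-02-0002", "TCGA-02-0003")
radiation <- data.frame(
  bcr_patient_barcode = c("bcr_patient_barcode",  # stray header-like row
                          "TCGA-02-0001", "TCGA-02-0001", "TCGA-02-0004"),
  stringsAsFactors = FALSE)

# Keep only rows whose key looks like a TCGA barcode
radiation <- radiation[grepl("^TCGA-", radiation$bcr_patient_barcode), , drop = FALSE]

# Which patients appear in both tables, and which only in the expression data?
shared <- intersect(expr_patients, radiation$bcr_patient_barcode)
only_expr <- setdiff(expr_patients, radiation$bcr_patient_barcode)

# How many radiation rows does each patient have?
courses_per_patient <- table(radiation$bcr_patient_barcode)
print(courses_per_patient)
```

If some patients have several radiation rows, a plain merge will repeat their expression values once per row, so it is worth deciding beforehand whether to keep all rows or summarize to one row per patient.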

Try everything and post the solution that works for you as an answer. You are so close to cracking this!
