Question

How to link "miRNA gene quantification" and "clinical- Patient" data retrieved from TCGA

5.8 years ago

Björn ▴ 110

The question below is quite detailed but may be simple to answer for experienced users. I have spent a lot of time trying to figure this out and have concluded that I need help from experts.

I downloaded "miRNA gene quantification" from the TCGA harmonized data and saved it as a CSV file. This data set is essentially an assay with counts. The row names are miRNAs, while the column names are sample IDs, e.g. TCGA-G9-6356-01A-11R-1788-13. The data set contains two "sample.type" values: 1) Primary Solid Tumor and 2) Solid Tissue Normal.

Similarly, I downloaded the "clinical" data containing "patient" information. The row names are patient IDs, e.g. TCGA-G9-6356, while the column names represent different clinical parameters. This file is similar to a "colData" table with clinical information for edgeR. The "vital-status" parameter contains "dead" or "alive", which I want to use as the contrast.

My questions:

1. How do I link the patient ID in the "miRNA gene quantification" data, where it is TCGA-G9-6356-01A-11R-1788-13, with the "clinical" file, which contains "TCGA-G9-6356" as the patient ID?
2. How do I identify DE miRNAs between "dead" and "alive" patients? This is not a "survival analysis" but a list of DE miRNAs within the above-mentioned sample.type in relation to vital.status (dead or alive).
3. If I use edgeR separately on the above data, can I completely skip a) TCGAanalyze_Filtering and b) TCGAanalyze_DEA?

TCGA contrast mirnaseq edger • 2.5k views

Kevin Blighe, thank you so much for taking the time to explain in detail. I really appreciate it. For the first query, I used the following commands:

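# Truncate every column name to the 12-character patient barcode, e.g. TCGA-G9-6356
names(file.csv) <- substring(names(file.csv), 1, 12)
# The first column holds the miRNA identifiers, so give it a plain name again
colnames(file.csv)[1] <- "miRNA"
write.csv(file.csv, "newfile.csv", row.names = FALSE)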

This creates a new CSV file in which each sample identifier is truncated to its first 12 characters, allowing it to be linked with the patient identifier in the clinical data file.
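One caveat with truncating the column names is that a tumour and a normal sample from the same patient end up with the same 12-character name. A minimal sketch of an alternative, assuming toy data, is to keep the full barcodes and build a separate sample table: characters 1 to 12 of a TCGA barcode give the patient, and characters 14 to 15 give the sample type code (01 for Primary Solid Tumor, 11 for Solid Tissue Normal). The second and third barcodes, the counts, and the object names `counts`, `clinical` and `samples` are made up for illustration.

```r
# Toy count matrix: rows are miRNAs, columns are full TCGA sample barcodes.
# Only the first barcode comes from the question; the other two are hypothetical.
counts <- matrix(
  c(10, 0, 5, 20, 3, 8),
  nrow = 2,
  dimnames = list(
    c("hsa-mir-21", "hsa-mir-155"),
    c("TCGA-G9-6356-01A-11R-1788-13",
      "TCGA-G9-6356-11A-01R-1788-13",
      "TCGA-AB-1234-01A-11R-0000-13")
  )
)

# Toy clinical table with one row per patient, keyed by the short barcode.
clinical <- data.frame(
  vital_status = c("Alive", "Dead"),
  row.names = c("TCGA-G9-6356", "TCGA-AB-1234"),
  stringsAsFactors = FALSE
)

# Split each barcode into the patient ID and the two-digit sample type code.
barcodes <- colnames(counts)
samples <- data.frame(
  barcode = barcodes,
  patient = substr(barcodes, 1, 12),
  sample_code = substr(barcodes, 14, 15),
  stringsAsFactors = FALSE
)

# Translate the sample type code into the labels used in the question.
code_labels <- c("01" = "Primary Solid Tumor", "11" = "Solid Tissue Normal")
samples$sample_type <- unname(code_labels[samples$sample_code])

# Look up each sample's patient in the clinical table (exact matching).
samples$vital_status <- clinical$vital_status[match(samples$patient, rownames(clinical))]

print(samples)
```

The rows of `samples` stay in the same order as the columns of `counts`, so the table can serve as the sample annotation for later steps, and tumour-only or normal-only subsets can be taken by filtering on `sample_type`.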

5.8 years ago by Björn ▴ 110

Yes, is everything now okay, in that case?

5.8 years ago by Kevin Blighe 87k

score 2 · Accepted Answer · 2018-06-18

How to link Patient ID from "miRNA gene quantification" data where it is TCGA-G9-6356-01A-11R-1788-13 with "clinical" file which contains "TCGA-G9-6356" as Patient ID ?

In the basic clinical metadata, there is only 1 entry per individual. Each individual is identified by the short TCGA barcode, e.g., TCGA-G9-6356. Information on each sample biopsy (including tumour and normal tissues) is available elsewhere. If your interest is in survival, then you do not need any clinical data other than what you already have.

The definition of each piece of clinical metadata can be found here: Clinical Data Elements

How to measure DE miRNAs in patients between "dead" and "alive". This is not "survival analysis" but a list of DE miRNAs in above mentioned sample.type in relation to vital.status; dead or alive ?

Yes, go by the vital status. Keep in mind, though, that the vital status is made at diagnosis. It could be that some patients who are marked 'alive' have since become deceased. It's important to be aware of the limitations of the clinical data. If you want to double check the status of a patient, then do the following:

1. Go to the GDC Data Portal.
2. Enter the TCGA barcode into the search box, e.g., TCGA-G9-6356, wait for the list to generate, and then select the match (usually first in the list).
3. On the new page that loads, scroll down and select the 'Diagnoses / Treatments' tab in the Clinical section.
4. There you will see the vital status at the bottom.
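For the dead versus alive comparison itself, a minimal sketch of how the grouping could be set up in R is shown here. It assumes simulated counts for tumour samples only, with one column per patient; `counts`, `vital_status` and `design` are illustrative names, not objects from the question. The edgeR part runs only if the package is installed, and the quasi-likelihood pipeline is one common choice rather than the only valid one.

```r
set.seed(1)

# Simulated counts: 200 miRNAs by 8 tumour samples (made-up data).
n_mirna <- 200
n_samples <- 8
counts <- matrix(
  rnbinom(n_mirna * n_samples, mu = 100, size = 5),
  nrow = n_mirna,
  dimnames = list(paste0("mirna", seq_len(n_mirna)),
                  paste0("patient", seq_len(n_samples)))
)

# Vital status per sample, in the same order as the columns of 'counts'.
# "Alive" is the reference level, so fold changes are Dead relative to Alive.
vital_status <- factor(rep(c("Alive", "Dead"), each = 4),
                       levels = c("Alive", "Dead"))

# Design matrix: an intercept plus one coefficient for Dead versus Alive.
design <- model.matrix(~ vital_status)
print(colnames(design))

# Optional: run a standard edgeR quasi-likelihood test if edgeR is available.
if (requireNamespace("edgeR", quietly = TRUE)) {
  y <- edgeR::DGEList(counts = counts, group = vital_status)
  y <- edgeR::calcNormFactors(y)
  y <- edgeR::estimateDisp(y, design)
  fit <- edgeR::glmQLFit(y, design)
  res <- edgeR::glmQLFTest(fit, coef = 2)  # coefficient 2 is Dead versus Alive
  print(edgeR::topTags(res, n = 5))
}
```

Because the simulated counts contain no real difference between the groups, the point here is the structure of the comparison, not the resulting table.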

If I am to use edgeR separately with above data, can I completely ignore using a) TCGAanalyze_Filtering and b) TCGAanalyze_DEA.

You can use these functions, if you wish. Or, just do some filtering yourself, for example, by removing low-count transcripts and/or those that have a lot of missing data.
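As a rough illustration of doing the filtering yourself, the sketch below drops miRNAs that are missing in too many samples or that rarely reach a minimum count. The toy matrix and the thresholds (`max_missing`, `min_count`, `min_samples`) are arbitrary assumptions and would need tuning for real data.

```r
# Toy count matrix with some low counts and missing values (made-up data).
counts <- matrix(
  c(  0,   1,   0,   2,
     50,  60,  NA,  55,
     NA,  NA,  NA,  10,
    200, 180, 220, 210),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("mirna", 1:4), paste0("sample", 1:4))
)

# Hypothetical thresholds, chosen only for this example.
max_missing <- 0.5   # tolerate at most 50% missing values per miRNA
min_count   <- 10    # a sample "expresses" a miRNA at 10 or more counts
min_samples <- 2     # require expression in at least 2 samples

# Fraction of samples with a missing value, per miRNA.
missing_frac <- rowMeans(is.na(counts))

# Number of samples reaching the minimum count, ignoring missing values.
n_expressed <- rowSums(counts >= min_count, na.rm = TRUE)

# Keep miRNAs that pass both criteria.
keep <- missing_frac <= max_missing & n_expressed >= min_samples
filtered <- counts[keep, , drop = FALSE]

print(rownames(filtered))
```

Here `mirna1` is dropped for low counts and `mirna3` for missing data, leaving `mirna2` and `mirna4`. Any remaining missing values would still need handling before passing the matrix to a count-based method.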

Kevin