Question: How to link "miRNA gene quantification" and "clinical- Patient" data retrieved from TCGA
Björn wrote:

The query below is quite detailed, but it might be simple to answer for experienced users. I have spent a lot of time trying to figure it out and have concluded that I need help from experts.

I downloaded "miRNA gene quantification" from the TCGA harmonized data and saved it as a CSV file. This data set is more like an "assay with counts". The rownames are miRNAs, while the colnames are "Patient ID", e.g. TCGA-G9-6356-01A-11R-1788-13. The data set contains two "sample.type" values: 1) Primary Solid Tumor, 2) Solid Tissue Normal.

Similarly, I downloaded the "clinical" data containing "patient". The rownames are Patient ID, e.g. TCGA-G9-6356, while the colnames represent different clinical parameters. This file is similar to "colData" with clinical information for edgeR. The "vital-status" parameter contains "dead or alive", which I want to use as the "contrast".

My questions:

  1. How to link Patient ID from "miRNA gene quantification" data where it is TCGA-G9-6356-01A-11R-1788-13 with "clinical" file which contains "TCGA-G9-6356" as Patient ID ?
  2. How to measure DE miRNAs in patients between "dead" and "alive". This is not "survival analysis" but a list of DE miRNAs in above mentioned sample.type in relation to vital.status; dead or alive ?
  3. If I am to use edgeR separately with above data, can I completely ignore using a) TCGAanalyze_Filtering and b) TCGAanalyze_DEA.

written 3 months ago by Björn

Kevin Blighe, thank you so much for taking the time to explain in detail. I really appreciate that. For the first query, I used the following commands:

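# Truncate every column name to the 12-character patient barcode (e.g. TCGA-G9-6356)
names(file.csv) <- substring(names(file.csv), 1, 12)
# The first column holds the miRNA identifiers, so give it its name back
colnames(file.csv)[1] <- "miRNA"
write.csv(file.csv, "newfile.csv", row.names = FALSE)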

This creates a new CSV file in which each sample identifier is cut down to its first 12 characters, so that it can be linked to the sample identifier in the clinical data file.

written 3 months ago by Björn

Yes, is everything now okay, in that case?

written 3 months ago by Kevin Blighe
Kevin Blighe wrote:
  1. How to link Patient ID from "miRNA gene quantification" data where it is TCGA-G9-6356-01A-11R-1788-13 with "clinical" file which contains "TCGA-G9-6356" as Patient ID ?

In the general basic clinical metadata, there is only 1 entry per individual. Each individual is identified by the short TCGA barcode, e.g., TCGA-G9-6356. Information on each sample biopsy (including tumour and normal tissues) is available elsewhere. If your interest is in survival, then you do not need any type of clinical data other than that which you already have.

The definition of each piece of clinical metadata can be found here: Clinical Data Elements
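
For the linking itself, here is a minimal sketch in base R. It assumes the count matrix has the full sample barcodes as column names and the clinical table has the 12-character barcode as row names. The objects `counts`, `clinical` and `samples`, the values in them, and the second barcode are made up for illustration. Only the barcode layout is taken from TCGA: characters 1-12 identify the individual, and characters 14-15 give the sample type code (01 for Primary Solid Tumor, 11 for Solid Tissue Normal).

```r
# Toy count matrix: rows are miRNAs, columns are full TCGA sample barcodes (made-up values)
counts <- matrix(
  c(10, 0, 25, 3, 7, 40),
  nrow = 2,
  dimnames = list(
    c("hsa-mir-21", "hsa-mir-155"),
    c("TCGA-G9-6356-01A-11R-1788-13",
      "TCGA-G9-6356-11A-11R-1788-13",
      "TCGA-AA-0001-01A-11R-0000-13")  # hypothetical barcode
  )
)

# Toy clinical table keyed by the 12-character barcode (made-up values)
clinical <- data.frame(
  vital_status = c("Alive", "Dead"),
  row.names = c("TCGA-G9-6356", "TCGA-AA-0001")
)

# Build a per-sample annotation rather than overwriting the column names,
# so tumour and normal samples from the same individual stay distinguishable
samples <- data.frame(
  barcode     = colnames(counts),
  patient_id  = substr(colnames(counts), 1, 12),
  sample_code = substr(colnames(counts), 14, 15),
  stringsAsFactors = FALSE
)
samples$sample_type <- ifelse(samples$sample_code == "01", "Primary Solid Tumor",
                       ifelse(samples$sample_code == "11", "Solid Tissue Normal", NA))

# Exact lookup of each sample's individual in the clinical table (NA if absent)
idx <- match(samples$patient_id, rownames(clinical))
samples$vital_status <- clinical$vital_status[idx]

samples
```

Keeping the full barcode in its own column avoids duplicated column names when one individual has both a tumour and a normal sample. Note also that `read.csv` converts the hyphens in column names to dots unless `check.names = FALSE` is set; the character positions stay the same either way.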

  2. How to measure DE miRNAs in patients between "dead" and "alive". This is not "survival analysis" but a list of DE miRNAs in above mentioned sample.type in relation to vital.status; dead or alive ?

Yes, go by the vital status. Keep in mind, though, that the vital status is recorded at diagnosis. It could be that some patients who are marked 'alive' have since died. It's important to be aware of the limitations of the clinical data. If you want to double-check the status of a patient, then do the following:

  1. Go to the GDC Data Portal
  2. Enter the TCGA barcode into the search box, e.g., TCGA-G9-6356, wait for the list to generate, and then select the match (usually first in the list)
  3. On the new page that loads, scroll down and select the 'Diagnoses / Treatments' tab in the Clinical section
  4. There you will see the vital status at the bottom
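
Once the vital status is settled, the dead versus alive grouping could be set up along these lines. This is only a sketch: the `samples` table, its column names and its values are hypothetical, and restricting the comparison to primary tumour samples is an assumption about what you want, not a requirement.

```r
# Hypothetical per-sample annotation (made-up values)
samples <- data.frame(
  barcode = c("S1", "S2", "S3", "S4", "S5"),
  sample_type = c("Primary Solid Tumor", "Primary Solid Tumor",
                  "Solid Tissue Normal", "Primary Solid Tumor",
                  "Primary Solid Tumor"),
  vital_status = c("Alive", "Dead", "Alive", "Dead", NA),
  stringsAsFactors = FALSE
)

# Assumption: compare within one sample type, and drop samples with unknown status
keep <- samples$sample_type == "Primary Solid Tumor" & !is.na(samples$vital_status)
tumour <- samples[keep, ]

# "Alive" is the reference level, so the second coefficient reads as Dead versus Alive
tumour$group <- factor(tumour$vital_status, levels = c("Alive", "Dead"))

# Design matrix for a simple two-group comparison
design <- model.matrix(~ group, data = tumour)

# Check group sizes before going any further
table(tumour$group)
```

The same `design` can be passed to edgeR later, with the columns of the count matrix subset and ordered to match `tumour$barcode`.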


  3. If I am to use edgeR separately with above data, can I completely ignore using a) TCGAanalyze_Filtering and b) TCGAanalyze_DEA.

You can use these functions if you wish. Or just do some filtering yourself, for example by removing low-count transcripts and/or those that have a lot of missing data.
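
As a rough illustration of doing the filtering yourself and then running edgeR directly, here is a sketch on simulated counts. The data, the thresholds (at least 10 reads in at least 3 samples), the choice to drop any row that contains a missing value, and the helper name `filter_low_counts` are all assumptions made for the example. The edgeR part runs only if the package is installed and uses its standard quasi-likelihood workflow.

```r
set.seed(1)

# Simulated counts: 200 miRNAs x 12 tumour samples (made-up data)
counts <- matrix(rnbinom(200 * 12, mu = 50, size = 2), nrow = 200,
                 dimnames = list(paste0("mir", 1:200), paste0("S", 1:12)))
counts[1:20, ] <- 0L  # a block of unexpressed miRNAs
group <- factor(rep(c("Alive", "Dead"), each = 6), levels = c("Alive", "Dead"))

# Hypothetical helper: keep rows with >= min_count reads in >= min_samples samples,
# and drop rows that contain any missing value
filter_low_counts <- function(x, min_count = 10, min_samples = 3) {
  complete  <- rowSums(is.na(x)) == 0
  expressed <- rowSums(x >= min_count, na.rm = TRUE) >= min_samples
  x[complete & expressed, , drop = FALSE]
}

filtered <- filter_low_counts(counts)
dim(filtered)

# Optional: differential expression with edgeR, Dead versus Alive
if (requireNamespace("edgeR", quietly = TRUE)) {
  design <- model.matrix(~ group)
  y <- edgeR::DGEList(counts = filtered, group = group)
  y <- edgeR::calcNormFactors(y)
  y <- edgeR::estimateDisp(y, design)
  fit <- edgeR::glmQLFit(y, design)
  res <- edgeR::glmQLFTest(fit, coef = 2)  # coefficient 2 is groupDead
  print(edgeR::topTags(res))
}
```

edgeR also ships its own `filterByExpr` function, which picks the count threshold from the library sizes and the design and could replace the hand-written helper.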

Kevin

written 3 months ago by Kevin Blighe