Question: Individual Vertical Analysis from TCGA Data
jth (190) wrote, 5.7 years ago:


I was wondering whether it is feasible to work with multiple types of data from the same patient in TCGA to reveal the mechanism of a cancer in that particular patient, using tumor and adjacent healthy tissue controls. I am planning to analyse each data type individually and merge the findings at a higher level (most probably the pathway level).
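One common way to merge per-data-type findings at the pathway level is an over-representation (hypergeometric) test of each data type's hit list against each pathway's gene set. This is one possible choice, not necessarily the approach intended here. The minimal sketch below assumes you already have a set of significant gene symbols per data type, a dictionary of pathway gene sets and a gene universe; the function names and toy gene IDs are hypothetical.

```python
# Sketch: over-representation testing is one possible way to merge per-data-type
# results at the pathway level. Function names and toy gene IDs are hypothetical.
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    total = comb(population, draws)
    # math.comb returns 0 when the lower argument exceeds the upper one.
    tail = sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(k, upper + 1))
    return tail / total


def pathway_enrichment(hits_by_type, pathways, universe):
    """Over-representation p-value for every (data type, pathway) pair."""
    universe = set(universe)
    results = {}
    for data_type, hits in hits_by_type.items():
        hits = set(hits) & universe          # ignore genes outside the universe
        for pathway, genes in pathways.items():
            genes = set(genes) & universe
            overlap = len(hits & genes)
            results[(data_type, pathway)] = hypergeom_sf(
                overlap, len(universe), len(genes), len(hits))
    return results


if __name__ == "__main__":
    universe = [f"G{i}" for i in range(10)]
    pathways = {"toy_pathway": ["G0", "G1", "G2"]}
    hits = {"expression": ["G0", "G1", "G2"], "methylation": ["G5"]}
    for key, p in pathway_enrichment(hits, pathways, universe).items():
        print(key, round(p, 4))
```

A pathway that comes out in more than one data type is then a candidate for the kind of cross-data-type convergence described above.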

I have been reading about this a lot but could not find satisfactory answers to the following:

  1. On TCGA data quality: how reliable do you think the data are for individual (per-patient) analysis? I have received different opinions on this, so I'm a bit confused right now.
  2. For all data types, since tumor and adjacent normal controls are in the same batch in TCGA, how should I check for other possible biases? I guess batch-effect correction becomes irrelevant in this case. This question is particularly relevant for array-based methods such as the HumanMethylation450k. If a tumor and its control were scanned on different arrays, how can I be sure that no additional bias comes from array scanning (except, perhaps, by expecting a low DMR count)?
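The 450k slide and array position are not part of the TCGA barcode, but one cheap first check is whether each tumor and its matched adjacent normal were at least processed on the same plate. The plate is the sixth dash-separated field of a full aliquot barcode (e.g. `TCGA-02-0001-01C-01D-0182-01`), and the first two digits of the fourth field are the sample type (01-09 tumor, 11 solid tissue normal). A minimal sketch, assuming full aliquot barcodes; the helper names and example barcodes are hypothetical, and slide/array IDs would have to come from the platform's sample metadata instead.

```python
# Sketch: plate-level pairing check from TCGA aliquot barcodes.
# Helper names and the example barcodes are hypothetical.
from collections import defaultdict

TUMOR_CODES = range(1, 10)   # sample types 01-09 are tumor samples
SOLID_TISSUE_NORMAL = 11     # sample type 11 is the adjacent (solid tissue) normal


def parse_aliquot(barcode):
    """Split a full TCGA aliquot barcode into (participant, sample type, plate)."""
    fields = barcode.split("-")
    if len(fields) != 7:
        raise ValueError(f"expected a 7-field aliquot barcode, got {barcode!r}")
    participant = "-".join(fields[:3])  # e.g. TCGA-02-0001
    sample_type = int(fields[3][:2])    # e.g. "01C" -> 1
    plate = fields[5]                   # e.g. "0182"
    return participant, sample_type, plate


def shared_plate_by_participant(barcodes):
    """For each participant with both a tumor and a solid tissue normal aliquot,
    return True if at least one tumor and one normal aliquot share a plate."""
    plates = defaultdict(lambda: {"tumor": set(), "normal": set()})
    for bc in barcodes:
        participant, sample_type, plate = parse_aliquot(bc)
        if sample_type in TUMOR_CODES:
            plates[participant]["tumor"].add(plate)
        elif sample_type == SOLID_TISSUE_NORMAL:
            plates[participant]["normal"].add(plate)
    return {
        p: bool(d["tumor"] & d["normal"])
        for p, d in plates.items()
        if d["tumor"] and d["normal"]
    }


if __name__ == "__main__":
    # Made-up barcodes for illustration only.
    example = [
        "TCGA-AA-0001-01A-11D-A001-05",
        "TCGA-AA-0001-11A-11D-A001-05",
        "TCGA-AA-0002-01A-11D-A001-05",
        "TCGA-AA-0002-11A-11D-A002-05",
    ]
    print(shared_plate_by_participant(example))
```

A pair that does not share a plate is not necessarily biased, but such pairs are worth looking at separately before trusting per-patient differences.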

I would really appreciate your input on these matters.

Thanks a lot! 

Question written 5.7 years ago by jth; modified 5.7 years ago by Sean Davis.
Sean Davis (26k), National Institutes of Health, Bethesda, MD, wrote, 5.7 years ago:

On point 1, there is no general answer to your question. If your analysis demands high-quality data (and not all analyses do), then doing some QC yourself is definitely necessary. On point 2, not all data types for all tumor types include "normal" controls, so the points you raise may not be relevant. As for technical variation, that is largely out of your control once the data are collected. Formal hypothesis testing across multiple samples is one of the best ways to deal with that variation efficiently (statistically speaking).
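As a concrete illustration of formal testing across multiple patients with matched normals, here is a minimal sketch of a per-probe paired test on methylation M-values (M = log2(beta / (1 - beta))), followed by Benjamini-Hochberg adjustment. It uses an exact sign-flip permutation test so that it needs only the standard library; in practice a moderated paired model (e.g. limma with a patient term in the design) is the more usual choice. The function names and toy values are hypothetical, and exact enumeration is only practical for roughly 20 or fewer pairs.

```python
# Sketch: paired tumor-vs-normal test per probe plus BH adjustment.
# Function names and toy beta values are hypothetical.
import math
from itertools import product


def beta_to_m(beta, eps=1e-6):
    """Convert a methylation beta value to an M-value, clipping away 0 and 1."""
    b = min(max(beta, eps), 1 - eps)
    return math.log2(b / (1 - b))


def paired_signflip_pvalue(tumor, normal):
    """Exact two-sided sign-flip test of the mean tumor-minus-normal difference.

    Under the null of no tumor/normal difference, each paired difference is
    equally likely to be positive or negative, so all 2**n sign patterns are
    enumerated (only feasible for small n)."""
    diffs = [t - n for t, n in zip(tumor, normal)]
    observed = abs(sum(diffs))
    extreme = 0
    patterns = 0
    for signs in product((1, -1), repeat=len(diffs)):
        patterns += 1
        if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed - 1e-12:
            extreme += 1
    return extreme / patterns


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Toy beta values for three probes in four patients (tumor, matched normal).
    probes = {
        "probe_1": ([0.80, 0.75, 0.90, 0.85], [0.20, 0.25, 0.15, 0.30]),
        "probe_2": ([0.50, 0.40, 0.55, 0.45], [0.48, 0.42, 0.50, 0.47]),
        "probe_3": ([0.10, 0.12, 0.08, 0.11], [0.60, 0.55, 0.65, 0.58]),
    }
    names = list(probes)
    raw = []
    for name in names:
        tumor_b, normal_b = probes[name]
        raw.append(paired_signflip_pvalue([beta_to_m(b) for b in tumor_b],
                                          [beta_to_m(b) for b in normal_b]))
    for name, p, q in zip(names, raw, benjamini_hochberg(raw)):
        print(f"{name}: p={p:.3f}, BH q={q:.3f}")
```

With only four pairs the smallest attainable p-value of this test is 2/16 = 0.125, a useful reminder that testing across patients only gains power once there are enough matched pairs.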




Thanks for the thought-provoking comments. On point 2, I prepared my dataset with only tumor samples that have adjacent normal tissue; any tumor without a "normal" is discarded. Hence I raised this issue. But I think I understand your point, and at least it gave me a clearer idea of how to proceed. Thanks again.
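The filtering step described in this reply (keep only tumors that have an adjacent normal from the same patient) can be done from the TCGA barcodes alone: the first three dash-separated fields identify the participant, and the first two digits of the fourth field are the sample type (01-09 tumor, 10 blood derived normal, 11 solid tissue normal). A minimal sketch, assuming sample-level barcodes such as `TCGA-AA-0001-01A`; the function name and example barcodes are hypothetical.

```python
# Sketch: keep only participants with both a tumor and an adjacent normal.
# The function name and example barcodes are hypothetical.
from collections import defaultdict

SOLID_TISSUE_NORMAL = 11  # sample type code for adjacent normal tissue


def matched_tumor_normal(barcodes):
    """Group TCGA sample barcodes by participant and keep only participants that
    have at least one tumor (sample type 01-09) and one solid tissue normal (11)."""
    groups = defaultdict(lambda: {"tumor": [], "normal": []})
    for bc in barcodes:
        fields = bc.split("-")
        participant = "-".join(fields[:3])   # e.g. TCGA-AA-0001
        sample_type = int(fields[3][:2])     # "01A" -> 1, "11A" -> 11
        if 1 <= sample_type <= 9:
            groups[participant]["tumor"].append(bc)
        elif sample_type == SOLID_TISSUE_NORMAL:
            groups[participant]["normal"].append(bc)
        # Other codes (e.g. 10, blood derived normal) are ignored here.
    return {
        p: (sorted(g["tumor"]), sorted(g["normal"]))
        for p, g in groups.items()
        if g["tumor"] and g["normal"]
    }


if __name__ == "__main__":
    samples = ["TCGA-AA-0001-01A", "TCGA-AA-0001-11A",
               "TCGA-AA-0002-01A", "TCGA-AA-0003-10A", "TCGA-AA-0003-01A"]
    print(matched_tumor_normal(samples))
```

Only TCGA-AA-0001 survives here: TCGA-AA-0002 has no normal, and TCGA-AA-0003 has only a blood derived normal, which is not an adjacent tissue control.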

Reply written 5.7 years ago by jth.

All samples are fresh frozen and have to pass strict metrics for size, purity, etc. All of the sample QC forms are available. Samples for each data type are processed very similarly, except for mutation calling. Your question is too general for a more specific answer, but I certainly think you should keep going. When you come up with a biologically relevant question to ask about those adjacent normals, post a new question on Biostars, and we'll help figure out whether it can be done with the available data. Here's a lead that very few people are working on: gene expression or epigenetic profiles in the tumor microenvironment, and comparison to GTEx.

Reply written 5.7 years ago by Cyriac Kandoth (5.5k); modified 5.7 years ago.

Thanks a lot for the comments and links! I'll open a new question on adjacent normals quite soon, I believe :)

Reply written 5.7 years ago by jth.
Powered by Biostar version 2.3.0