Question

Individual Vertical Analysis from TCGA Data

10.2 years ago

jth ▴ 190

Hi!

I was wondering whether it is feasible to work with multiple types of data from the same patient in TCGA to reveal the mechanism of a cancer in that particular patient, using the tumor and adjacent healthy tissue as a control. I am planning to analyse each data type individually and merge the findings at a higher level (most probably the pathway level).

I have been reading about it a lot but have not been able to find satisfactory answers to the following:

1. On TCGA data quality: how reliable do you think the data are for analysis at the level of an individual patient? I have received different opinions on this matter, so I'm a bit confused right now.
2. For all types of data, since tumor and adjacent tissue controls are in the same batch in TCGA, how should I check for other possible biases? I guess batch effect correction becomes irrelevant in this case. This question is particularly relevant for array-based methods such as HumanMethylation450k. If tumor and control are scanned on different arrays, how can I be sure that there is no additional bias coming from array scanning (except, probably, by expecting a low DMR count)?

I would really appreciate your input on these matters.

Thanks a lot!

tcga omics cancer HumanMethylation450k • 2.7k views

Updated 2.4 years ago by Ram 45k • written 10.2 years ago by jth ▴ 190

Ram · Accepted Answer · 2015-05-07

10.2 years ago

Sean Davis 27k

On point 1, there is not really a general answer to your question. If your analysis demands high-quality data (and not all do), then doing some QC yourself is definitely necessary. On point 2, not all data types for all tumor types include "normal" controls, so the points you bring up may not be relevant. As for technical variation, that is largely out of your control once the data are collected. Formal hypothesis testing across multiple samples is really one of the best ways to deal with variation in the data in a statistically efficient manner.
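To make the "formal hypothesis testing with multiple samples" suggestion concrete, here is a minimal sketch of one possible approach for matched tumor/normal pairs: an exact sign-flip permutation test per probe, followed by Benjamini-Hochberg adjustment. This is only an illustration, not a method prescribed in this thread. The probe names and beta values are made up, and the helper names `paired_signflip_pvalue` and `benjamini_hochberg` are hypothetical; in practice you would more likely use an established package.

```python
from itertools import product


def paired_signflip_pvalue(tumor, normal):
    """Exact two-sided sign-flip permutation p-value for paired samples."""
    diffs = [t - n for t, n in zip(tumor, normal)]
    observed = abs(sum(diffs))
    hits = 0
    total = 0
    # Under the null, each within-patient difference is equally likely to be + or -.
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(sum(s * d for s, d in zip(signs, diffs)))
        if flipped >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    return hits / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up beta values: one entry per patient, same patient order in both dicts.
tumor = {
    "probe_A": [0.80, 0.75, 0.82, 0.78, 0.85, 0.79],
    "probe_B": [0.50, 0.52, 0.48, 0.51, 0.49, 0.50],
}
normal = {
    "probe_A": [0.40, 0.45, 0.42, 0.38, 0.47, 0.41],
    "probe_B": [0.51, 0.50, 0.49, 0.50, 0.50, 0.49],
}

probes = sorted(tumor)
pvalues = [paired_signflip_pvalue(tumor[p], normal[p]) for p in probes]
qvalues = benjamini_hochberg(pvalues)
for probe, p, q in zip(probes, pvalues, qvalues):
    print(probe, round(p, 4), round(q, 4))
```

Note that with only six pairs the smallest p-value this test can produce is 2/64 (about 0.031), which is one way to see why more matched pairs give more room to detect real differences on top of technical variation. The exact enumeration doubles in cost with every added pair, so it is only practical for small cohorts.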

Updated 2.4 years ago by Ram 45k • written 10.2 years ago by Sean Davis 27k

Thanks for the thought-provoking comments. On point 2, I prepared my dataset with only tumor samples and their adjacent normal tissues. Any tumor without a "normal" is discarded. Hence, I raised this issue. But I guess I understood your point. At least it gave me a clearer idea of how to proceed. Thanks again.
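For anyone wanting to build the same kind of matched dataset, here is a minimal sketch of the filtering step, assuming sample identifiers are standard TCGA barcodes, where the first three dash-separated fields identify the patient and the first two characters of the fourth field give the sample type (`01` for primary solid tumor, `11` for solid tissue normal). The barcodes below are invented for illustration and the function name `matched_pairs` is hypothetical.

```python
from collections import defaultdict

TUMOR_CODE = "01"   # Primary Solid Tumor
NORMAL_CODE = "11"  # Solid Tissue Normal


def matched_pairs(barcodes):
    """Map each patient to a (tumor, normal) barcode pair; drop unpaired patients."""
    by_patient = defaultdict(dict)
    for barcode in barcodes:
        parts = barcode.split("-")
        patient = "-".join(parts[:3])   # e.g. TCGA-AA-0001
        sample_type = parts[3][:2]      # e.g. "01" from "01A"
        # Keep the first barcode seen for each sample type of a patient.
        by_patient[patient].setdefault(sample_type, barcode)
    return {
        patient: (samples[TUMOR_CODE], samples[NORMAL_CODE])
        for patient, samples in by_patient.items()
        if TUMOR_CODE in samples and NORMAL_CODE in samples
    }


# Invented barcodes, for illustration only.
barcodes = [
    "TCGA-AA-0001-01A-11D-0000-05",  # tumor, has a matched normal
    "TCGA-AA-0001-11A-11D-0000-05",  # matched normal
    "TCGA-AA-0002-01A-11D-0000-05",  # tumor without a normal: discarded
    "TCGA-AA-0003-11A-11D-0000-05",  # normal without a tumor: discarded
]

pairs = matched_pairs(barcodes)
for patient, (tumor_bc, normal_bc) in pairs.items():
    print(patient, tumor_bc, normal_bc)
```

Keeping only the first barcode per sample type is a simplification; patients with several vials or aliquots of the same type would need an explicit rule for which one to use.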

Updated 2.4 years ago by Ram 45k • written 10.2 years ago by jth ▴ 190

All samples are fresh frozen, and have to pass strict metrics of size, purity, etc. Click here for all the sample QC forms. Samples of each data type are processed very similarly, all except for mutation calling. Your qn is too general to give a better answer, but I certainly think you should keep going. When you come up with a biologically relevant qn to ask about those adjacent normals, post a new qn on Biostars, and we'll help figure out whether it can be done with the available data. Here's a lead that very few ppl are working on: gene expression or epigenetic profiles in the tumor microenvironment and comparison to GTEx.

Updated 2.4 years ago by Ram 45k • written 10.2 years ago by Cyriac Kandoth 6.1k

Thanks a lot for the comments and links! I'll open up a new question on adjacent normals quite soon I believe :)

Updated 2.4 years ago by Ram 45k • written 10.2 years ago by jth ▴ 190