TCGA: "Tumor, Matched Normal" vs. "Normal, Matched Tumor"
2
5
8.0 years ago
Mdeng ▴ 510

Hello everyone,

I would like to download somatic SNP data from TCGA. But when I look at the data matrix right here, there are two color codes: "Tumor, matched normal" and "Normal, matched tumor". I looked at the online guide and the Getting Started with the Data Matrix guide.

They explain it like this:

• TN (Tumor, matched normal) – Data for a tumor tissue for which matched normal tissue exists.
• NT (Normal, matched tumor) – Data for normal tissue for which matched tumor tissue exists.

But what is the difference?

Maybe some of you are more experienced with TCGA than I am.

With all the best,

Mario

tcga somatic snp • 20k views
0

@vchris_ngs,

I have not yet fully worked out how to separate T/N samples, but there seems to be an identifier in the BAM file name. I have not had time to go through it; I will look at it in detail after I finish the analysis of our data.

0

I have a fair understanding of the data types in TCGA, although I have not found an expression dataset there that fits my own needs. I suggest going through the breast cancer datasets in TCGA. In RNASeq and RNASeqV2 you have both TN and NT data, which means you get expression data both for tumors for which a normal exists and for normals for which a tumor exists. So you can download both types: one is color-coded blue (TN) and the other yellow (NT). You can then filter locally to get the matched pairs from the same patient and build your cohort. If you are not looking for exact matches, you can also simply download TN (blue) data at random, take a similar number of NT (yellow) samples, and run your analysis on those.
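The local filtering step could be sketched like this in Python. It assumes the barcode convention described in Ashwin's answer in this thread (the first three dash-separated fields identify the patient, and the two digits that follow give the sample type: 01-09 tumor, 10-19 normal). The function name `matched_pairs` and the example barcodes are made up for illustration; this is not an official TCGA tool.

```python
from collections import defaultdict


def matched_pairs(barcodes):
    """Group barcodes by patient; keep patients with both tumor and normal."""
    by_patient = defaultdict(lambda: {"tumor": [], "normal": []})
    for barcode in barcodes:
        fields = barcode.split("-")
        patient = "-".join(fields[:3])   # e.g. "TCGA-XX-0001"
        code = int(fields[3][:2])        # sample type digits, e.g. "01A" -> 1
        if 1 <= code <= 9:
            by_patient[patient]["tumor"].append(barcode)
        elif 10 <= code <= 19:
            by_patient[patient]["normal"].append(barcode)
        # codes 20-29 (controls) are ignored in this sketch
    return {p: g for p, g in by_patient.items() if g["tumor"] and g["normal"]}


# Hypothetical barcodes: patient 0001 has a tumor and a normal, 0002 only a tumor.
example = [
    "TCGA-XX-0001-01A-11R-0000-07",
    "TCGA-XX-0001-11A-11R-0000-07",
    "TCGA-XX-0002-01A-11R-0000-07",
]
pairs = matched_pairs(example)
```

Here only `TCGA-XX-0001` survives the filter, because it is the only patient with both a tumor and a normal sample in the list.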

11
7.3 years ago

There is no difference between TN and NT for somatic mutations, because tumors and normals are paired up for somatic variant calling. The distinction only matters for data generated separately for tumors and normals, like RNA-seq or methylation assays. The data matrix doesn't have a very intuitive interface because they tried to generalize the filtering UI across different data types.

I'd recommend downloading the somatic mutations (MAF files) directly from the TCGA DCC or from Firehose. Read more about TCGA MAF files in Working with MAF files (Mutation Annotation Format) from the TCGA (The Cancer Genome Atlas).
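To see the tumor/normal pairing inside a MAF file, here is a minimal Python sketch. It relies on two columns from the MAF specification, `Tumor_Sample_Barcode` and `Matched_Norm_Sample_Barcode`; the inline data, the barcodes, and the function name `tumor_normal_pairs` are made up for illustration, and real MAF files carry many more columns.

```python
import csv
import io

# Made-up three-column excerpt standing in for a real MAF file.
MAF_TEXT = "\n".join([
    "#version 2.4",
    "\t".join(["Hugo_Symbol", "Tumor_Sample_Barcode", "Matched_Norm_Sample_Barcode"]),
    "\t".join(["TP53", "TCGA-XX-0001-01A-01D-0000-08", "TCGA-XX-0001-10A-01D-0000-08"]),
    "\t".join(["KRAS", "TCGA-XX-0001-01A-01D-0000-08", "TCGA-XX-0001-10A-01D-0000-08"]),
    "\t".join(["PTEN", "TCGA-XX-0002-01A-01D-0000-08", "TCGA-XX-0002-11A-01D-0000-08"]),
]) + "\n"


def tumor_normal_pairs(maf_handle):
    """Return the sorted, distinct (tumor, matched normal) barcode pairs."""
    # Skip "#" comment lines such as the version header before parsing.
    rows = (line for line in maf_handle if not line.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t")
    return sorted({
        (row["Tumor_Sample_Barcode"], row["Matched_Norm_Sample_Barcode"])
        for row in reader
    })


maf_pairs = tumor_normal_pairs(io.StringIO(MAF_TEXT))
```

Every mutation row names both the tumor and its matched normal, which is why the TN/NT split carries no extra information for somatic calls. For a file on disk, pass `open(path, newline="")` instead of the `io.StringIO` object.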

You can also use the firehose_get script to download the latest available MAFs that are fed into Firehose. Here's how to do that:

wget http://gdac.broadinstitute.org/runs/code/firehose_get_latest.zip
unzip firehose_get_latest.zip
./firehose_get -b -only Mutation_Packager_Calls data latest

To list the other kinds of data (like Mutation_Packager_Calls in the command above) look them up here.

3

Yes, thank you. This should be written down somewhere in TCGA's wiki.

5
8.0 years ago
Ashwin ▴ 70

Hi Mario,

I know this is a bit confusing, but it is not very complicated.

Firstly, check the TCGA primer on TCGA barcode logic - https://wiki.nci.nih.gov/display/TCGA/TCGA+barcode

TN - these will be your tumor samples when you download them using the data matrix. If you check the barcodes, you will see that these samples have a matched normal, i.e. normal tissue taken from the same patient.

NT - these will be your normal samples.

The sample type code is labelled in the link above; it is characters 14-15 of the TCGA barcode (counting the dashes "-").

This segment of the TCGA barcode encodes the following:

• Tumor types range from 01 - 09
• Normal types range from 10 - 19
• Control samples range from 20 - 29

Thus for all TN samples you will see a sample type code in the range 01 - 09, and the matched normals will be in the range 10 - 19.
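The ranges above can be turned into a small Python helper. This is only a sketch: the function name `classify_sample` and the example barcodes are hypothetical, and it assumes a full barcode whose fourth dash-separated field starts with the two-digit sample type code (characters 14-15).

```python
def classify_sample(barcode):
    """Classify a TCGA barcode as tumor, normal, or control by its type code."""
    fields = barcode.split("-")
    # Fourth field is the two-digit sample type plus a vial letter, e.g. "01A".
    code = int(fields[3][:2])
    if 1 <= code <= 9:
        return "tumor"
    if 10 <= code <= 19:
        return "normal"
    if 20 <= code <= 29:
        return "control"
    return "unknown"


# Made-up barcodes for illustration only.
labels = [
    classify_sample("TCGA-XX-0001-01A-11R-0000-07"),
    classify_sample("TCGA-XX-0001-11A-11R-0000-07"),
    classify_sample("TCGA-XX-0001-20A-11R-0000-07"),
]
```

Applied to the column headers or file names of a downloaded RNA-seq matrix, a helper like this tells you which values come from normal tissue and which from tumor.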

Regards

Ashwin

0

Hello Ashwin, thank you for your answer. So, I don't get the pure somatic SNPs, just the called pairs, right?

0

If by pure SNPs you mean the raw data, then this cannot be downloaded directly from the TCGA data portal, as this data is restricted and you need permission to access it. For most sequence-based data you are allowed to download only the Level 2 or 3 data from TCGA.

0

I know. By pure somatic SNPs I meant the preprocessed calculations on the SNPs that yield the somatic SNPs. But it's just as you said: to work with somatic SNPs, a pair of files needs to be downloaded. Thank you.

0

Hey Ashwin,

I have been trying to download RNA-seq data (any cancer) from TCGA for which there is a tumor/normal pair, because I want to compare gene expression between T and N. I used tablematrix to download, but I have not been able to find data for a T/N pair: I only found a single value, without knowing whether it comes from N or T. If you could help with details on how to do this, that would be great. Thanks!

0

Did you work out how to split the TCGA data into matched TN pairs for RNA-seq? I am only interested in level 3 raw count data for ovarian cancer samples, and I want to download matched normal-tumor pairs for a large cohort of patients. I have downloaded the TN category for RNA-seq data, but I do not understand which values are for the normal sample and which are for the tumor. Did you get a clear understanding?