TCGA germline and somatic SNVs
4 months ago
SUMIT ▴ 30

I downloaded the TCGA Masked SNVs (link). The TCGA website states:

"Somatic MAFs (*somatic.maf), which are also known as Masked Somatic Mutation files, are further processed to remove lower quality and potential germline variants. For tumor samples that contain variants from multiple combinations of tumor-normal aliquot pairs, only one pair is selected in the Somatic MAF based on their sample type." (source: here)

I take this to mean that germline variants have been filtered out and the masked files now contain only somatic SNVs. However, after downloading the metadata (please see the picture), I found that many normal samples are labelled as tumor in another column. After checking individual file names on the GDC portal, I found that only "cases.0.samples.0.sample_type" (column 1) and "cases.0.samples.0.tissue_type" (column 2) match the "file_name" (column 6). So what do columns 2-4 represent?

My other question: if all the files are labelled as masked SNVs, I would expect the blood-derived normal samples to have 0 SNVs, but they also contain nucleotide variations (NVs). Can anyone explain this?

Any help would be much appreciated! Thanks.

[Picture: metadata file]

TCGA SNVs gdc • 219 views

You are looking at tumor-normal paired variant calling. Each MAF file is expected to link to both a tumor sample and a normal sample, because both are parent samples from which the calls are derived. That's why you have sample0 and sample1.
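To see this pairing in a downloaded file, you could list the tumor and normal barcodes that each call refers to. This is only a minimal sketch: it assumes the standard MAF columns Tumor_Sample_Barcode and Matched_Norm_Sample_Barcode, and it uses a tiny made-up MAF text with hypothetical barcodes instead of a real file.

```python
import csv
import io

# Hypothetical, heavily truncated MAF content (real MAFs have many more columns).
# The barcodes below are placeholders, not real aliquot identifiers.
MAF_TEXT = (
    "#version gdc-1.0.0\n"
    "Hugo_Symbol\tTumor_Sample_Barcode\tMatched_Norm_Sample_Barcode\n"
    "TP53\tTUMOR-ALIQUOT-A\tNORMAL-ALIQUOT-A\n"
    "KRAS\tTUMOR-ALIQUOT-A\tNORMAL-ALIQUOT-A\n"
)

def tumor_normal_pairs(handle):
    """Count variant rows per (tumor, normal) barcode pair in a MAF stream."""
    # Skip the '#'-prefixed comment lines that precede the header row.
    rows = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t")
    counts = {}
    for rec in reader:
        pair = (rec["Tumor_Sample_Barcode"], rec["Matched_Norm_Sample_Barcode"])
        counts[pair] = counts.get(pair, 0) + 1
    return counts

pairs = tumor_normal_pairs(io.StringIO(MAF_TEXT))
for (tumor, normal), n in pairs.items():
    print(f"{n} calls from tumor={tumor} vs normal={normal}")
```

Every row names both a tumor and a matched normal, so a normal sample appearing in the metadata of a MAF does not mean the variants were called in the normal alone.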

As for sample2, this is most likely because some projects (in this case CPTAC-3) also generated proteomics data. Their proteomics platforms require a lot of material, so sometimes multiple samples are combined into one aliquot. In particular, your first MAF is derived from a tumor aliquot and a normal aliquot; however, the tumor aliquot was generated by combining 2 tumor samples. That's why you see 3 samples listed here.
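To check how many parent samples each file links to, you could collect every sample_type column per row of the metadata table. This is a rough sketch: the "cases.0.samples.N.sample_type" column names for N = 1 and 2 are assumed by analogy with the sample 0 column named in the question, and the file names and rows are made up for illustration.

```python
import csv
import io
import re

# Hypothetical metadata export; file names and values are illustrative only.
METADATA_TSV = (
    "file_name\tcases.0.samples.0.sample_type\t"
    "cases.0.samples.1.sample_type\tcases.0.samples.2.sample_type\n"
    "example_a.maf.gz\tPrimary Tumor\tBlood Derived Normal\tPrimary Tumor\n"
    "example_b.maf.gz\tBlood Derived Normal\tPrimary Tumor\t\n"
)

# Matches the per-sample columns regardless of how many samples are listed.
SAMPLE_TYPE_COL = re.compile(r"^cases\.0\.samples\.(\d+)\.sample_type$")

def sample_types_per_file(handle):
    """Map each file_name to the non-empty sample types linked to it."""
    reader = csv.DictReader(handle, delimiter="\t")
    result = {}
    for rec in reader:
        types = [
            value
            for col, value in rec.items()
            if col and SAMPLE_TYPE_COL.match(col) and value
        ]
        result[rec["file_name"]] = types
    return result

summary = sample_types_per_file(io.StringIO(METADATA_TSV))
for name, types in summary.items():
    print(name, "->", types)
```

The sketch makes no assumption about which sample index holds the tumor, so a file whose sample 0 is a normal and whose sample 1 is a tumor is still reported as one tumor-normal pair.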

