I downloaded the TCGA Masked SNVs using link. The TCGA website states that
"Somatic MAFs (*somatic.maf), which are also known as Masked Somatic Mutation files, are further processed to remove lower quality and potential germline variants. For tumor samples that contain variants from multiple combinations of tumor-normal aliquot pairs, only one pair is selected in the Somatic MAF based on their sample type." (here).
I take this to mean that germline variants have been filtered out, so the Masked files should contain only somatic SNVs. However, after downloading the metadata (please see the picture), I found that many normal samples are labelled as tumor in another column. After checking individual file names on the GDC portal, I found that only "cases.0.samples.0.sample_type" (column 1) and "cases.0.samples.0.tissue_type" (column 2) match the "file_name" (column 6). So what do columns 2-4 mainly represent?
My other question: if all files are labelled as masked SNVs, then I would expect the blood-derived normal samples to have 0 SNVs, but they also contain nucleotide variations (NVs). Can anyone explain why?
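
One way to check this directly is to count the variants in each MAF per tumor/normal barcode pair, rather than relying on the metadata columns alone. Below is a minimal sketch using only the Python standard library. It assumes the file has the "Tumor_Sample_Barcode" and "Matched_Norm_Sample_Barcode" columns of the GDC MAF format and that header comment lines start with "#". The function names and the example rows (including the barcodes) are made up for illustration.

```python
import csv
import io
from collections import Counter


def sample_class(barcode):
    """Classify a TCGA barcode by the two-digit sample code in its 4th field.

    Codes 01-09 are tumor types, 10-19 are normal types, 20-29 are controls.
    """
    code = int(barcode.split("-")[3][:2])
    if code < 10:
        return "tumor"
    if code < 20:
        return "normal"
    return "control"


def count_variants(maf_lines):
    """Count MAF rows per (tumor barcode, matched-normal barcode) pair."""
    # Skip the '#' comment lines that precede the tab-separated header row.
    rows = (line for line in maf_lines if not line.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t")
    return Counter(
        (row["Tumor_Sample_Barcode"], row["Matched_Norm_Sample_Barcode"])
        for row in reader
    )


# Made-up example with two variants; for a real file, pass an open file handle
# (use gzip.open(path, "rt") if the MAF is gzip-compressed).
example = io.StringIO(
    "#version gdc-1.0.0\n"
    "Hugo_Symbol\tTumor_Sample_Barcode\tMatched_Norm_Sample_Barcode\n"
    "TP53\tTCGA-XX-0000-01A-11D-0000-09\tTCGA-XX-0000-10A-01D-0000-09\n"
    "KRAS\tTCGA-XX-0000-01A-11D-0000-09\tTCGA-XX-0000-10A-01D-0000-09\n"
)

counts = count_variants(example)
for (tumor, normal), n in counts.items():
    print(n, "variants:", sample_class(tumor), tumor, "vs", sample_class(normal), normal)
```

If the variants in a file that the metadata lists as blood-derived normal turn out to have a tumor barcode in "Tumor_Sample_Barcode" and the normal barcode only in "Matched_Norm_Sample_Barcode", that would suggest the file is associated with both samples of the pair, but this is only a guess that the check above could confirm or rule out.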
Any help would be much appreciated! Thanks