Understanding TCGA somatic mutation files
4.0 years ago
j.lunger18 ▴ 30

Hi all.

I'm currently looking at mutation data from TCGA. I downloaded .json files from the GDC portal and parsed them to download specific subsets of files.

I'm rather confused because the primary tumor somatic mutation .vcf files from TCGA are in a different format than I'm used to. The headers are as follows:

#CHROM  POS  ID  REF  ALT  QUAL  FILTER  INFO  FORMAT  NORMAL  TUMOR

However, since I have indicated that I only want files from "primary tumor", what is this "NORMAL" column for?

I ask partly to understand, but partly because I want to eventually merge all the samples together downstream, and I'm not able to do so when all of the sample names look to be "NORMAL" and "TUMOR"...

TCGA SNPs vcf maf • 1.0k views
4.0 years ago

The normal column indicates the presence (or not) of each variant in the matched normal sample for the patient whose VCF you are viewing.
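Because every file uses the same generic NORMAL and TUMOR column names, one possible way to make the files distinguishable before merging is to rewrite the #CHROM header line with patient-specific names. The following is only a minimal sketch: the function name `rename_samples` and the identifiers `patient1_normal` and `patient1_tumor` are hypothetical and would need to be replaced with the real sample identifiers for each file.

```python
def rename_samples(header_line, new_names):
    """Return the #CHROM header line with sample columns renamed.

    new_names maps an old column name (e.g. "NORMAL") to its replacement.
    Columns that are not in the mapping are left unchanged.
    """
    fields = header_line.rstrip("\n").split("\t")
    # The first nine columns (#CHROM ... FORMAT) are fixed; samples follow.
    fixed, samples = fields[:9], fields[9:]
    renamed = [new_names.get(name, name) for name in samples]
    return "\t".join(fixed + renamed)


header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNORMAL\tTUMOR"

# Hypothetical identifiers for illustration, not taken from any real file.
mapping = {"NORMAL": "patient1_normal", "TUMOR": "patient1_tumor"}

new_header = rename_samples(header, mapping)
print(new_header)
```

Only the header line changes; the data lines can be copied through untouched, since the sample columns keep their order.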

Kevin


Will variants be listed that are only in NORMAL? Or does this entirely depend on the tool that was used for variant calling? I'm interested in looking at sites only, and I want the variants to only be those found in tumors.


"Will variants be listed that are only in NORMAL?"

I do not believe so; however, you can easily check, as such sites would have a genotype of 0/0 in the TUMOR column.
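A rough sketch of that check in plain Python is shown here. It assumes the FORMAT column contains a GT key, which may not hold for every variant caller, and the two records are made up for illustration rather than taken from a TCGA file. The helper names `tumor_genotype` and `is_absent_in_tumor` are hypothetical.

```python
def tumor_genotype(record_line, header_fields):
    """Return the GT string from the TUMOR column, or None if GT is absent."""
    fields = record_line.rstrip("\n").split("\t")
    format_keys = fields[8].split(":")  # FORMAT is the ninth column
    if "GT" not in format_keys:
        return None  # this caller did not emit a genotype
    gt_index = format_keys.index("GT")
    tumor_col = header_fields.index("TUMOR")
    return fields[tumor_col].split(":")[gt_index]


def is_absent_in_tumor(genotype):
    """True when the genotype is homozygous reference (unphased or phased)."""
    return genotype in ("0/0", "0|0")


header_fields = "#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NORMAL TUMOR".split()

# Made-up records: the second one is variant in NORMAL but 0/0 in TUMOR.
records = [
    "chr1\t100\t.\tA\tT\t.\tPASS\t.\tGT:DP\t0/0:30\t0/1:45",
    "chr1\t200\t.\tG\tC\t.\tPASS\t.\tGT:DP\t0/1:28\t0/0:50",
]

normal_only_positions = [
    int(line.split("\t")[1])
    for line in records
    if is_absent_in_tumor(tumor_genotype(line, header_fields))
]
print(normal_only_positions)
```

If the list comes back empty for a real file, no record in that file is homozygous reference in the tumor sample.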
