How to get SNP genotypes from somatic mutation files in MAF (Mutation Annotation Format)?
8.8 years ago
cying ▴ 10

Hi everyone! I want to study associations between SNP genotypes and phenotypes in cancer patients, similar to this article investigating the association between SNPs in UGT2B and breast cancer.

As far as I know, SNP information is contained in the somatic mutation data, so I downloaded the somatic mutation MAF files. There are 5 subset files, listed here:

  1. BCGSC__IlluminaHiSeq_DNASeq_automated
  2. BCM__IlluminaGA_DNASeq_automated
  3. BCM__Mixed_DNASeq_curated
  4. BI__IlluminaGA_DNASeq_automated
  5. UCSC__IlluminaGA_DNASeq_automated

These are all somatic mutation files, but they were sequenced by different institutions on different platforms. There are also some differences between the files. For example, here is part of the BCGSC__IlluminaHiSeq_DNASeq_automated file (columns 11, 12 and 13), where the genotypes given by Tumor_Seq_Allele1 and Tumor_Seq_Allele2 are all homozygous.

Reference_Allele    Tumor_Seq_Allele1    Tumor_Seq_Allele2
G    A    A
G    A    A
A    G    G
A    T    T
C    A    A
C    T    T
A    G    G
..

But in the BI__IlluminaGA_DNASeq_automated file, the mutations are all heterozygous.

Reference_Allele    Tumor_Seq_Allele1    Tumor_Seq_Allele2
G    G    A
G    G    A
A    A    T
C    C    T
A    A    G
A    A    G
G    G    A
..
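The difference between the two excerpts above can be tallied programmatically. Below is a minimal sketch, assuming the MAF is tab-delimited with the three column names shown above and that comment lines start with `#`. The function names `classify_call` and `tally_calls` are hypothetical helpers for illustration, not part of any TCGA tool, and the labels only describe how each file encodes the two allele columns, not a verified genotype.

```python
import csv
import io
from collections import Counter


def classify_call(ref, allele1, allele2):
    """Label one MAF row by how its two tumor alleles compare to the reference."""
    if allele1 == ref and allele2 == ref:
        return "no_variant"
    if allele1 == allele2:
        return "homozygous_alt"
    if ref in (allele1, allele2):
        return "heterozygous"
    return "two_alt_alleles"


def tally_calls(handle):
    """Count call patterns in a tab-delimited MAF, skipping '#' comment lines."""
    rows = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t")
    return Counter(
        classify_call(
            row["Reference_Allele"],
            row["Tumor_Seq_Allele1"],
            row["Tumor_Seq_Allele2"],
        )
        for row in reader
    )


# Toy input: the first rows of the two excerpts above, rebuilt as tab-delimited text.
header = "Reference_Allele\tTumor_Seq_Allele1\tTumor_Seq_Allele2\n"
bcgsc_rows = header + "G\tA\tA\nG\tA\tA\nA\tG\tG\n"
bi_rows = header + "G\tG\tA\nG\tG\tA\nA\tA\tT\n"

print("BCGSC:", tally_calls(io.StringIO(bcgsc_rows)))
print("BI:", tally_calls(io.StringIO(bi_rows)))
```

For a real file, the same function could be called on an open file handle instead of the `io.StringIO` toy input.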

Although I have read the tutorial "Working with MAF files from the TCGA", I still have no idea which file to choose.

To summarize, two main problems are troubling me.

First, do Tumor_Seq_Allele1 and Tumor_Seq_Allele2 really represent the SNP's genotype? In the next step I will select tag SNPs based on MAF (minor allele frequency, filter criterion > 0.05), but if I derive genotypes this way, every SNP has a frequency below 0.05, which means no tag SNPs at all! I'm not sure whether this is right. This question also mentions Tumor_Seq_Allele, but I don't understand the REF/ALT alleles or how to get more reliable SNP genotypes.

Second, the large discrepancies between the BCGSC and BI files leave me unsure which file to choose for my next step.

I hope you can give me some suggestions. Many thanks!
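The frequency calculation described in the first question can be sketched as follows. This is only an illustration under loud assumptions: every sample is treated as diploid, any sample without a row at a site is assumed to carry two reference alleles (the MAF itself does not state this), and `alt_allele_frequency`, the record layout and the site label are hypothetical. Because a somatic MAF lists only called somatic changes, the number this produces is the fraction of tumor alleles with a called change, which is not necessarily a population minor allele frequency.

```python
from collections import defaultdict


def alt_allele_frequency(records, n_samples):
    """Return {site: non-reference allele count / (2 * n_samples)}.

    records: iterable of (sample_id, site, ref, allele1, allele2) tuples.
    Assumption: samples with no record at a site contribute two reference alleles.
    """
    alt_counts = defaultdict(int)
    seen = set()
    for sample_id, site, ref, allele1, allele2 in records:
        if (sample_id, site) in seen:
            continue  # count each sample at most once per site
        seen.add((sample_id, site))
        # Each allele column that differs from the reference adds one alt allele.
        alt_counts[site] += (allele1 != ref) + (allele2 != ref)
    return {site: count / (2 * n_samples) for site, count in alt_counts.items()}


# Toy cohort: 100 samples, 3 of which have a heterozygous-style row at one site.
toy_records = [
    ("sample_01", "site_A", "G", "G", "A"),
    ("sample_02", "site_A", "G", "G", "A"),
    ("sample_03", "site_A", "G", "G", "A"),
]
frequencies = alt_allele_frequency(toy_records, n_samples=100)
passing = {site: f for site, f in frequencies.items() if f > 0.05}
print(frequencies, passing)
```

In this toy cohort the site has 3 non-reference alleles out of 200, so it falls below the 0.05 cutoff, which mirrors the situation described above.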

MAF TCGA SNP Genotype • 4.9k views

Please see this comment about zygosity and the answer to it - Working with MAF files (Mutation Annotation Format) from the TCGA (The Cancer Genome Atlas)


I have added more detail to my question, thank you!


Hello, I have recently been confused about the actual meaning of Tumor_Seq_Allele1 and Tumor_Seq_Allele2. Did you solve it?

16 months ago
Zhenyu Zhang ★ 1.2k

You cannot get SNP information from a MAF file. The allele columns are inconsistent between different MAF files. Please don't try. (Suggestion from a current owner of the MAF format.)
