Hi guys! I want to do some association studies between SNP genotypes and phenotypes in cancer patients, similar to this article investigating the association between SNPs in UGT2B and breast cancer.
As far as I know, SNP information is contained in the somatic mutation data, so I downloaded the somatic mutation MAF files. There are 5 subset files, listed here:
- BCGSC__IlluminaHiSeq_DNASeq_automated
- BCM__IlluminaGA_DNASeq_automated
- BCM__Mixed_DNASeq_curated
- BI__IlluminaGA_DNASeq_automated
- UCSC__IlluminaGA_DNASeq_automated
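Since the five files come from different centers, it may help to load them all the same way before comparing them. A minimal sketch, assuming each file is a tab-separated MAF with a header row naming the columns and optional `#` comment lines before it; column names are lower-cased in case capitalisation differs between files. `read_maf` and `WANTED` are hypothetical helpers, and the toy row is invented.

```python
import csv
import io
from typing import Dict, Iterable, Iterator

# MAF columns used here (11-13 plus the sample barcode), matched by name.
WANTED = ["tumor_sample_barcode", "reference_allele",
          "tumor_seq_allele1", "tumor_seq_allele2"]


def read_maf(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per MAF row with lower-cased column names as keys."""
    # Drop '#' comment/version lines that can precede the header row.
    data = (line for line in lines if not line.startswith("#"))
    reader = csv.DictReader(data, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Lower-case header names; skip overflow fields (stored under key None).
        norm = {k.strip().lower(): (v or "").strip()
                for k, v in row.items() if k}
        yield {col: norm.get(col, "") for col in WANTED}


# Toy file standing in for a real download (invented values).
example = io.StringIO(
    "#version 2.2\n"
    "Hugo_Symbol\tReference_Allele\tTumor_Seq_Allele1\t"
    "Tumor_Seq_Allele2\tTumor_Sample_Barcode\n"
    "GENE1\tG\tA\tA\tTCGA-XX-0001-01A\n"
)
rows = list(read_maf(example))
print(rows[0])
```

For a real file, pass an open handle instead, e.g. `open(path, newline="")`, where `path` points at one of the downloaded MAFs.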
All of these are somatic mutation files, but they were sequenced by different institutions on different platforms. There are also some differences between them. For example, here is part of the BCGSC__IlluminaHiSeq_DNASeq_automated file (columns 11, 12 and 13); the SNP genotypes in the Tumor_Seq_Allele columns are all homozygous.
Reference_Allele  Tumor_Seq_Allele1  Tumor_Seq_Allele2
G                 A                  A
G                 A                  A
A                 G                  G
A                 T                  T
C                 A                  A
C                 T                  T
A                 G                  G
...
But in the BI__IlluminaGA_DNASeq_automated file, the mutations are all heterozygous.
Reference_Allele  Tumor_Seq_Allele1  Tumor_Seq_Allele2
G                 G                  A
G                 G                  A
A                 A                  T
C                 C                  T
A                 A                  G
A                 A                  G
G                 G                  A
...
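To check this pattern across a whole file rather than by eye, each row can be labelled from its three allele columns. A minimal sketch; `zygosity` and `parse_flat` are hypothetical helpers, and the labels only describe what was written into Tumor_Seq_Allele1/2. They may reflect each center's reporting convention rather than a measured genotype.

```python
from collections import Counter
from typing import List, Tuple

Row = Tuple[str, ...]  # (Reference_Allele, Tumor_Seq_Allele1, Tumor_Seq_Allele2)


def zygosity(ref: str, a1: str, a2: str) -> str:
    """Label the genotype implied by the two tumor allele columns."""
    if a1 == a2:
        return "homozygous_ref" if a1 == ref else "homozygous_alt"
    if ref in (a1, a2):
        return "heterozygous"        # one tumor allele equals the reference
    return "two_non_ref_alleles"     # e.g. ref A, tumor alleles G and T


def parse_flat(text: str) -> List[Row]:
    """Regroup a flattened three-column excerpt into (ref, a1, a2) rows."""
    t = text.split()
    return [tuple(t[i:i + 3]) for i in range(0, len(t) - len(t) % 3, 3)]


# The two excerpts quoted in this post, without their header rows.
bcgsc = parse_flat("G A A G A A A G G A T T C A A C T T A G G")
bi = parse_flat("G G A G G A A A T C C T A A G A A G G G A")

for name, rows in [("BCGSC", bcgsc), ("BI", bi)]:
    print(name, Counter(zygosity(*r) for r in rows))
```

On the two excerpts this reports 7 `homozygous_alt` rows for BCGSC and 7 `heterozygous` rows for BI, matching what is described above.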
Although I have read the tutorial "Working with MAF files from the TCGA", I still have no idea which file to choose.
In short, I have two major problems:
First, can Tumor_Seq_Allele1 and Tumor_Seq_Allele2 really represent the SNP's genotype? In the next step I will select tagSNPs based on MAF (minor allele frequency, filter criterion > 0.05), but if I take the genotypes as they are, every SNP's frequency is below 0.05, which means there are no tagSNPs at all! I'm not sure whether this is right. This question also mentioned Tumor_Seq_Allele, but I don't understand the REF/ALT alleles or how to obtain more reliable SNP genotypes.
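The frequency filter can be sketched as follows, which may show why somatic calls end up below 0.05. Minor allele frequency is computed over the germline genotypes of every individual in the cohort, whereas a somatic MAF only lists positions mutated in a given tumour, so even a recurrent somatic mutation covers few alleles. A minimal sketch, assuming biallelic sites and a genotype for every patient; `minor_allele_frequency` is a hypothetical helper and the cohort numbers are made up for illustration.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

Genotype = Tuple[str, str]  # the two alleles of one diploid individual


def minor_allele_frequency(genotypes: Iterable[Genotype]) -> Optional[float]:
    """Frequency of the less common allele among all called alleles."""
    counts = Counter(allele for g in genotypes for allele in g)
    total = sum(counts.values())
    if total == 0:
        return None   # no genotype data at this site
    if len(counts) == 1:
        return 0.0    # monomorphic site
    # Assumes a biallelic site: the minor allele has the smaller count.
    return min(counts.values()) / total


# Toy cohort of 100 patients (hypothetical numbers).
common_snp = [("A", "A")] * 80 + [("A", "G")] * 20
# A somatic mutation seen in 2 tumours; the other 98 count as reference.
somatic = [("C", "T")] * 2 + [("C", "C")] * 98

for name, gts in [("common_snp", common_snp), ("somatic", somatic)]:
    maf = minor_allele_frequency(gts)
    print(name, maf, "tagSNP candidate" if maf > 0.05 else "filtered out")
```

Here the common SNP has a minor allele frequency of 0.1 and passes the > 0.05 filter, while the somatic mutation reaches only 0.01.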
Second, the large differences between the BCGSC and BI files leave me unsure which file to use for the next step.
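One way to judge how different the BCGSC and BI files really are is to set the zygosity encoding aside and compare only which patient/position/alternate-allele calls they share. A minimal sketch, assuming rows are dicts keyed by lower-cased MAF column names (`chromosome`, `start_position`, etc.) and that the first 12 characters of Tumor_Sample_Barcode identify the TCGA patient; `call_keys` and `overlap` are hypothetical helpers and the toy rows are invented.

```python
from typing import Dict, Iterable, Set, Tuple

Key = Tuple[str, str, str, str]  # (patient, chromosome, start, alt allele)


def call_keys(rows: Iterable[Dict[str, str]]) -> Set[Key]:
    """Reduce MAF rows to encoding-independent call keys."""
    keys = set()
    for r in rows:
        patient = r["tumor_sample_barcode"][:12]  # e.g. TCGA-XX-0001
        ref = r["reference_allele"]
        a1, a2 = r["tumor_seq_allele1"], r["tumor_seq_allele2"]
        # Pick the non-reference tumor allele, whichever column holds it.
        alt = a2 if a2 != ref else a1
        keys.add((patient, r["chromosome"], r["start_position"], alt))
    return keys


def overlap(a: Set[Key], b: Set[Key]) -> float:
    """Jaccard index: calls shared by both centers / calls made by either."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def row(barcode: str, pos: str, ref: str, a1: str, a2: str) -> Dict[str, str]:
    """Build a toy MAF row on chromosome 4 (invented values)."""
    return {"tumor_sample_barcode": barcode, "chromosome": "4",
            "start_position": pos, "reference_allele": ref,
            "tumor_seq_allele1": a1, "tumor_seq_allele2": a2}


# BCGSC-style (A/A) and BI-style (G/A) encodings of the same call match.
center_a = [row("TCGA-XX-0001-01A", "100", "G", "A", "A"),
            row("TCGA-XX-0002-01A", "200", "C", "T", "T")]
center_b = [row("TCGA-XX-0001-01B", "100", "G", "G", "A")]
print(overlap(call_keys(center_a), call_keys(center_b)))
```

In this toy case one of two distinct calls is shared, giving an overlap of 0.5. A low overlap on real files would suggest the centers' call sets differ in substance, not just in how they fill the two tumor-allele columns.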
I hope you guys can give me some suggestions. Many thanks!
Please see this comment about zygosity and the answer to it - C: Working With Maf Files (Mutation Annotation Format) From The Tcga (The Cancer Ge