Hi guys, I want to do some association research between SNP genotypes and phenotypes in cancer patients, similar to this article investigating the association between SNPs in UGT2B and breast cancer.
I understand that SNP information can be found in the somatic mutation data, so I downloaded the somatic mutation MAF files. There are 5 subset files, which I have listed here.
All of these are somatic mutation files, but they were sequenced by different institutions on different platforms, and there are some differences between them. For example, here is part of the BCGSC__IlluminaHiSeq_DNASeq_automated file (columns 11, 12 and 13), where the SNP genotypes in Tumor_Seq_Allele1 and Tumor_Seq_Allele2 are all homozygous:
Reference_Allele  Tumor_Seq_Allele1  Tumor_Seq_Allele2
G                 A                  A
G                 A                  A
A                 G                  G
A                 T                  T
C                 A                  A
C                 T                  T
A                 G                  G
...
But in the BI__IlluminaGA_DNASeq_automated file, the mutations are all heterozygous:
Reference_Allele  Tumor_Seq_Allele1  Tumor_Seq_Allele2
G                 G                  A
G                 G                  A
A                 A                  T
C                 C                  T
A                 A                  G
A                 A                  G
G                 G                  A
...
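To make the difference between the two files concrete, here is a minimal Python sketch that labels each row by comparing the two tumor alleles with the reference allele. It assumes tab-separated MAF files with the standard Reference_Allele, Tumor_Seq_Allele1 and Tumor_Seq_Allele2 headers; the function names are hypothetical, and the example rows are copied from the two excerpts in this post. It only shows how each file encodes its calls, not which encoding is correct.

```python
import csv
import io

# Hypothetical MAF-style excerpt: only the three allele columns, with rows taken
# from the BCGSC excerpt (first two) and the BI excerpt (last two).
# A real MAF has many more tab-separated columns; DictReader selects these by name.
EXAMPLE_MAF = (
    "Reference_Allele\tTumor_Seq_Allele1\tTumor_Seq_Allele2\n"
    "G\tA\tA\n"
    "A\tG\tG\n"
    "G\tG\tA\n"
    "C\tC\tT\n"
)


def tumor_zygosity(ref: str, allele1: str, allele2: str) -> str:
    """Classify the tumor call encoded by one MAF row."""
    if allele1 == allele2:
        # Both tumor alleles are identical: reference homozygote or alternate homozygote.
        return "hom_ref" if allele1 == ref else "hom_alt"
    # The two tumor alleles differ, so the row encodes a heterozygous call.
    return "het"


def classify_rows(handle) -> list:
    """Return one zygosity label per data row of a tab-separated MAF stream."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        tumor_zygosity(
            row["Reference_Allele"],
            row["Tumor_Seq_Allele1"],
            row["Tumor_Seq_Allele2"],
        )
        for row in reader
    ]


if __name__ == "__main__":
    print(classify_rows(io.StringIO(EXAMPLE_MAF)))
```

Running the same function over each downloaded file (opened with `open(path, newline="")`) would show whether a file encodes its calls as homozygous or heterozygous throughout.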
Although I have read the tutorial "Working with MAF files from the TCGA", I still have no idea which file to choose.
In short, I have two major problems.
Firstly, can Tumor_Seq_Allele1 and Tumor_Seq_Allele2 really represent the SNP's genotype? In the next step I will select tag SNPs based on MAF (minor allele frequency, filter criterion > 0.05), but if I derive genotypes this way, the frequencies of all the SNPs are below 0.05, which means there are no tag SNPs! I'm not sure whether this is right. This question also mentioned Tumor_Seq_Allele, but I don't understand the REF/ALT alleles or how to get a more reliable SNP genotype.
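For reference, the minor allele frequency filter mentioned above can be sketched in Python as follows. Here MAF means the population-genetics minor allele frequency, not the Mutation Annotation Format. The sketch assumes diploid genotypes given as pairs of allele letters and a biallelic site; the function names, the 0.05 default and the hypothetical cohort in the usage note are illustrative only.

```python
from collections import Counter


def minor_allele_frequency(genotypes) -> float:
    """Minor allele frequency at one biallelic site.

    genotypes: list of (allele1, allele2) tuples, one per diploid sample.
    """
    counts = Counter(allele for pair in genotypes for allele in pair)
    if len(counts) > 2:
        raise ValueError("this sketch assumes a biallelic site")
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        # No data or a monomorphic site: there is no minor allele.
        return 0.0
    # The minor allele is the less frequent of the two observed alleles.
    return min(counts.values()) / total


def passes_tag_snp_filter(genotypes, threshold: float = 0.05) -> bool:
    """True if the site's minor allele frequency is strictly above the threshold."""
    return minor_allele_frequency(genotypes) > threshold


if __name__ == "__main__":
    # Hypothetical cohort: 3 heterozygous carriers and 97 reference homozygotes.
    cohort = [("G", "A")] * 3 + [("G", "G")] * 97
    print(minor_allele_frequency(cohort), passes_tag_snp_filter(cohort))
```

In this hypothetical cohort the minor allele is seen 3 times out of 200 alleles, giving 0.015, so the site would fail a > 0.05 filter.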
Secondly, the large differences between the BCGSC and BI files leave me unsure which file to use for my next step.
I hope you guys can give me some suggestions. Many thanks!