This is my first time working with SNP arrays. I have a set of SNP array data downloaded from TCGA, and I am interested in the SNP loci for an eQTL analysis. Could anyone point me to a tutorial for this kind of work? How do I get the SNP locus information: is it downloaded from somewhere, or obtained from the array data itself? And how do I select SNP loci with good allele frequencies (from TCGA level 1, 2, or 3 data)? Thanks.

Or how to get allele frequency from TCGA data?

I typically only see copy number calls in the public TCGA data for SNPs. You might need to make a special request to access the actual SNP calls.

Either way, I think the links from this page provide the best organization of publicly available data:

You can also get some additional information from cBioPortal, but I don't know if this will help with your specific question:

Thanks for the info. So it is impossible to get minor allele frequencies from the copy number calls (level 3 data)? What is the best approach for selecting SNP loci? That is, should I use the allele frequency from the cancer samples as a cutoff, or the frequency from the dbSNP database?

The copy number calls include no SNP-related information, and that includes minor allele frequencies (which would be a population-level metric rather than an individual-level metric anyway).
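
To make the "population-level" point concrete, here is a minimal sketch of how a minor allele frequency would be computed once you do have per-sample genotype calls. It assumes the calls are coded as the number of B alleles per sample (0, 1, or 2, with None for a no-call); that coding, the function names, and the SNP IDs are all hypothetical and are not a TCGA file format.

```python
from typing import Dict, Optional, Sequence


def minor_allele_frequency(genotypes: Sequence[Optional[int]]) -> Optional[float]:
    """Return the MAF for one SNP across samples.

    genotypes: B-allele count per sample (0, 1 or 2); None marks a no-call.
    Returns None when no sample has a call.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    # Each called diploid sample contributes two alleles.
    freq_b = sum(called) / (2 * len(called))
    return min(freq_b, 1 - freq_b)


def filter_by_maf(calls: Dict[str, Sequence[Optional[int]]],
                  min_maf: float = 0.05) -> Dict[str, float]:
    """Keep SNPs whose MAF across all samples is at least min_maf."""
    kept = {}
    for snp_id, genotypes in calls.items():
        maf = minor_allele_frequency(genotypes)
        if maf is not None and maf >= min_maf:
            kept[snp_id] = maf
    return kept


# Made-up genotype calls for three hypothetical SNPs in six samples.
example_calls = {
    "snp_a": [0, 1, 2, 1, 0, None],  # 4 B alleles out of 10 called alleles
    "snp_b": [0, 0, 0, 0, 0, 1],     # 1 B allele out of 12
    "snp_c": [2, 2, 2, 2, 2, 2],     # monomorphic, MAF is 0
}

if __name__ == "__main__":
    print(filter_by_maf(example_calls, min_maf=0.1))
```

Note that the frequency only exists across a set of samples, which is why a single sample's copy number call cannot give it to you. The diploid assumption may also not hold in tumor samples with copy number changes, so matched normal samples are probably the safer input for this kind of filter.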

I am not certain about your goal, but the SNP array should really be focused on common variants. Rare variants (like those in the MAF file) should come mostly from the DNA-Seq data. You can use tools like ANNOVAR and/or SeattleSNP to characterize variants of interest and obtain population frequencies from 1000 Genomes and ESP. With SNP data, you are typically interested in comparing variant frequencies between two groups.
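
As a rough illustration of what comparing variant frequencies between two groups can look like for a single SNP, the sketch below runs a basic allelic chi-square test on a 2x2 table of allele counts. It assumes the same hypothetical genotype coding (B-allele count of 0, 1, or 2 per sample, None for a no-call), uses no continuity correction, and is not meant to replace a dedicated association tool; with small counts an exact test would be more appropriate.

```python
import math
from typing import Optional, Sequence, Tuple


def allele_counts(genotypes: Sequence[Optional[int]]) -> Tuple[int, int]:
    """Return (A allele count, B allele count) over the called samples."""
    called = [g for g in genotypes if g is not None]
    b_count = sum(called)
    a_count = 2 * len(called) - b_count
    return a_count, b_count


def allelic_chi_square(group1: Sequence[Optional[int]],
                       group2: Sequence[Optional[int]]
                       ) -> Optional[Tuple[float, float]]:
    """Chi-square test (1 degree of freedom) on the 2x2 allele count table.

    Returns (chi2, p_value), or None when a row or column total is zero
    and the statistic is undefined.
    """
    a1, b1 = allele_counts(group1)
    a2, b2 = allele_counts(group2)
    row1, row2 = a1 + b1, a2 + b2
    col_a, col_b = a1 + a2, b1 + b2
    if 0 in (row1, row2, col_a, col_b):
        return None
    n = row1 + row2
    chi2 = n * (a1 * b2 - a2 * b1) ** 2 / (row1 * row2 * col_a * col_b)
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Made-up genotypes for one SNP in two hypothetical groups of five samples.
    group_one = [0, 0, 0, 0, 1]
    group_two = [2, 2, 1, 2, 2]
    print(allelic_chi_square(group_one, group_two))
```

In practice you would run this per SNP and correct for multiple testing across all loci.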

