Minor Allele Frequency Nucleotide In Dbsnp?
8.6 years ago
mylons ▴ 130

I have a VCF of dbSNP 137, and it occasionally has a GMAF INFO field with an allele frequency. What I don't understand is how to tell which allele, ref or alt, is the minor one. I've been looking at several of these by hand via NCBI, and I want to know how NCBI determines that the T is the minor allele.

http://www.ncbi.nlm.nih.gov/snp/?term=rs819972

From dbSNP, the alt is listed as a C.

chr1 1407372 rs188574574 T C

Check Ensembl genome browser. It gives MAF from 1000 genomes project.

Hi, I have sequenced a gene and calculated the allele frequencies of each SNP. Now I want to compare those allele frequencies with global minor allele frequency data (which I have for each allele/SNP). I need your guidance: how could I do it?

8.5 years ago
deanna.church ★ 1.1k

dbSNP uses the global 1000 G data to determine the minor allele (and frequencies). I know it may be hard to find, but it is noted on the RefSNP page (in the Allele column at the top of the page) here: http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=819972

Looking at the 1000G data (http://www.ncbi.nlm.nih.gov/variation/tools/1000genomes/?chr=NC_000001.10&from=1417504&to=1418504&mk=1418004:1418004|rs819972&gts=rs819972), it looks like C is the minor allele at about 37%. The data on the RefSNP page and in 1000 Genomes are consistent. If the VCF is reporting that T is the minor allele, that is a bug and should be reported to info@ncbi.nlm.nih.gov. But please note that there are many sites for which the reference base is the minor allele.

8.5 years ago

Never confuse alternative with minor. The reference is simply the allele from the reference sequence, i.e. the individual sequenced. If that individual has the minor allele at that locus, then the minor allele will be the reference.
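To make that concrete, here is a minimal Python sketch that picks the minor allele from population allele counts alone, without looking at which allele is the reference. The function name `minor_allele` and the counts are made up for illustration (they are not real 1000 Genomes numbers), and the sketch assumes a biallelic site.

```python
def minor_allele(allele_counts):
    """Return (allele, frequency) of the less common allele at a biallelic site.

    allele_counts maps allele -> number of chromosomes carrying it.
    Which allele is REF or ALT plays no role here. On an exact tie,
    the allele listed first in the mapping is returned.
    """
    if len(allele_counts) != 2:
        raise ValueError("this sketch only handles biallelic sites")
    total = sum(allele_counts.values())
    if total == 0:
        raise ValueError("no observed chromosomes")
    allele = min(allele_counts, key=allele_counts.get)
    return allele, allele_counts[allele] / total


# Hypothetical counts: REF is T and ALT is C in both cases.
print(minor_allele({"T": 1376, "C": 808}))  # the ALT allele is the rarer one
print(minor_allele({"T": 300, "C": 700}))   # the REF allele is the rarer one
```

In the second call the reference allele comes back as the minor allele, which is exactly the situation described above.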

8.5 years ago

There's no way of knowing which of the alleles is the minor one unless it is specified in the VCF file. If, for instance, you look at this Galaxy dbSNP 137 VCF file, which I'm sure looks similar to yours, you will notice that some variants have a GMAF value, but there is no information about which allele is the minor and which the major. It could be the reference or it could be the alternative allele; you really can't tell. The GMAF value may still be useful for programs that treat MAF as an independent number and don't care which allele has that frequency, because their algorithms simply don't need it. It's a real shame that it isn't specified in the file, because many counts and inferences could be made from that VCF if we knew which allele the MAF value corresponds to.
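For anyone who wants to see what the file actually gives you, here is a rough Python sketch that pulls GMAF out of the INFO column of a VCF data line. The helper names `parse_info` and `gmaf_from_vcf_line` are hypothetical, and the INFO string in the example is invented for illustration (only the position, rsID, ref and alt come from the question). It assumes the standard tab-separated VCF layout with INFO in the eighth column.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; flag-style keys map to True."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def gmaf_from_vcf_line(line):
    """Return id, ref, alt and GMAF (or None) from one VCF data line."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 8:
        raise ValueError("expected at least 8 tab-separated VCF columns")
    info = parse_info(cols[7])
    gmaf = info.get("GMAF")
    return {
        "id": cols[2],
        "ref": cols[3],
        "alt": cols[4],
        "gmaf": float(gmaf) if gmaf is not None else None,
    }


# Invented INFO values, for illustration only.
EXAMPLE_LINE = "1\t1407372\trs188574574\tT\tC\t.\t.\tRSPOS=1407372;GMAF=0.0014;VC=SNV"
print(gmaf_from_vcf_line(EXAMPLE_LINE))
```

All that comes back is a frequency next to a ref and an alt; nothing in the parsed record says which of the two alleles that frequency belongs to.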