Question: How To Find Major And Minor Alleles From dbSNP?
8.0 years ago by
United States
thecuriousbiologist480 wrote:

I am new to SNPs and the dbSNP database. I am not quite sure if I understand how to find major and minor alleles for a particular position from the database.

Let's take for example rs16532.

First, the "Allele" section says that the "RefSNP Allele" is C/G. Next, the "Ancestral Allele" is G. Does this mean that the reference allele is G?

Now we come to the tables on that page. Which of the tables should I refer to in order to find the major and minor alleles for this position? There's a "Population Diversity" section which has some statistics. Should I be referring to this table? Also, how should I find the strand information for the reference as well as for the major and minor alleles?

Also, some more basic questions:

Are major and minor alleles ALWAYS alternate alleles? Can a minor allele be a reference allele?

Any help will be greatly appreciated.

allele dbsnp • 28k views
ADD COMMENTlink modified 5.9 years ago by parajulirp0 • written 8.0 years ago by thecuriousbiologist480

Hi Expert Peer/seniors 

Valuable info,

highly appreciated!

Thank you

ADD REPLYlink modified 5.9 years ago • written 5.9 years ago by parajulirp0

Hello seniors,

Excellent clarification!

Thank you

ADD REPLYlink written 4.4 years ago by ralbreiki70
8.0 years ago by
Jorge Amigo12k
Santiago de Compostela, Spain
Jorge Amigo12k wrote:

there are many things we could discuss about this question, but you should definitely do some basic reading before asking. the first thing I would recommend is dbSNP's own description on its web page, as it answers all your doubts. I will go through them anyway to address your question:

  1. when you see C/G in dbSNP it just means that dbSNP has recorded 2 alleles, C and G, for that particular position. the order, C/G or G/C, is determined by convention from the RefSeq strand, which is the strand that was sequenced for the reference genome used; there is no critical reason for this choice (my guess is that it is historical). the SNP may have been typed on the other strand, so there's a field indicating whether the SNP is on the same strand as the reference or on the opposite one.

  2. the ancestral allele is the one the chimp has at that particular position, after aligning the chimp and human genomes. so an ancestral allele of G does not by itself mean that G is the reference allele. we don't have ancestral alleles for the whole genome, but where they are available they are useful for evolutionary comparisons. other databases, such as Ensembl, also report macaque alleles for that kind of usage.

  3. to find minor alleles and minor allele frequencies (MAF) you can look at the SNP summary at the top of each SNP page, where a "MAF/MinorAlleleCount" field is reported if dbSNP has population information for that SNP, giving you the overall frequency and count across all populations. this tells you which allele is considered minor because of its frequency, so for the rs16532 example you can see that the C allele is the minor one with a MAF of 0.427, which means that G is the major allele with 1-0.427=0.573. if you want a deeper population description of the SNP you will have to scroll down to the "Population Diversity" table, where the same information is split by population. you may sometimes find that an allele that is minor in one population is major in another (e.g. for rs16532 the C allele is minor in 1000 Genomes pilot 1 YRI, but major in 1000 Genomes pilot 1 CHB+JPT), so this is the kind of information you'll have to deal with.

  4. the alternate allele is just any allele at that position other than the one present in the reference genome sequence. since which allele ended up in the reference is arbitrary with respect to frequency, being just a reference used to normalize the data reporting, the answers to your last questions would be the following:

    a) NO, major and minor alleles are not always alternate alleles. sometimes they will, sometimes they won't.

    b) YES, a minor allele can be the reference allele. sometimes it will, sometimes it won't.
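
To make points 1, 3 and 4 concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the allele counts are made up so that they reproduce the 0.427 MAF quoted above, the function names are hypothetical, and the reference allele is passed in as a parameter because which base sits in the reference has to be looked up, not guessed.

```python
# Hypothetical helpers for a biallelic SNP; not part of any dbSNP tool or API.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def to_other_strand(alleles):
    """Complement single-base alleles, e.g. when a SNP was typed on the opposite strand."""
    return [COMPLEMENT[a] for a in alleles]


def summarize_alleles(counts, reference_allele):
    """Return major/minor allele, MAF, and whether the minor allele is the reference.

    counts: dict mapping allele -> number of observed chromosomes (exactly two alleles).
    """
    if len(counts) != 2:
        raise ValueError("this sketch only handles biallelic SNPs")
    total = sum(counts.values())
    # Sort alleles from most to least frequent.
    ranked = sorted(counts, key=counts.get, reverse=True)
    major, minor = ranked
    return {
        "major": major,
        "minor": minor,
        "maf": counts[minor] / total,
        "minor_is_reference": minor == reference_allele,
    }


# Made-up counts chosen only to reproduce the MAF of 0.427 for the C allele.
example_counts = {"C": 427, "G": 573}

# The reference allele is an assumption here; both cases are shown on purpose.
print(summarize_alleles(example_counts, reference_allele="G"))
print(summarize_alleles(example_counts, reference_allele="C"))

# A C/G SNP is still a C/G SNP on the other strand, so the allele pair alone
# cannot tell you the strand; you need dbSNP's strand/orientation field.
print(to_other_strand(["C", "G"]))
```

Whether the minor allele is the reference depends only on which base is in the reference sequence, which is why the same counts give both answers above.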


Thank you for your thorough explanation, Jorge. Are the populations and per-population MAFs in the Population Diversity table those from the 1000 Genomes Project, or has this table been updated to include ExAC populations and MAFs?

ADD REPLYlink written 2.3 years ago by gaelgarcia05210
8.0 years ago by
Vancouver, British Columbia, Canada
dfornika1.0k wrote:

Scroll down to the 'Population Diversity' section to find allele frequencies.

One very important thing to understand is that allele frequencies must be interpreted within the context of a specific population. An allele can have a different frequency in Salt Lake City, Utah than in Tokyo, Japan. And even within a geographic location, the frequency will vary depending on the ethnic background of your population of interest.
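
As a rough illustration of why the population matters, the sketch below picks the minor allele separately in each population. The counts are invented purely for the example (they are not the real 1000 Genomes numbers for any SNP), and the function name is hypothetical; the population labels are used only because they appear in the 'Population Diversity' table.

```python
def minor_allele_by_population(counts_by_population):
    """Map each population to (minor allele, minor allele frequency).

    counts_by_population: dict of population -> dict of allele -> chromosome count.
    """
    result = {}
    for population, counts in counts_by_population.items():
        total = sum(counts.values())
        # The minor allele is the least frequent one within this population.
        minor = min(counts, key=counts.get)
        result[population] = (minor, counts[minor] / total)
    return result


# Invented counts, for illustration only.
HYPOTHETICAL_COUNTS = {
    "YRI": {"C": 40, "G": 78},
    "CHB+JPT": {"C": 70, "G": 50},
}

for population, (allele, maf) in minor_allele_by_population(HYPOTHETICAL_COUNTS).items():
    print(f"{population}: minor allele {allele}, MAF {maf:.3f}")
```

With these invented counts C is the minor allele in the first population and G in the second, so a single "minor allele" label only makes sense once you have said which population you mean.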
