Question: How to convert alleles to Illumina AB format?
Asked 9 days ago by MichaelTrev:

Hi, I have a file with three columns: an rs ID, Allele1, and Allele2. The alleles are given as nucleotides:


I need to create a file in Illumina AB format, so I need to convert genotypes such as AG, GC, and GG to AB, AA, or BB (depending on the SNP). Can someone explain the best way to do that? And is it even possible with only the information I have?


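NB (May 16, 2019): although my comment below discusses A and B alleles mostly in terms of major and minor alleles, on Illumina genotyping arrays A and B are defined relative to Illumina's TOP and BOT strand designations, which are assigned from the sequence surrounding the SNP rather than from the coding and non-coding strands.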


It is not an easy feat, because A and B alleles can mean different things in different contexts. A common interpretation is that A refers to the major allele and B to the minor allele. This raises the question: in which cohort are these the major and minor alleles? The usual reference is 1000 Genomes data, but it can also be your study cohort.

Most likely, there is an annotation file available for the microarray platform that was used, which will (hopefully) state which allele is A and which is B. Ask your colleagues whether they know anything about this. If they do not, find out which microarray platform was used and search for its annotation file online.

My other suggestion: confirm with your colleagues why the AB format is required, and which results are actually needed. Do those results really require a conversion to AB format?

Finally, if all else fails, annotate each of your records with 1000 Genomes Phase III allele frequencies, and then set the A and B alleles manually based on each SNP's allele frequencies (A = major; B = minor). This takes some extra work, but it is feasible.
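A minimal Python sketch of this frequency-based fallback, assuming you have already looked up, for each rs ID, the reference and alternate alleles and the alternate-allele frequency (for example, the AF value from 1000 Genomes Phase III). The function names are hypothetical, and the alleles in your file must be on the same strand as the frequency data. Note that this produces the major/minor convention, which is not the same as Illumina's A/B definition.

```python
def assign_ab_by_frequency(ref, alt, alt_freq):
    """Map a biallelic SNP's alleles to 'A' (major) and 'B' (minor).

    alt_freq: frequency of the alt allele in the chosen reference population,
    e.g. an AF value from 1000 Genomes Phase III looked up beforehand.
    On a tie (alt_freq == 0.5) the ref allele is kept as A.
    """
    if not 0.0 <= alt_freq <= 1.0:
        raise ValueError(f"allele frequency out of range: {alt_freq}")
    major, minor = (alt, ref) if alt_freq > 0.5 else (ref, alt)
    return {major: "A", minor: "B"}


def genotype_to_ab(allele1, allele2, mapping):
    """Convert a nucleotide genotype (e.g. 'G', 'A') to 'AA', 'AB' or 'BB'."""
    try:
        codes = mapping[allele1] + mapping[allele2]
    except KeyError as missing:
        # Frequently caused by a strand mismatch between the file and the frequency data
        raise ValueError(f"allele {missing} not in {sorted(mapping)}") from None
    # Sorting puts the letters in the canonical order: AA, AB, BB
    return "".join(sorted(codes))
```

For example, an A/G SNP whose G allele has frequency 0.12 maps A to "A" and G to "B", so a G/A genotype becomes AB and G/G becomes BB.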


Comment by Kevin Blighe, written 8 days ago (modified 7 days ago).

Thanks for the answers! I learned something about Illumina. The AB format is needed because our database works with it. I have also been told that all the alleles in the file are TOP-strand alleles. Does that change anything, or do I still need to use the manifest file and deal with it in R? Best regards!

Reply by MichaelTrev, written 7 days ago.

In that case, your data consist only of A alleles. While the chip likely included both A and B alleles originally, for downstream processing we sometimes filter out all SNPs from one strand, i.e., those on the non-coding strand, as I mention in step 6 here: Produce PCA bi-plot for 1000 Genomes Phase III - Version 2

Reply by Kevin Blighe, written 7 days ago.
Answer by bernatgel (Barcelona, Spain), 8 days ago:

Hi @hektor102

I agree with Kevin that you should ask your colleagues why the AB notation is needed, since it is less informative than the actual alleles, especially under Illumina's definition of A and B.

If you are referring to the A and B alleles as defined by Illumina for their SNP-array technology, then the A and B designation has nothing to do with population frequency and is defined solely on the basis of the sequence context. This is what produces the symmetry in the B allele frequency (BAF) plots of SNP arrays (see the top panel in this image). The AB definition is explained in this technical note.

[Image: SNP-array BAF plots]
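As I read Illumina's TOP/BOT technical note, a SNP written on the TOP strand is always [A/C], [A/G], [A/T] or [C/G], and allele A is the first allele of that pair. A minimal Python sketch of that rule, assuming both your genotype and the SNP's allele pair are on the TOP strand; the function name is hypothetical, so please check the behaviour against the technical note before relying on it. It also shows why the genotype alone is not always enough: CC is BB at an [A/C] SNP but AA at a [C/G] SNP, so you need each SNP's allele pair (e.g. from the manifest).

```python
# Allele pairs that can occur on Illumina's TOP strand (alphabetical order)
TOP_PAIRS = {("A", "C"), ("A", "G"), ("A", "T"), ("C", "G")}


def top_genotype_to_ab(snp_alleles, allele1, allele2):
    """Convert a TOP-strand genotype to 'AA', 'AB' or 'BB'.

    snp_alleles: the SNP's two alleles on the TOP strand, in any order.
    Assumption: on the TOP strand, allele A is the alphabetically first allele.
    """
    pair = tuple(sorted(snp_alleles))
    if pair not in TOP_PAIRS:
        raise ValueError(f"{snp_alleles} is not a valid TOP-strand SNP")
    code = {pair[0]: "A", pair[1]: "B"}
    if allele1 not in code or allele2 not in code:
        raise ValueError(f"genotype {allele1}{allele2} does not match SNP {pair}")
    # Sorting puts the letters in the canonical order: AA, AB, BB
    return "".join(sorted(code[allele1] + code[allele2]))
```

For example, `top_genotype_to_ab(("A", "G"), "G", "A")` gives AB, while a [C/T] pair is rejected because it cannot be a TOP-strand designation.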

To transform your alleles into AB alleles, I think your best bet is to get the Illumina manifest file (or a results file) for the exact platform used, then use R or similar to match the SNPs by rs ID and convert them to AB using the corresponding column in the manifest file. As an example, the manifest of the Illumina OmniExpress is at
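A minimal sketch of that matching step, in Python rather than R. It assumes a manifest CSV whose probe table follows an `[Assay]` line and ends at a `[Controls]` line, with `Name`, `IlmnStrand` and `SNP` columns, and that the `SNP` column lists alleles as [Allele A/Allele B] on the `IlmnStrand`; it also assumes your genotypes are TOP-strand alleles. Check all of these against your actual manifest; the function names are hypothetical.

```python
import csv

# Complement table used to turn BOT-strand alleles into TOP-strand alleles
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def read_manifest_ab(lines):
    """Return {Name: (alleleA, alleleB)} with both alleles on the TOP strand."""
    assay_lines, in_assay = [], False
    for line in lines:
        if line.startswith("[Assay]"):
            in_assay = True
            continue
        if line.startswith("[Controls]"):
            break
        if in_assay:
            assay_lines.append(line)
    ab = {}
    for row in csv.DictReader(assay_lines):
        if not row.get("SNP"):
            continue  # skip padding rows without an allele pair
        a, b = row["SNP"].strip("[]").split("/")
        if row["IlmnStrand"] == "BOT":
            # Complement keeps the A/B order but expresses it on the TOP strand
            a, b = a.translate(COMPLEMENT), b.translate(COMPLEMENT)
        ab[row["Name"]] = (a, b)
    return ab


def to_ab(rsid, allele1, allele2, ab_map):
    """Convert one TOP-strand genotype for rsid to 'AA', 'AB' or 'BB'."""
    a, b = ab_map[rsid]  # KeyError means the rs ID is not in the manifest
    code = {a: "A", b: "B"}
    if allele1 not in code or allele2 not in code:
        raise ValueError(f"{rsid}: {allele1}{allele2} does not match {a}/{b}")
    return "".join(sorted(code[allele1] + code[allele2]))
```

With a real file this would be called as `with open(path) as fh: ab = read_manifest_ab(fh)`. Manifest `Name` values are not always rs IDs, so it is worth listing the SNPs from your file that find no match rather than silently dropping them.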

Hope this helps



Thanks for clarifying how Illumina defines A and B alleles. I have moved my answer to a comment.

Reply by Kevin Blighe, written 8 days ago.
Answer by Charles Warden (Duarte, CA), 6 days ago:

I don't think anybody else has mentioned it, but you can get those from GenomeStudio if you have the .idat files (or can ask for access to them).

If at all possible, that is what I would recommend. Otherwise, I agree with the feedback you have received.

Best of luck with your project!
