Question: How to translate ENCODE genotype data from AA/AB/BB to standard A,T,G,C?
7.1 years ago by
United Kingdom
inesdesantiago170 wrote:

Hi, I downloaded ENCODE genotype calls from the UCSC Table Browser. The genotypes were obtained with the Illumina 1M-Duo, but they come in AA/AB/BB format.

I have been trying to understand how to convert them to A,C,G,T... I guess there should be an allele table that tells me what the A and B alleles are for each SNP?

This is the file I downloaded:

Thanks! Ines

illumina encode genotype • 6.4k views
modified 6.1 years ago by Biostar ♦♦ 20 • written 7.1 years ago by inesdesantiago170
7.1 years ago by
Jorge Amigo12k
Santiago de Compostela, Spain
Jorge Amigo12k wrote:

Indeed, there should be a map file that describes each SNP code with its corresponding alleles. Some programs can work directly with this AA/AB/BB format, but if the online resource doesn't provide that translation table (I have just checked, and it's quite strange that they don't even provide it in the same folder), then you'll definitely have to ask for it. Alternatively, you could translate it yourself by downloading each SNP's alleles from dbSNP and assigning A to the reference allele and B to the alternative allele. I would suggest contacting the data providers first to confirm the encoding, so that the conversion doesn't go wrong (reversed SNPs, triallelic SNPs, ...).

written 7.1 years ago by Jorge Amigo12k

It can get a little more complicated than that because of strand issues. This isn't specific to ENCODE; it is an Illumina format. Illumina has a PDF technote covering part of this issue here:

written 7.1 years ago by DG7.2k

If you know which Illumina chip was used to obtain those genotypes, you can always try to find the map file needed for the allele translation yourself on their website.

written 7.1 years ago by Jorge Amigo12k

Hi! Actually, I found some files on the Illumina website, but they don't really say anything about allele A and allele B. They only mention TOP/BOT alleles. I was hoping someone had made an R package or some kind of script to deal with this issue in a more straightforward way.

written 7.1 years ago by inesdesantiago170

Once you have the allele translations in a file, it should be very simple to build a mapping variable (a hash in Perl, for instance), like

# map each SNP's A and B codes to their nucleotides
my %trans;
$trans{$rscode."_A"} = $allele1;
$trans{$rscode."_B"} = $allele2;

which you would then use to parse your data file. If you state here which Illumina file you're looking at, or even paste some example lines, it would be easier to give you further advice.
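The same mapping idea can be sketched end to end in Python. This is only an illustration under stated assumptions: the tab-separated allele table, its column names (`snp`, `allele_a`, `allele_b`), the SNP identifiers, and the helper names are all hypothetical and do not reflect the layout of any real Illumina or UCSC file. It also assumes the table already reports both alleles on the strand you want, so it does nothing about the strand issues mentioned above.

```python
import csv
import io

# Hypothetical allele table: one row per SNP, giving the nucleotides behind
# the A and B codes. The column names and SNP IDs are placeholders.
ALLELE_TABLE = "snp\tallele_a\tallele_b\nsnp1\tA\tG\nsnp2\tT\tC\n"


def load_allele_map(handle):
    """Return {snp_id: {"A": nucleotide, "B": nucleotide}} from a TSV handle."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        row["snp"]: {"A": row["allele_a"], "B": row["allele_b"]}
        for row in reader
    }


def translate_genotype(snp_id, ab_call, allele_map):
    """Translate an AA/AB/BB call into nucleotides.

    Returns None when the SNP is missing from the table or the call
    contains anything other than A/B (for example a no-call like "--").
    """
    alleles = allele_map.get(snp_id)
    if alleles is None or not ab_call:
        return None
    if any(code not in alleles for code in ab_call):
        return None
    return "".join(alleles[code] for code in ab_call)


allele_map = load_allele_map(io.StringIO(ALLELE_TABLE))
for snp_id, call in [("snp1", "AB"), ("snp2", "BB"), ("snp1", "--")]:
    print(snp_id, call, translate_genotype(snp_id, call, allele_map))
```

Returning `None` for anything that cannot be translated, rather than guessing, keeps unknown SNPs and no-calls visible so they can be checked against the provider's documentation.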

modified 7.1 years ago • written 7.1 years ago by Jorge Amigo12k

There is a relationship between Illumina's TOP/BOT designation and their A/B designation. However, I don't think it maps to dbSNP's top/bottom designation for a SNP. Is there any way to get the ENCODE data in another format? I am sure that when they originally did the genotyping they would have been able to export the data from Illumina GenomeStudio both in the A/B format and as raw genotypes. I'd be surprised if they didn't offer the dataset in the alternative format. Depending on what you are going to do with the data, you may find it simpler (if it is possible) to just work with it in the A/B format. Many programs will accept it quite readily.

written 7.1 years ago by DG7.2k