Question: How To Translate Encode Genotype Data From Aa/Ab/Bb To Standard A,T,G,C ?
5.5 years ago
United Kingdom
wrote:

Hi, I downloaded ENCODE genotype calls from UCSC table browser. The genotypes were obtained with the Illumina 1M-Duo. But they come in the AA/AB/BB format.

I have been trying to understand how to convert them to A,C,G,T... I guess there should be an allele table that would tell me what is the A and B allele for each SNP?

This is the file I downloaded:

Thanks! Ines

illumina encode genotype
written 5.5 years ago by inesdesantiago160
5.5 years ago
Jorge Amigo
Santiago de Compostela, Spain
wrote:

Indeed, there should be a map file in which the SNP codes are described with their corresponding alleles. Some programs may be able to work directly with this AA/AB/BB format, but if the online resource doesn't provide that translation table (I have just checked, and it is quite strange that they don't even provide it in the same folder), then you'll definitely have to ask for it. Alternatively, you could translate it yourself by downloading each SNP's alleles from dbSNP and assigning A to the reference allele and B to the alternative allele. I would suggest contacting the data providers first to confirm the coding, so that the conversion doesn't fail on special cases (reversed SNPs, triallelic SNPs, ...).

written 5.5 years ago by Jorge Amigo

It can get a little more complicated than that because of strand issues. This isn't specific to ENCODE; it is an Illumina format. Illumina has a PDF technote covering part of this issue here:

written 5.5 years ago by Dan Gaston

If you know which Illumina chip was used to obtain those genotypes, you can always try to find the map file needed for the allele translation yourself on their website.

written 5.5 years ago by Jorge Amigo

Hi! Actually, I found a file on the Illumina website, but it doesn't really say anything about allele A and allele B. It says something about TOP/BOT alleles. I was hoping someone had made an R package or some kind of script to deal with this issue in a more straightforward way.

written 5.5 years ago by inesdesantiago160

Once you have the allele translations in a file, it should be very simple to build a mapping variable, for instance a hash in Perl:

$trans{$rscode."_A"} = $allele1;
$trans{$rscode."_B"} = $allele2;

You would then use it to parse your data file. If you state here which Illumina file you're looking at, or paste some example lines, it would be easier to give you further advice.
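
For anyone who prefers Python, here is a minimal sketch of the same lookup idea. It assumes a hypothetical tab-separated allele table with one SNP per line (rs code, allele A, allele B). That layout, the function names and the toy rs codes are illustrative assumptions, not the format of any real Illumina or ENCODE file.

```python
import csv
import io


def load_allele_map(handle):
    """Read 'rscode<TAB>alleleA<TAB>alleleB' lines into {rscode: (alleleA, alleleB)}.

    The three-column layout is an assumption made for this sketch.
    """
    allele_map = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        rscode, allele_a, allele_b = row[:3]
        allele_map[rscode] = (allele_a.upper(), allele_b.upper())
    return allele_map


def translate_genotype(rscode, ab_call, allele_map):
    """Translate an AA/AB/BB call to nucleotides, or return None if that is not possible."""
    alleles = allele_map.get(rscode)
    if alleles is None:
        return None  # SNP missing from the allele table
    lookup = {"A": alleles[0], "B": alleles[1]}
    try:
        return "".join(lookup[code] for code in ab_call.upper())
    except KeyError:
        return None  # anything that is not made of A/B, e.g. a no-call


# Toy allele table with made-up rs codes, standing in for a real map file.
example_table = io.StringIO("rs0000001\tA\tG\nrs0000002\tT\tC\n")
allele_map = load_allele_map(example_table)

print(translate_genotype("rs0000001", "AB", allele_map))
print(translate_genotype("rs0000002", "BB", allele_map))
```

Returning None for unknown SNPs and unrecognised calls keeps such cases visible, so they are not silently translated into a wrong genotype.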

written 5.5 years ago by Jorge Amigo

There is a relationship between Illumina's TOP/BOT designation and their A/B designation. However, I don't think it maps to dbSNP's top/bottom designation for a SNP. Is there any way to get the ENCODE data in another format? I am sure that when they originally did the genotyping, they would have been able to export the data from Illumina GenomeStudio both in the A/B format and as raw genotypes. I'd be surprised if they didn't offer the dataset in the alternative format. Depending on what you are going to do with the data, you may find it simpler (if possible) to just work with it in the A/B format. Many programs will accept it quite readily.
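
To make that TOP/BOT to A/B relationship concrete, here is a rough Python sketch of the rule as I understand it from Illumina's technote. For SNPs with one A-or-T allele and one C-or-G allele, allele A is the A or T. For A/T and C/G SNPs, the order depends on the TOP/BOT strand, which comes from sequence walking and has to be read from the manifest. The function name is hypothetical, and the alleles are expected as reported on the Illumina strand, which is not necessarily the forward genomic strand, so please check the result against the manifest for your chip before trusting it.

```python
def illumina_ab(allele1, allele2, strand=None):
    """Return (allele A, allele B) for a biallelic SNP given on the Illumina strand.

    strand ("TOP" or "BOT") is only needed for A/T and C/G SNPs.
    """
    alleles = {allele1.upper(), allele2.upper()}
    if len(alleles) != 2 or not alleles <= set("ACGT"):
        raise ValueError("expected two different nucleotides from A, C, G, T")

    a_or_t = alleles & {"A", "T"}
    c_or_g = alleles & {"C", "G"}

    if len(a_or_t) == 1 and len(c_or_g) == 1:
        # Unambiguous SNP: the A or T allele is allele A, the C or G allele is allele B.
        return (a_or_t.pop(), c_or_g.pop())

    # A/T or C/G SNP: the alleles alone cannot decide, so the strand must be supplied.
    if strand not in ("TOP", "BOT"):
        raise ValueError("A/T and C/G SNPs need strand='TOP' or strand='BOT'")
    top_order = ("A", "T") if a_or_t else ("C", "G")
    return top_order if strand == "TOP" else top_order[::-1]


print(illumina_ab("G", "A"))
print(illumina_ab("C", "T"))
print(illumina_ab("A", "T", strand="BOT"))
```

The returned pair could then populate an allele table for translating AA/AB/BB calls, with any strand flipping to match dbSNP handled as a separate step.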

written 5.5 years ago by Dan Gaston
