How to translate ENCODE genotype data from AA/AB/BB to standard A, T, G, C?
10.5 years ago

Hi,

I downloaded ENCODE genotype calls from UCSC table browser. The genotypes were obtained with the Illumina 1M-Duo. But they come in the AA/AB/BB format.

I have been trying to understand how to convert them to A,C,G,T...

I guess there should be an allele table that would tell me what is the A and B allele for each SNP?

This is the file I downloaded.

Thanks!
Ines

encode illumina genotype • 8.4k views
10.5 years ago

Indeed, there should be a map file in which each SNP code is listed with its corresponding alleles. Some programs can work directly with the AA/AB/BB format, but if the online resource doesn't provide that translation table (I have just checked, and it's quite strange that they don't even provide it in the same folder), then you'll definitely have to ask for it. Alternatively, you could translate it yourself by downloading each SNP's alleles from dbSNP and assigning A to the reference allele and B to the alternative allele. I would suggest contacting the data providers first to confirm the coding, so that the conversion doesn't go wrong (reversed SNPs, triallelic SNPs, ...).


It can get a little more complicated than that because of strand issues. This isn't specific to ENCODE; it is an Illumina format. Illumina has a PDF technote covering part of this issue here.
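To illustrate the strand problem: the same genotype reads differently depending on which strand is reported, and for A/T and C/G SNPs the two strands cannot be told apart from the alleles alone. A minimal Python sketch (the function names are made up for illustration and are not part of any Illumina tool):

```python
# Hypothetical helpers, for illustration only.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def flip_strand(genotype):
    """Return the genotype as read on the opposite strand, e.g. 'AG' -> 'TC'."""
    return "".join(COMPLEMENT[base] for base in genotype)

def is_strand_ambiguous(allele1, allele2):
    """True for A/T and C/G SNPs, whose two alleles are each other's complement."""
    return COMPLEMENT[allele1] == allele2

print(flip_strand("AG"))              # genotype on the other strand
print(is_strand_ambiguous("A", "T"))  # A/T looks the same on both strands
print(is_strand_ambiguous("A", "G"))  # A/G becomes T/C, so the strand is detectable
```

For an ambiguous SNP, a translation table alone is not enough; you also need to know which strand the table refers to.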


If you know which Illumina chip was used to obtain those genotypes, you can always try to find the map file needed for the allele translation yourself on their website.


Hi! Actually, I found a file on the Illumina website, but it doesn't really say anything about allele A and allele B. It says something about TOP/BOT alleles. I was hoping someone had made an R package or some kind of script to deal with this issue in a more straightforward way.


Having the allele translations in a file, it should be very simple to build a mapping variable (a hash in Perl, for instance), like

$trans{$rscode . "_A"} = $allele1;
$trans{$rscode . "_B"} = $allele2;

which you would then use to parse your data file. If you state here which Illumina file you're looking at, or even paste some example lines, it would be easier to give you further advice.
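The same lookup idea as a minimal Python sketch. It assumes a tab-separated translation table with three columns (SNP id, allele A, allele B); that layout, the function names and the sample rs ids are made up for illustration, so adjust them to whatever file Illumina or the data provider actually supplies.

```python
import csv
import io

def load_allele_map(handle):
    """Read 'snp_id<TAB>allele_A<TAB>allele_B' rows into {snp_id: {'A': ..., 'B': ...}}."""
    allele_map = {}
    for snp_id, allele_a, allele_b in csv.reader(handle, delimiter="\t"):
        allele_map[snp_id] = {"A": allele_a, "B": allele_b}
    return allele_map

def translate_call(allele_map, snp_id, ab_call):
    """Translate an AB-coded call such as 'AB' into nucleotides; None if not translatable."""
    alleles = allele_map.get(snp_id)
    if alleles is None or any(code not in alleles for code in ab_call):
        return None  # unknown SNP, or a no-call such as '--'
    return "".join(alleles[code] for code in ab_call)

# Made-up example rows, not real annotation data.
table = io.StringIO("rs0000001\tA\tG\nrs0000002\tT\tC\n")
allele_map = load_allele_map(table)

print(translate_call(allele_map, "rs0000001", "AB"))
print(translate_call(allele_map, "rs0000002", "BB"))
print(translate_call(allele_map, "rs0000001", "--"))
```

Returning None for unknown SNPs and no-calls keeps them visible downstream instead of silently guessing an allele.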


There is a relationship between Illumina's TOP/BOT designation and their A/B designation. However, I don't think it maps to dbSNP's top/bottom designation for a SNP. Is there any way to get the ENCODE data in another format? When they originally did the genotyping, they should have been able to export the data from Illumina GenomeStudio in both the A/B format and as raw genotypes. I'd be surprised if they didn't offer the dataset in the alternative format. Depending on what you are going to do with the data, you may find it simpler (if possible) to just work with it in the AB format. Many programs will accept it quite readily.
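As a rough illustration of that TOP/BOT-to-A/B relationship: as Illumina's technote is usually read, for SNPs other than A/T and C/G, allele A is whichever allele is A or T, and allele B is whichever is C or G, with both alleles given on Illumina's designated strand. A/T and C/G SNPs need the TOP/BOT strand call itself to be resolved. The Python sketch below encodes only that reading of the rule, with a made-up function name, and should be checked against the chip's manifest before use.

```python
def ab_alleles(allele1, allele2):
    """Return (allele_A, allele_B) for an unambiguous SNP, or None for A/T and C/G SNPs.

    Assumption: both alleles are given on Illumina's designated (TOP or BOT) strand.
    """
    pair = {allele1, allele2}
    if pair in ({"A", "T"}, {"C", "G"}):
        return None  # ambiguous: needs the TOP/BOT call from the manifest
    a_or_t = pair & {"A", "T"}
    c_or_g = pair & {"C", "G"}
    if len(a_or_t) != 1 or len(c_or_g) != 1:
        raise ValueError(f"not a biallelic SNP: {allele1}/{allele2}")
    return a_or_t.pop(), c_or_g.pop()

print(ab_alleles("G", "A"))  # allele order in the input does not matter
print(ab_alleles("T", "C"))
print(ab_alleles("C", "G"))  # ambiguous SNP
```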
