Converting SNP datasets from A/B to ACGT format
8.6 years ago
User000 ▴ 690

Hello,

I have two sets of SNP data that were aligned, one in ACGT format and one in A/B format. For the A/B format I know the alleles. However, the allele designations differ between the two formats. Something like this:

marker1  chr  pos  alleles_set1  snp1_set1  snp2_set1  snp3_set1  alleles_set2  snp1_set2  snp2_set2  snp3_set2
m1       1    0    A/G           G          A          A          A/G           A          A          B
m2       1    0    A/G           A          A          G          T/C           A          B          B
m3       1    0    G/C           G          G          C          C/G           A          A          B

I need to produce a HapMap file for association analyses.

So, my questions are: 1) How do I convert set2 from A/B format to ACGT? 2) When the alleles differ between the two sets, as in marker m2, how should I treat the data? Especially C/G versus G/C?

SNP • 5.7k views
8.6 years ago
User000 ▴ 690

OK, I solved the problem.

Apparently, if one of the two possible SNP alleles is A, then A is assigned as Allele A and the other allele (C or G) as Allele B. Similarly, by the rules of complementarity, if one of the alleles is T, then T is assigned as Allele A and the other allele (C or G) as Allele B. So in m1, A is Allele A and G is Allele B. In m2, since the variations are complementary, we can change T/C to A/G following the rules above and merge the sets. In m3 (C/G), the rule cannot decide, so we need to know the TOP/BOT information of the allelic variations. The algorithm used in this case is called "sequence walking".
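To make the rule above concrete, here is a minimal Python sketch. It is not an official tool: the function names (ab_alleles, decode_calls, to_reference_strand) are made up for illustration, and writing "N" for any call that is not A or B is my own assumption. It only handles the unambiguous cases (A/C, A/G, T/C, T/G); for A/T and C/G it returns None, because those need the TOP/BOT information from the manifest. The complement step also assumes that a complementary allele pair really means the opposite strand, which you should check against the manifest.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
AMBIGUOUS = ({"A", "T"}, {"C", "G"})


def ab_alleles(alleles):
    """Return (allele_A, allele_B) for e.g. 'A/G' or 'T/C'.

    Returns None for A/T and C/G, which need TOP/BOT information.
    """
    pair = set(alleles.upper().split("/"))
    if len(pair) != 2 or not pair <= set("ACGT"):
        raise ValueError(f"unexpected allele string: {alleles!r}")
    if pair in AMBIGUOUS:
        return None
    # Whichever of A or T is present becomes Allele A; the other base is Allele B.
    allele_a = "A" if "A" in pair else "T"
    allele_b = (pair - {allele_a}).pop()
    return allele_a, allele_b


def decode_calls(alleles, calls):
    """Translate A/B calls (e.g. 'A', 'B', 'AB') into nucleotides."""
    mapping = ab_alleles(alleles)
    if mapping is None:
        raise ValueError(f"{alleles} is ambiguous; TOP/BOT information is needed")
    lookup = {"A": mapping[0], "B": mapping[1]}
    # Assumption: anything that is not A or B is treated as missing ('N').
    return ["".join(lookup.get(ch, "N") for ch in call.upper()) for call in calls]


def to_reference_strand(ref_alleles, alleles, bases):
    """Complement decoded bases if the marker is reported on the other strand."""
    ref = set(ref_alleles.upper().split("/"))
    own = set(alleles.upper().split("/"))
    if own == ref:
        return list(bases)
    if {COMPLEMENT[b] for b in own} == ref:
        return ["".join(COMPLEMENT.get(ch, "N") for ch in call) for call in bases]
    raise ValueError(f"{alleles} cannot be reconciled with {ref_alleles}")


# m1 and m2 from the table in the question
m1 = decode_calls("A/G", ["A", "A", "B"])
m2 = to_reference_strand("A/G", "T/C", decode_calls("T/C", ["A", "B", "B"]))
print(m1, m2)
```

For m3 (G/C in set1, C/G in set2), ab_alleles returns None and decode_calls raises an error on purpose, since guessing there would silently swap alleles.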


Hi, how did you solve the problem?

I ran a file in PLINK in A/B format and am now trying to do imputation with Beagle 4, but Beagle 4 requires ACGT coding instead of A/B. I would like to convert the A/B calls to ACGT. Your help will be highly appreciated.


Hi, do you have information about the alleles? If yes, you can apply the rules above, but you will also need to know the TOP/BOT information of the allelic variations. I had to request an additional file from the company. Hope this helps. Here is a link you might find helpful as well: http://www.illumina.com/documents/products/technotes/technote_topbot.pdf


Hi, sorry for the late reply. I ended up using the old Beagle version, which accepts the A/B format. Thanks for the file, I already had it.


Hi Malomane, this is probably not related to the topic, but do you know, for Beagle, how we should order the genotype input file for unphased trio data when a parent has many offspring? Is it Male1, Female1, Off1, Male1, Female1, Off2, or simply Male1, Female1, Off1, Off2, Off3, ...? Thanks.

7.8 years ago

Hi everyone,

I have a PED file with alleles coded as A and B, and I want to recode it to ACGT alleles. I tried PLINK, but it did not work for me. Could you please help?

Thanks

Akil


Hi, see the solution I posted above.
