PLINK: Not sure how to go about creating the "mylist.txt" file for the --update-alleles command
6.8 years ago
wondereye • 0

I am trying to recode the alleles in my .bim file in PLINK, for example from A/B allele coding to A/C/G/T.

My .bim file:

    1    rs101   0    4566795    A    B
    1    rs102   0    4640902    B    A

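PLINK requires the --update-alleles command:

    ./plink --bfile mydata --update-alleles mylist.txt --make-bed --out newfile

where the file mylist.txt contains five columns per row: the SNP identifier, the old code for the first allele, the old code for the other allele, the new code for the first allele, and the new code for the other allele. For example: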

    rs101  A B   G T
    rs102  A B   A C
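Once you know (or decide) which nucleotide each A/B code stands for, building mylist.txt is mechanical. Below is a minimal sketch, assuming you have a per-SNP mapping from the old codes to the new ones; the function name `build_update_alleles` and the `new_codes` dictionary are hypothetical, and the mapping itself must come from the array annotation or from your own arbitrary choice, since it cannot be derived from the .bim file.

```python
def build_update_alleles(bim_text, new_codes):
    """Return mylist.txt rows: SNP, old allele 1, old allele 2, new allele 1, new allele 2.

    new_codes is a HYPOTHETICAL per-SNP mapping from the old A/B codes to
    new codes, e.g. {"rs101": {"A": "G", "B": "T"}}. It is not derivable from
    the .bim file; it has to come from annotation or be chosen by you.
    """
    rows = []
    for line in bim_text.splitlines():
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        # .bim columns: chromosome, SNP ID, cM position, bp position, allele 1, allele 2
        snp, a1, a2 = fields[1], fields[4], fields[5]
        if snp not in new_codes:
            continue  # no mapping known for this SNP, so leave it out
        mapping = new_codes[snp]
        # Map each old allele individually, so the .bim allele order is preserved
        rows.append(f"{snp}\t{a1}\t{a2}\t{mapping[a1]}\t{mapping[a2]}")
    return rows


if __name__ == "__main__":
    bim = """1    rs101   0    4566795    A    B
1    rs102   0    4640902    B    A"""
    codes = {"rs101": {"A": "G", "B": "T"}, "rs102": {"A": "A", "B": "C"}}
    print("\n".join(build_update_alleles(bim, codes)))
```

Redirecting the printed rows to mylist.txt gives a file in the five-column layout described above. Mapping each allele separately, rather than copying a fixed "new allele 1, new allele 2" pair, keeps rs102 correct even though its .bim lists B before A.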

My question: when creating mylist.txt, how do you determine the last two columns, i.e. the new code for the first allele and the new code for the other allele (e.g. A is now G, B is now T)?

6.8 years ago

You're the one who wanted to recode them. Put any letters in there you please! Maybe the first A/B can be A/T, and the second can become that novel nucleotide X/Y.

PLINK uses A/B coding for a reason: linkage screening treats SNPs as binary (two-allele) markers and doesn't care about the underlying sequence. Since most high-frequency SNPs will be intergenic, they won't cause amino acid changes anyway.

If for some reason you want the actual nucleotides, you need to find out where the file came from in the first place, because each microarray technology reports alleles on a semi-arbitrary strand and in a semi-arbitrary order (A/T vs T/A at a het site). If dbSNP says rs101 is A/G, we can't tell whether your data file means A/G or G/A, or even the reverse strand, T/C or C/T.
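To make the ambiguity concrete, the sketch below enumerates every (allele for A, allele for B) assignment consistent with a dbSNP allele pair when neither the strand nor the allele order is known. The function name `candidate_codings` is hypothetical; only the annotation for your specific chip can tell you which candidate is the right one.

```python
# Base on the opposite DNA strand
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def candidate_codings(allele_x, allele_y):
    """All (allele for A, allele for B) pairs consistent with a dbSNP X/Y SNP
    when the array's strand and allele order are unknown (hypothetical helper)."""
    forward = [(allele_x, allele_y), (allele_y, allele_x)]
    reverse = [(COMPLEMENT[a], COMPLEMENT[b]) for a, b in forward]
    candidates = []
    for pair in forward + reverse:
        # A/T and C/G SNPs are their own complement, so drop the duplicates
        if pair not in candidates:
            candidates.append(pair)
    return candidates


if __name__ == "__main__":
    print(candidate_codings("A", "G"))
```

For an A/G SNP this prints four possibilities: A/G, G/A, and the reverse-strand T/C and C/T. For A/T or C/G SNPs the reverse strand adds nothing new, so the strand cannot be recovered from the letters alone.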

That's something that will be annotated by the microarray manufacturer, or may be available through Bioconductor and ArrayExpress annotations. NCBI GEO may also have platform definitions. I did this a few years ago with Affymetrix, and had to create an account with them to get the reference data for my particular chip type.


Thank you so much @karl.stamm. I'm just learning how to use PLINK. I was hoping to find out whether there were any base changes for the SNP, but I'm assuming I can look this up on dbSNP?

Yeah, if the SNP ID is good, you can find everything about it in dbSNP. It doesn't really matter how it's coded; you'll know AB is het. BB should be homozygous minor, but since we don't know which strand it is, it could be the major allele. Sometimes, particularly in older data sets, the coding is backwards, and you'll find a "minor" allele frequency above 50% for those SNPs. Rarely, the genome reference carries the disease allele, and all your healthy subjects show an apparent mutation. It's kind of a mess, but PLINK doesn't need to worry about it for inheritance and linkage analysis.
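The "minor allele frequency above 50%" check mentioned above can be done from genotype counts alone. Here is a minimal sketch, assuming B is supposed to be the minor allele (as "BB should be homozygous minor" implies); the function names are hypothetical and not part of PLINK.

```python
def b_allele_frequency(n_aa, n_ab, n_bb):
    """Frequency of the B allele from genotype counts (AB carries one copy of each)."""
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    if total_alleles == 0:
        raise ValueError("no genotypes counted")
    return (n_ab + 2 * n_bb) / total_alleles


def looks_backwards(n_aa, n_ab, n_bb):
    """True if B, ASSUMED to be the minor allele, is actually the more common one."""
    return b_allele_frequency(n_aa, n_ab, n_bb) > 0.5


if __name__ == "__main__":
    # 10 AA, 30 AB, 60 BB: B frequency is (30 + 2*60) / 200 = 0.75
    print(b_allele_frequency(10, 30, 60), looks_backwards(10, 30, 60))
```

A SNP flagged this way is a candidate for swapped coding; it says nothing about which nucleotide either allele is.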