Can someone explain PLINK's REF/ALT allele management strategy?
13 months ago

Sometimes when merging two PLINK files, the reference (REF) and alternative (ALT) alleles end up reversed, e.g. REF G ALT A in one file versus REF A ALT G in the other.

The main reason for that is PLINK's default behavior. When PLINK writes a new binary fileset, it automatically assigns REF to the minor (least frequent) allele. So even if the files were aligned to the reference, the allele order may have been inadvertently changed by the conversion to PLINK format.

The main issue, however, is that I do not understand why it changes only the REF/ALT notation in the .bim file, while the .bed file, which lists each individual's genotypes, stays unchanged. In other words, if SNP rs1234 for individual James Smith is 0/1, after PLINK's default action it is still 0/1. Exactly the same, even though the "1" has changed from meaning G to meaning A.

All the options PLINK provides to reverse REF/ALT do the same thing: they change the notation in the .bim file, but do nothing to the .ped (.bed) files.

Can you explain why it behaves this way? By my calculations, it renders 30% of my .bim alleles incorrect after the merge. Am I missing something?
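
For anyone wanting to reproduce a count like the 30% figure, here is a minimal sketch of one way to compare allele columns between two .bim files. It assumes the standard six-column .bim layout (chromosome, variant ID, cM position, bp position, allele 1, allele 2); the function names `read_bim` and `classify` and the example records are made up for illustration and are not part of PLINK.

```python
def read_bim(lines):
    """Map variant ID -> (allele 1, allele 2) from .bim-formatted lines."""
    alleles = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        alleles[fields[1]] = (fields[4], fields[5])
    return alleles


def classify(before, after):
    """Count shared variants by how their allele columns compare."""
    counts = {"same": 0, "swapped": 0, "other": 0}
    for variant_id, pair in before.items():
        if variant_id not in after:
            continue  # variant absent from the second file
        other = after[variant_id]
        if other == pair:
            counts["same"] += 1
        elif other == pair[::-1]:
            counts["swapped"] += 1  # same two alleles, opposite order
        else:
            counts["other"] += 1  # e.g. strand flip or different alleles
    return counts


# Made-up records, for illustration only.
before = read_bim(["1 rs1234 0 1000 G A", "1 rs5678 0 2000 C T", "1 rs9 0 3000 A C"])
after = read_bim(["1 rs1234 0 1000 A G", "1 rs5678 0 2000 C T", "1 rs9 0 3000 T G"])
print(classify(before, after))
```

With real files, passing `open("a.bim")` and `open("b.bim")` to `read_bim` should work the same way. Keeping "swapped" separate from "other" helps distinguish a pure order change from a strand or allele mismatch.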

13 months ago
  1. A VCF "0/1" genotype is not ordered. It has the same meaning regardless of whether REF=A ALT=G or REF=G ALT=A.
  2. See https://www.cog-genomics.org/plink/1.9/data#ax_allele for discussion of plink 1.x's allele-swapping behavior. (Bottom line: you probably want to use plink 2.0 whenever possible when allele order matters.)
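
Point 1 can be checked with a small sketch. This is illustrative Python, not PLINK's internal code. It assumes genotypes are written as unordered pairs of allele indices (0 = REF, 1 = ALT), and the helper names `swap_ref_alt` and `to_bases` are hypothetical. A heterozygous call reads the same after REF and ALT trade places, while homozygous calls have to flip to keep describing the same bases.

```python
def swap_ref_alt(genotype):
    """Re-express an unphased genotype after REF and ALT trade places.

    Every allele index is inverted (0 <-> 1), then the pair is sorted to
    give the canonical unordered form.
    """
    return tuple(sorted(1 - allele for allele in genotype))


def to_bases(genotype, ref, alt):
    """Translate allele indices into the bases they stand for, unordered."""
    alleles = (ref, alt)
    return tuple(sorted(alleles[i] for i in genotype))


# A site described as REF=G ALT=A, then re-described as REF=A ALT=G.
for gt in [(0, 0), (0, 1), (1, 1)]:
    swapped = swap_ref_alt(gt)
    # The underlying bases are identical under both descriptions.
    assert to_bases(gt, "G", "A") == to_bases(swapped, "A", "G")
    print(gt, "->", swapped)
```

So an unchanged 0/1 for one individual does not by itself show that the genotype data were left untouched. The homozygous calls are the ones to inspect.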