plink2 reorder alleles by fasta reference
4 months ago

Hello,

I have a plink 2.0 fileset in which REF is always the major allele, and I want to reassign its REF and ALT alleles according to a FASTA reference. For this I use the --ref-from-fa flag. However, the following error occurs:

Error: --ref-from-fa wants to change reference allele assignment at X:2700157, but it's marked as 'known'. Add the 'force' modifier to force this change


What does marked as "known" mean? How should I proceed?

Best,

Andreas

4 months ago

plink 2.0 marks a reference allele as "provisional" instead of "known" when it comes from a plink 1 .bed file, or a VCF directly generated by plink from such a .bed file. Some of these reference alleles are expected to be wrong.

However, when a reference allele comes from a regular VCF file, it's expected to be correct; that's why --ref-from-fa requires the additional 'force' modifier before it will change it.

My data does not come from a "regular VCF" file, but from a standard GWAS microarray workflow with PLINK. How would you proceed? Should I "force" it?

Yes, if you know the supposed REF alleles are really just major alleles, that's what "force" is for.

If I do so, I get the following report:

--ref-from-fa force: 4685 variants changed, 1 validated.


In total I only have 4686 variants...

The fasta reference which I use is from here: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz

And here is my command:

plink2 --pfile file --fa reference.fasta --ref-from-fa force --make-pgen --out output


What could be wrong here?

Perhaps your original dataset had "backwards" alleles (with REF usually A1 instead of A2) for some reason, while still being correctly encoded. You can sanity-check this by running --freq on your updated dataset: if the alternate allele frequencies are usually low, you're probably fine.
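
A minimal sketch of that sanity check, in case it helps. It assumes you have already run --freq on the updated fileset (for example, plink2 --pfile output --freq --out output) and that the resulting .afreq file is tab-delimited with an ALT_FREQS column, where multiallelic variants list comma-separated values. The function name summarize_alt_freqs, the 0.5 cutoff, and the sample rows are illustrative only and not part of plink.

```python
import csv
import io
import math


def summarize_alt_freqs(afreq_text, threshold=0.5):
    """Return (low, total): how many variants have a summed ALT frequency
    below `threshold`, and how many variants had a usable frequency."""
    reader = csv.DictReader(io.StringIO(afreq_text), delimiter="\t")
    low = 0
    total = 0
    for row in reader:
        try:
            # Assumption: multiallelic variants list one frequency per ALT
            # allele, comma-separated; sum them to get the total ALT share.
            alt_freq = sum(float(x) for x in row["ALT_FREQS"].split(","))
        except ValueError:
            continue  # non-numeric placeholder such as "."
        if math.isnan(alt_freq):
            continue  # no observations for this variant
        total += 1
        if alt_freq < threshold:
            low += 1
    return low, total


# Made-up rows, only to show the expected input layout.
sample = (
    "#CHROM\tID\tREF\tALT\tALT_FREQS\tOBS_CT\n"
    "X\trs1\tA\tG\t0.12\t200\n"
    "X\trs2\tC\tT\t0.81\t200\n"
    "X\trs3\tG\tA,T\t0.05,0.02\t200\n"
    "X\trs4\tT\tC\tnan\t0\n"
)
low, total = summarize_alt_freqs(sample)
print(f"{low} of {total} variants have ALT frequency below 0.5")
```

To use it on real output, read the .afreq file into a string and pass it in. If most variants fall below the cutoff after the --ref-from-fa step, the new REF assignments are probably plausible; a roughly even split would suggest something else is off, such as a genome build or strand mismatch.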