plink2 reorder alleles by fasta reference
4 months ago

Hello,

I have a PLINK 2.0 fileset (in which "REF" is always the major allele) and want to reassign its REF and ALT alleles according to a FASTA reference. For this I use the --ref-from-fa flag. However, the following error occurs:

Error: --ref-from-fa wants to change reference allele assignment at X:2700157,but it's marked as 'known'. Add the 'force' modifier to force this change


What does "marked as 'known'" mean? How should I proceed?

Best,

Andreas

4 months ago

plink 2.0 marks a reference allele as "provisional" instead of "known" when it comes from a plink 1 .bed file, or a VCF directly generated by plink from such a .bed file. Some of these reference alleles are expected to be wrong.

However, when a reference allele comes from a regular VCF file, it's expected to be correct; that's why the additional 'force' modifier to --ref-from-fa is required to change it.
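To see how many REF alleles in a fileset plink 2.0 treats as provisional versus known, one can inspect the .pvar file. This is a minimal sketch, assuming the .pvar is tab-delimited and that provisional REF alleles carry a `PR` flag in the INFO column (check your own .pvar header to confirm); `count_ref_status` is a hypothetical helper name, not a plink2 feature.

```shell
#!/bin/sh
# Count provisional vs. "known" REF alleles in a plink2 .pvar file.
# Assumption: provisional REF alleles carry a PR flag in the INFO column;
# variants without it (or files with no INFO column) are counted as known.
count_ref_status() {
  awk -F'\t' '
    /^##/ { next }                       # skip meta-information lines
    /^#CHROM/ {
      # Find the INFO column by name, since it is optional in .pvar files
      for (i = 1; i <= NF; i++) if ($i == "INFO") info = i
      next
    }
    {
      pr = 0
      if (info) {
        n = split($info, flags, ";")
        for (j = 1; j <= n; j++) if (flags[j] == "PR") pr = 1
      }
      if (pr) provisional++; else known++
    }
    END { printf "provisional=%d known=%d\n", provisional, known }
  ' "$1"
}
```

Usage: `count_ref_status file.pvar`. A dataset where every variant counts as known will trigger the error above for any REF that disagrees with the FASTA, unless 'force' is given.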


Hmmm. My data does not come from a "regular" VCF file, but from a standard GWAS microarray workflow with PLINK. How would you proceed? Use "force"?


Yes, if you know the supposed REF alleles are really just major alleles, that's what "force" is for.


If I do so, I get the following report:

--ref-from-fa force: 4685 variants changed, 1 validated.


In total I only have 4686 variants...

The FASTA reference I use is from here: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz

And here is my command:

plink2 --pfile file --fa reference.fasta --ref-from-fa force --make-pgen --out output


What could be wrong here?


Perhaps your original dataset had "backwards" alleles (with REF usually A1 instead of A2) for some reason, while still being correctly encoded. You can sanity-check this by running --freq on your updated dataset: if the alternate allele frequencies are usually low, you're probably fine.
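A minimal sketch of that sanity check, assuming the .afreq file written by `plink2 --pfile output --freq --out output` is tab-delimited and contains an `ALT_FREQS` column (located by name, since the exact header can differ between plink2 versions). `summarize_afreq` is a hypothetical helper name, and it assumes biallelic variants, i.e. one frequency per row.

```shell
#!/bin/sh
# Summarize a plink2 .afreq file: how many variants have ALT as the minor allele.
# Produce the input first (not run here):
#   plink2 --pfile output --freq --out output
# Assumes biallelic variants (a single value in ALT_FREQS per row).
summarize_afreq() {
  awk -F'\t' '
    NR == 1 {
      # Locate the ALT_FREQS column by name; header layout varies by version
      for (i = 1; i <= NF; i++) if ($i == "ALT_FREQS") col = i
      if (!col) { print "no ALT_FREQS column" > "/dev/stderr"; exit 1 }
      next
    }
    {
      total++
      if ($col + 0 < 0.5) minor++        # ALT frequency below 0.5: ALT is minor
    }
    END {
      if (total) printf "%d of %d variants have ALT_FREQS < 0.5 (%.1f%%)\n", minor, total, 100 * minor / total
    }
  ' "$1"
}
```

Usage: `summarize_afreq output.afreq`. If nearly all variants report ALT_FREQS below 0.5, the ALT alleles are mostly minor alleles, which is what the reply above describes as "probably fine".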