Question

Why would you need to specify `--keep-allele-order` and `--a2-allele` when writing a VCF with PLINK?

2

Entering edit mode

4.4 years ago

curious ▴ 750

My friend is using plink 1.9, this is a simplified version of what I am reading from someones elses' code

He first reads in a vcf, split pseudoatuosomal regions

plink \
--vcf file.vcf \ 
--double-id \
--keep-allele-order \
--make-bed \
--split-x hg19 \
--out outname \

zeroes heterozygous haploid calls

plink 
--bfile outname \
--keep-allele-order \
--make-bed \
--set-hh-missing \
--out outname_hh_fixed

Merges the PAR back in

plink \
--bfile outname_hh_fixed \
--merge-x \
--make-bed \
--out outname_hh_fixed_merged

Converts the PLINK to VCF

plink \
--bfile outname_hh_fixed_merged \
--keep-allele-order \
--recode vcf-iid \
--real-ref-alleles \
--a2-allele vcfinfo.txt 4 3 0 \
--out final_vcf.vcf

If I understand --keep-allele-order prevents plink from flipping around alleles and preserves the REF allele in the original VCF as A2 (major allele in plink)

I have two questions:

Why is --keep-allele-order not needed when merging PAR back in?
Why does PLINK need you specify --keep-allele-order and --a2-allele when writing the VCF in the last step? I think I understand from the documentation that column 4 of vcfinfo.txt will be used as a guide for setting alleles in that column to A2 (or REF in the final_vcf). I am just kind of stumped as to why you would need to supply a column of alleles to designate as A2 if we are already saying --keep-allele-order. Wouldn't the correct order already be baked into the .bim file then? Thank you.

plink vcf • 6.7k views

ADD COMMENT • link updated 2.4 years ago by Shicheng Guo ★ 9.4k • written 4.4 years ago by curious ▴ 750

0

Entering edit mode

Thank you for the response. The goal is to preserve allele order throughout this pipeline. Is there a general reason behind why someone would want to let plink assign major/minor alleles when using merge-x to merge PAR back in? My thought here is that he was really trying to make sure that allele order was kept at the lest step and just let it go at the merging step.

ADD REPLY • link 4.4 years ago by curious ▴ 750

score 5 · Answer 1 · 2019-12-18

5

Entering edit mode

4.4 years ago

chrchang523 10k

It's okay for the correct REF/ALT allele ordering to be "forgotten" in intermediate steps, as long as the information is recovered in a later step by --a2-allele.
Every time you run plink 1.x, it'll try to reorder the alleles to A1=minor, A2=major unless you specify --keep-allele-order. (This looked like a reasonable idea at the time plink 1.0 was written...) So, because --keep-allele-order was omitted from the third step, the original REF/ALT ordering is lost there, and it's necessary to use --a2-allele to recover it when exporting the VCF. With that said, --keep-allele-order isn't needed in the last command, due to the order of operations (http://www.cog-genomics.org/plink/1.9/order ); the usual allele reordering happens before, not after, --a2-allele executes.

ADD COMMENT • link 4.4 years ago by chrchang523 10k

0

Entering edit mode

Thank you for the response. The goal is to preserve allele order throughout this pipeline. Is there a general reason behind why someone would want to let plink assign major/minor alleles when using merge-x to merge PAR back in? My thought here is that he was really trying to make sure that allele order was kept at the lest step and just let it go at the merging step.

ADD REPLY • link 4.4 years ago by curious ▴ 750

1

Entering edit mode

No, there’s no specific reason to leave —keep-allele-order out in step 3. But including —a2-allele once is usually easier than remembering —keep-allele-order everywhere.

ADD REPLY • link 4.4 years ago by chrchang523 10k

0

Entering edit mode

Thats what it looked like to me too, your responses helped me understand this greatly. You are an author of PLINK 2?

ADD REPLY • link 4.4 years ago by curious ▴ 750

0

Entering edit mode

He is author of all PLINK versions including and post v1.9. Possibly contributed to the original PLINK versions, too.

ADD REPLY • link 3.3 years ago by Kevin Blighe 87k

0

Entering edit mode

--a2-allele <uncompressed VCF filename> 4 3 '#'

Can you share more information about the last "#" in the --a2-allele option?

ADD REPLY • link 2.4 years ago by Shicheng Guo ★ 9.4k

1

Entering edit mode

https://www.cog-genomics.org/plink/1.9/data#ax_allele

https://www.cog-genomics.org/plink/1.9/data#update_map

"The optional fourth 'skip' parameter is either a nonnegative integer, in which case it indicates the number of lines to skip at the top of the file, or a single nonnumeric character, which causes each line with that leading character to be skipped." This is useful because all VCF header lines start with '#'.