Question: Forcing A1/A2 alleles in PLINK in a VCF phased file
1
gravatar for GabrielMontenegro
2.1 years ago by
United Kingdom
GabrielMontenegro540 wrote:

I have a phased VCF file and I need to change the alleles of the SNPs so that the alleles coded as 0 represent the ancestral state while the alleles coded as 1 represent the derived state.

I got a list of ancestral and derive states for my SNPs in my VCF file and I want to change them using PLINK.

If I have a list of ancestral alleles in a file ancestral_snps.txt , if I use the following command will I get the wanted output?

plink --vcf myvcf.vcf --a1-allele ancestral_snps.txt --recode vcf --out myvcfancestral.vcf

Or is it --allele-a2 ?

I assume that this change won't affect the phasing of my data, right? As I basically only recoded the alleles.

EDIT:

Following Christopher Chang's comment:

If my ancestral_snps.txt file looks like this, specifying which allele is ancestral:

 C rs1001
 G rs1002
 A rs1003

If I use this command in PLINK2, will the ancestral alleles be coded as 0 in my phased VCF?

./plink2 --vcf myvcf.vcf --ref-allele force 1 2 ancestral_snps.txt --recode vcf --out myvcfancestral.vcf

snp plink vcf • 3.2k views
ADD COMMENTlink modified 12 months ago by diercles.cardoso0 • written 2.1 years ago by GabrielMontenegro540

The command ... --ref-allele force ancestral_snps.txt 1 2 ... do not change heterozygous, just homozygous. When I am changing all the reference alleles in the VCF it is perfect, because the phased haplotypes will be kept. However, if I intend to change the reference allele for a subset of markers in the VCF, this behaviour will destroy the phased haplotypes. To represent what I mean, I give you a short example where only the two first SNPs were changed:

Original file                            New file

REF     ALT    Sample1         |||    REF    ALT Sample1 

 T       G       1|0           |||     G     T     1|0 

 A       C       0|0           |||     C     A     1|1

 T       C       0|1           |||     T     C     0|1   

 T       C       1|1           |||     T     C      1|1

Am I missing any command to force changes in heterozygous ?

ADD REPLYlink modified 12 months ago by genomax73k • written 12 months ago by diercles.cardoso0

Hi,

Can you post the full .log file from your run?

ADD REPLYlink written 12 months ago by chrchang5235.7k

Sure.

PLINK v2.00a1LM AVX2 Intel (11 Feb 2018) Options in effect: --cow --export vcf --maf 0.01 --out teste --ref-allele force reference.teste2 1 2 --vcf CHR9.vcf.gz

Hostname: c070 Working directory: home/VCFs Start time: Thu Oct 11 15:58:01 2018

Random number seed: 1539233881 128719 MB RAM detected; reserving 64359 MB for main workspace. Using up to 20 threads (change this with --threads). --vcf: 1589271 variants scanned. --vcf: teste-temporary.pgen + teste-temporary.pvar + teste-temporary.psam written. 50 samples (0 females, 0 males, 50 ambiguous; 50 founders) loaded from teste-temporary.psam. 1589271 variants loaded from teste-temporary.pvar. Note: No phenotype data present. Calculating allele frequencies... done. 149348 variants removed due to minor allele threshold(s) (--maf/--max-maf/--mac/--max-mac). 1439923 variants remaining after main filters. --ref-allele: 14 sets of allele codes rotated. --export vcf to teste.vcf ... done.

End time: Thu Oct 11 15:58:12 2018

ADD REPLYlink modified 12 months ago • written 12 months ago by diercles.cardoso0

Can you retry this with a more recent build? This appears to be working properly on my end, and I recall fixing this several months ago.

ADD REPLYlink written 12 months ago by chrchang5235.7k

Thanks.

It worked perfectly when I used PLINK v2.00a2LM AVX2.

ADD REPLYlink modified 12 months ago • written 12 months ago by diercles.cardoso0
4
gravatar for chrchang523
2.1 years ago by
chrchang5235.7k
United States
chrchang5235.7k wrote:
  • If you want to preserve phase information, you'll need plink 2.0; this is impossible in 1.9 due to the limitations of the core file format.

  • "--ref-allele force ancestral_snps.txt" should work. Depending on the layout of the ancestral_snps file, you might need to specify some column numbers too; "plink2 --help ref-allele" provides usage details.

ADD COMMENTlink written 2.1 years ago by chrchang5235.7k

So if I understand correctly, the --ref-allele force will make the specified (in this case ancestral) alleles be changed to 0? I am checking the help page, but I am not sure what this means: {refcol} {IDcol} {skip} ? What column numbers would I need to specify? Is it from the VCF or the ancestral_snps.txt file? I made an edit to my original answer.

ADD REPLYlink modified 2.1 years ago • written 2.1 years ago by GabrielMontenegro540

refcol is the column number of the alleles you want to treat as reference; in this case, 1.

IDcol is the variant ID column number; in this case, 2.

ADD REPLYlink written 2.1 years ago by chrchang5235.7k

So then: ... --ref-allele force ancestral_snps.txt 1 2 .... Would this work? Thanks again.

ADD REPLYlink modified 2.1 years ago • written 2.1 years ago by GabrielMontenegro540
1

Yes, that should work.

ADD REPLYlink written 2.1 years ago by chrchang5235.7k

Thanks - that worked!

ADD REPLYlink written 2.1 years ago by GabrielMontenegro540

I just noticed that after using --ref-allele force some of the genotypes that were UNPHASED appear as PHASED in the new vcf file. Is there any way to avoid this from happening?

ADD REPLYlink written 3 months ago by GabrielMontenegro540
1

No, because it's irrelevant. You noted in your plink2-users post that this only affected homozygous genotypes, which don't have a phasing state at all. plink2 renders homozygous genotypes in VCFs with '|' iff the previous heterozygous genotype was phased.

(There are plans to add support for VCF "phase sets" in the future, but for now, you should stick to bcftools when it's important to preserve that information.)

ADD REPLYlink written 3 months ago by chrchang5235.7k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1908 users visited in the last hour