Flipping REF/ALT and the corresponding genotypes of every indel in a VCF
3.9 years ago
curious ▴ 750

I am using Eagle to phase plink input, and I maintain --keep-allele-order throughout my workflow. Eagle outputs .haps (Oxford phased haplotype file), which I convert to VCF with shapeit. My sanity checks show that this process reverses the REF/ALT designation and the genotypes for every single one of my SNPs and indels. Comparing alt allele frequency to the reference strongly suggests that I do not have a build issue, so my initial workaround was to run

bcftools norm --check-ref -s -f hg19.fa input.vcf > output.vcf

to flip everything back after converting to VCF with shapeit. This perfectly flips every single one of my SNPs, but does not seem to do the same for indels. Is there a way to do this for indels as well? My alternative thought is to use BCF as input for Eagle, which should avoid the Plink/Oxford formats completely.

bcftools oxford plink • 4.2k views
3.9 years ago

You should be using plink 2.0 instead of 1.9 as much as possible whenever allele order matters. Otherwise, if you forget --keep-allele-order even once, you're in trouble; and there are some plink 1.9 commands (like --linear/--logistic with interaction terms) which need to NOT be run with --keep-allele-order.

You can use plink2 --ref-allele to reset the REF alleles in the manner you want; see in particular the second bullet point in the documentation.

And as for every variant having REF/ALT reversed, that sounds like you did not import the initial file correctly. Recent plink 2.0 builds make that particular mistake a lot less likely by forcing you to declare whether an Oxford-format file has REF first or last.


First off, thank you.

I don't really get what you mean by not importing the initial file correctly. I just checked again by converting the plink files I use as Eagle input to VCF format using:

plink \
--bfile {eagle_input_plink} \
--keep-allele-order \
--recode vcf-iid \
--out {eagle_input_vcf}

I checked REF/ALT for every position in the resulting {eagle_input_vcf} against the very first VCF in my workflow (before ever going into plink). REF and ALT are in the expected order at 100% of positions, so I don't think I forgot --keep-allele-order, although point taken.
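A check like this can be sketched in Python. This is only an illustration under stated assumptions: both files are uncompressed, tab-delimited VCFs, and variants can be matched on CHROM, POS and ID. The function names `read_alleles` and `compare_ref_alt` are hypothetical helpers, not part of plink, Eagle, shapeit or bcftools.

```python
from collections import Counter


def read_alleles(vcf_lines):
    """Map (CHROM, POS, ID) -> (REF, ALT) for the data lines of a VCF."""
    alleles = {}
    for line in vcf_lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, vid, ref, alt = line.rstrip("\n").split("\t")[:5]
        alleles[(chrom, pos, vid)] = (ref, alt)
    return alleles


def compare_ref_alt(lines_a, lines_b):
    """Count how variants in A relate to the same variants in B.

    Assumption: variant IDs are preserved between the two files.
    """
    a = read_alleles(lines_a)
    b = read_alleles(lines_b)
    counts = Counter()
    for key, (ref, alt) in a.items():
        if key not in b:
            counts["missing"] += 1
        elif b[key] == (ref, alt):
            counts["same"] += 1
        elif b[key] == (alt, ref):
            counts["swapped"] += 1
        else:
            counts["mismatch"] += 1
    return counts
```

Both arguments can be open file handles, for example `compare_ref_alt(open(path_a), open(path_b))`. A result where every variant lands in `swapped` would match the systematic flip described in this thread.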

I used the {eagle_input_plink} files to phase with Eagle, which outputs Oxford format. I convert Oxford format to VCF using:

shapeit -convert \
        --input-haps {eagle_oxford_output} \
        --output-vcf {eagle_phased.vcf}

From sanity checks I can tell that 100% of the REF/ALT alleles are flipped between {eagle_phased.vcf} and the test file {eagle_input_vcf} I made above. Since the flip was perfectly systematic, I wasn't too worried about using bcftools norm --check-ref -s to flip everything back, but now I wonder if there is something deeper that I am missing. Do you have any recommendations?

The only other thing I can think of is that either Eagle or shapeit is making assumptions about the order of REF/ALT in the input, which is sort of out of my control.


I alluded to the fact that Oxford format does not define whether REF is first or last. It looks like Eagle and Shapeit make different assumptions from each other.
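To make the ambiguity concrete, here is a minimal Python sketch of how one .haps line can be read under either convention. It assumes the SHAPEIT-style .haps layout (chromosome, variant ID, position, first allele, second allele, then two 0/1 columns per sample, where 0 means the first allele). The function name `haps_to_vcf_fields` and the `ref_first` switch are hypothetical and not part of Eagle, shapeit, or plink.

```python
def haps_to_vcf_fields(haps_line, ref_first=True):
    """Interpret one .haps line as (CHROM, POS, ID, REF, ALT, genotypes).

    ref_first=True  : treat the first listed allele as REF.
    ref_first=False : treat the second listed allele as REF, which
                      means every 0/1 haplotype code must be inverted.
    """
    fields = haps_line.split()
    chrom, vid, pos, allele0, allele1 = fields[:5]
    codes = fields[5:]
    if ref_first:
        ref, alt = allele0, allele1
    else:
        ref, alt = allele1, allele0
        # In the file, 0 means allele0; allele0 is now ALT, so 0 becomes 1.
        codes = ["1" if c == "0" else "0" for c in codes]
    # Pair consecutive haplotype columns into phased genotypes.
    genotypes = [f"{codes[i]}|{codes[i + 1]}" for i in range(0, len(codes), 2)]
    return chrom, pos, vid, ref, alt, genotypes


line = "1 rs1 1000 A AT 0 1 1 1"
print(haps_to_vcf_fields(line, ref_first=True))
print(haps_to_vcf_fields(line, ref_first=False))
```

Both readings describe the same haplotypes; only the REF/ALT labels and the 0/1 coding differ. If the tool that writes the file and the tool that reads it disagree on the convention, every variant comes out with REF/ALT and genotypes swapped together.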


Thanks, I only started to think about that after you made that comment. I am fairly naive about Oxford format. I think I might end up just writing a script to flip all the indels. I usually prefer a high-level tool to avoid unintended corruption, but I might not have an option here.
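A minimal sketch of such a script in Python, under loud assumptions: the VCF is uncompressed and tab-delimited, GT is the first FORMAT key, only biallelic records are touched, and an indel is simply any record where REF and ALT differ in length. It flips every such record unconditionally, so it only makes sense when the flip is known to be systematic, and it neither checks alleles against a reference FASTA nor updates INFO fields such as AC or AF. The names `flip_genotype` and `flip_indel_line` are hypothetical.

```python
import sys

SWAP = {"0": "1", "1": "0"}


def flip_genotype(sample_field):
    """Swap allele codes 0 and 1 in the GT part of one sample column."""
    parts = sample_field.split(":")
    # Separators ('|', '/') and missing calls ('.') pass through unchanged.
    parts[0] = "".join(SWAP.get(ch, ch) for ch in parts[0])
    return ":".join(parts)


def flip_indel_line(line):
    """Return the VCF line with REF/ALT and genotypes flipped if it is a
    biallelic indel; otherwise return the line unchanged."""
    if line.startswith("#"):
        return line
    newline = "\n" if line.endswith("\n") else ""
    fields = line.rstrip("\n").split("\t")
    ref, alt = fields[3], fields[4]
    is_biallelic = "," not in alt
    is_indel = len(ref) != len(alt)
    if not (is_biallelic and is_indel):
        return line
    fields[3], fields[4] = alt, ref
    # Sample columns start at index 9 (after FORMAT).
    fields[9:] = [flip_genotype(s) for s in fields[9:]]
    return "\t".join(fields) + newline


def main():
    # Stream a VCF from stdin to stdout, flipping indels along the way.
    for line in sys.stdin:
        sys.stdout.write(flip_indel_line(line))


if __name__ == "__main__":
    main()
```

Saved under a hypothetical name such as flip_indels.py, it could be run as `python flip_indels.py < input.vcf > output.vcf`. It is worth comparing the output against the original pre-plink VCF before relying on it.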
