Why exactly do I need to fix the REF allele using bcftools? [Genotype data]
4 months ago

Hi guys,

Sorry if this is a basic question, but I've searched all over the internet and couldn't find an answer.

I am working on genotype data from an Illumina genotyping chip (SNP information for ~100 individuals). I have a VCF file that was converted from a PLINK file.

I'm following a pipeline that includes a step using bcftools +fixref to "Fix REF allele according to GRCh37":

bcftools +fixref test.bcf -Ob -o output.bcf -- -f ref.fa -m top


The problem is that I don't understand why this step is important.

The bcftools manual states (regarding the above command): "If the output shows that the VCF is TOP-compatible, the following command can be used to fix the strand"

But what needs fixing? I have already converted all my SNPs to the positive strand, so I don't understand what this command does or why it is important.

Note: Technically, I could just follow the pipeline blindly, but I'm really trying to understand what I'm doing here, so any help is appreciated :)

Here's the output of the code:

VCF Genotype reference SNP bcftools • 267 views
4 months ago

What is the pipeline ultimately doing? If you're proceeding to imputation, then that requires that the correct alleles are coded as 0 and 1 in the data, so that the relationship between variants can be correctly applied in imputation.
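To make "correct alleles coded as 0 and 1" concrete, here is a minimal Python sketch of the kind of check involved. It is only an illustration, not how bcftools +fixref is implemented, and the function names (classify_site, swap_genotype) are made up for this example. The idea: even when every SNP is on the positive strand, a PLINK-derived VCF may list as REF an allele that is not the base found in the reference FASTA, in which case REF and ALT need to be swapped and the 0/1 genotype codes recoded with them.

```python
# Hypothetical helpers for illustration only; not part of bcftools.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def classify_site(ref_base, vcf_ref, vcf_alt):
    """Compare a biallelic SNP's VCF alleles with the reference genome base.

    Note: for A/T and C/G SNPs a swap and a strand flip look identical,
    so this simple check cannot tell them apart and reports "swap".
    """
    if vcf_ref == ref_base:
        return "ok"          # REF already matches the genome
    if vcf_alt == ref_base:
        return "swap"        # right strand, but REF and ALT are reversed
    if COMPLEMENT[vcf_ref] == ref_base:
        return "flip"        # alleles are reported on the opposite strand
    if COMPLEMENT[vcf_alt] == ref_base:
        return "flip+swap"   # opposite strand and REF/ALT reversed
    return "unresolved"      # neither allele is compatible with the genome


def swap_genotype(gt):
    """Recode a genotype such as '0/1' after REF and ALT are swapped."""
    recode = {"0": "1", "1": "0", ".": "."}
    sep = "|" if "|" in gt else "/"
    return sep.join(recode[allele] for allele in gt.split(sep))


# Genome has G at this position, but the VCF says REF=A, ALT=G.
print(classify_site("G", "A", "G"))   # swap
# A sample stored as 0/0 (homozygous A) must become 1/1 after the swap.
print(swap_genotype("0/0"))           # 1/1
```

If a site like this is left unfixed, a sample that is homozygous for the non-reference allele is still stored as 0/0, so a tool that matches your data against a reference panel would read it as homozygous reference.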