Why exactly do I need to fix the REF allele using bcftools? [Genotype data]
4 months ago

Hi guys,

Sorry if this is a basic question, but I've searched all over the internet and couldn't find an answer.

I am working on genotype data from an Illumina genotyping chip (SNP information for ~100 individuals). I have a VCF file (converted from a PLINK file).

I'm following a pipeline that includes a step using bcftools +fixref to "Fix REF allele according to GRCh37":

bcftools +fixref test.bcf -Ob -o output.bcf -- -f ref.fa -m top

The problem is that I don't understand why this step matters.

The bcftools manual states (regarding the above command): "If the output shows that the VCF is TOP-compatible, the following command can be used to fix the strand"

But what needs fixing? Given that I have already converted all my SNPs to the positive strand, I don't understand what this command does or why it is important.

Note: Technically, I could just blindly follow the pipeline without understanding it, but I'm really trying to understand what I'm doing here, so any help is appreciated :)

Here's the output of the command:

[image not preserved]

Tags: VCF, Genotype, reference, SNP, bcftools
4 months ago

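What is the pipeline ultimately doing? If you're proceeding to imputation, that requires the REF and ALT alleles (coded as 0 and 1 in the genotypes) to match the reference, so that the relationships between variants can be applied correctly during imputation. Having everything on the positive strand does not guarantee this by itself: a VCF converted from PLINK often has REF assigned from PLINK's major/minor allele coding rather than from the genome, so REF can still disagree with GRCh37 at some sites.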
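
As a rough illustration of what "fixing" REF involves, here is a toy Python sketch. It is not what bcftools does internally (in particular, it does not implement the Illumina TOP/BOT logic behind `-m top`), and the function name `fix_ref` and its arguments are made up for this example. It assumes biallelic SNPs and that you already know the reference base at the site.

```python
# Toy sketch: make REF agree with the reference genome for one biallelic SNP.
# Hypothetical helper for illustration only; not the bcftools algorithm.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def swap_genotypes(genotypes):
    """Recode allele indices 0 <-> 1, leaving missing calls (None) untouched."""
    return [tuple(None if a is None else 1 - a for a in gt) for gt in genotypes]


def fix_ref(ref_base, ref, alt, genotypes):
    """Return (ref, alt, genotypes, action) for one site.

    ref_base  -- base in the reference genome at this position
    ref, alt  -- alleles as written in the VCF
    genotypes -- list of tuples of allele indices, e.g. (0, 1)
    """
    if ref == ref_base:
        # Caveat: for A/T and C/G SNPs this check cannot detect a strand flip,
        # because the complement of one allele is the other allele.
        return ref, alt, genotypes, "ok"
    if alt == ref_base:
        # REF and ALT are exchanged, so every 0 must become 1 and vice versa.
        return alt, ref, swap_genotypes(genotypes), "swapped"
    flipped_ref, flipped_alt = COMPLEMENT[ref], COMPLEMENT[alt]
    if flipped_ref == ref_base:
        # Alleles were reported on the opposite strand; genotypes stay as-is.
        return flipped_ref, flipped_alt, genotypes, "flipped"
    if flipped_alt == ref_base:
        return flipped_alt, flipped_ref, swap_genotypes(genotypes), "flipped+swapped"
    return ref, alt, genotypes, "unresolved"


# Site where the genome has G, but the VCF lists A as REF and G as ALT.
print(fix_ref("G", "A", "G", [(0, 0), (0, 1), (1, 1)]))
```

The point of the "swapped" case is that a homozygous 0/0 sample in the input really carries two copies of the non-reference allele, so leaving the record as-is would give imputation the wrong allele for that sample.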
