I used the bcftools +fixref plugin to match reference alleles to GRCh37. As the result below shows, I got 18901 unresolved SNPs.
- Is there any way to see the rsIDs of these unresolved SNPs?
- Why are there unresolved SNPs?
- What can I do to make these SNPs' reference alleles match those in GRCh37?
(Image: result of bcftools +fixref)
Thank you for any feedback.
In case you can't see the picture attached above, please go to https://ibb.co/nf1Sby
Have you not seen the blatant warning given about the operation of this command?
Can you elaborate on where you obtained your ensembl.bcf file or how it was produced? Was it even aligned to hg19 / GRCh37?
Hi Kevin. Thanks for your comments. I used this unsafe command because when I used the other command, the mismatch rate reached 65%. I think this is because some SNPs in my genotype file are on the reverse strand.
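Why reverse-strand SNPs inflate the mismatch rate can be illustrated with a minimal Python sketch. It assumes each SNP is a hypothetical (rsID, REF, ALT, GRCh37 base) tuple with single-base alleles; the function names and the four example SNPs are made up for illustration and are not from the original data. A SNP genotyped on the reverse strand carries the complement of the reference base, so it counts as a mismatch until its alleles are complemented.

```python
# Complementary base on the opposite DNA strand.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(base: str) -> str:
    """Return the complementary base of a single-nucleotide allele."""
    return COMPLEMENT[base]


def mismatch_rate(snps, allow_flip: bool = False) -> float:
    """Fraction of SNPs whose REF allele differs from the reference base.

    snps: list of hypothetical (rsid, ref, alt, ref_base) tuples.
    With allow_flip=True, a SNP whose complemented REF equals the
    reference base (a reverse-strand SNP) is not counted as a mismatch.
    """
    snps = list(snps)
    if not snps:
        return 0.0
    mismatches = 0
    for _rsid, ref, _alt, ref_base in snps:
        if ref == ref_base:
            continue  # REF already agrees with the reference
        if allow_flip and complement(ref) == ref_base:
            continue  # agrees once the strand is flipped
        mismatches += 1
    return mismatches / len(snps)


# Made-up example SNPs: (rsid, REF, ALT, base in the reference FASTA).
snps = [
    ("rs1", "A", "G", "A"),  # forward strand, REF matches
    ("rs2", "T", "C", "A"),  # reverse strand: complement of T is A
    ("rs3", "C", "T", "G"),  # reverse strand: complement of C is G
    ("rs4", "G", "A", "A"),  # REF/ALT swapped; a strand flip alone does not fix it
]

print(mismatch_rate(snps))                   # 0.75
print(mismatch_rate(snps, allow_flip=True))  # 0.25
```

The sketch deliberately checks REF only; rs4 shows that a REF/ALT swap is a different problem from a strand flip and still counts as a mismatch after flipping.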
I produced the BCF file from PLINK bfiles. I first used PLINK 1.9 to recode my bfiles to a VCF file, then compressed it to vcf.gz, renamed chromosome 23 to X etc., and used the following commands to produce the BCF file and its index:

    bcftools sort ensembl.vcf.gz -Ob -o ensembl.bcf
    bcftools index ensembl.bcf

I assumed that the following command could deal with the SNPs on the reverse strand directly:

    bcftools +fixref test.bcf -Ob -o output.bcf -- -f ref.fa -m flip -d
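The chromosome renaming step (23 to X, etc.) could be handled with a map file for bcftools annotate --rename-chrs, which reads one "old new" pair per line. Below is a minimal Python sketch that builds such a file, assuming PLINK 1.x numeric codes (23 = X, 24 = Y, 25 = XY pseudo-autosomal, 26 = MT) and contig names without a "chr" prefix as in GRCh37-style FASTA files. The function names and the chr_map.txt file name are hypothetical, and the names must be adjusted to whatever ref.fa actually uses.

```python
from pathlib import Path


def build_rename_map(prefix: str = "", mt_name: str = "MT") -> dict:
    """Map PLINK 1.x chromosome codes to reference contig names (assumed naming).

    prefix: "" for GRCh37-style names (1, X, MT), "chr" for UCSC hg19-style names.
    mt_name: mitochondrial contig name without prefix ("MT" or "M").
    """
    mapping = {str(i): f"{prefix}{i}" for i in range(1, 23)}  # autosomes
    mapping["23"] = f"{prefix}X"
    mapping["24"] = f"{prefix}Y"
    mapping["25"] = f"{prefix}X"  # XY = pseudo-autosomal region, assumed to go on X
    mapping["26"] = f"{prefix}{mt_name}"
    return mapping


def format_rename_map(mapping: dict) -> str:
    """Render the map as 'old new' lines, one pair per line."""
    return "".join(f"{old} {new}\n" for old, new in mapping.items())


def write_rename_file(path: str, mapping: dict) -> None:
    """Write the map to disk so it can be passed to --rename-chrs."""
    Path(path).write_text(format_rename_map(mapping))


print(format_rename_map(build_rename_map()).splitlines()[22:])
# ['23 X', '24 Y', '25 X', '26 MT']
```

Calling write_rename_file("chr_map.txt", build_rename_map()) and then running bcftools annotate --rename-chrs chr_map.txt on the VCF before sorting would apply the renaming. Mapping 25 (XY) to X is an assumption here; check it against how the pseudo-autosomal SNPs should be treated downstream.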
Update: I used snpflip.py (https://github.com/biocore-ntnu/snpflip) to see the rsIDs of the unresolved SNPs. It turns out that the 18902 SNPs are strand-ambiguous (A/T or C/G) ones, and it seems okay to drop them.
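The reason these SNPs cannot be resolved can be shown with a minimal Python sketch. For an A/T or C/G SNP, the complemented allele pair is the same pair, so the reference sequence cannot tell whether the SNP is on the forward or the reverse strand. The sketch assumes each SNP is a hypothetical (rsID, REF, ALT, GRCh37 base) tuple. The labels are illustrative and are not the categories reported by bcftools +fixref or snpflip, and the example SNPs are made up.

```python
# Complementary base on the opposite DNA strand.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def classify(ref: str, alt: str, ref_base: str) -> str:
    """Classify a biallelic SNP against the reference base (illustrative labels)."""
    ref_base = ref_base.upper()  # soft-masked FASTA bases may be lowercase
    if ref not in COMPLEMENT or alt not in COMPLEMENT:
        return "unresolved"  # not a simple A/C/G/T SNP
    if COMPLEMENT[ref] == alt:
        # A/T or C/G: the flipped pair is the same pair, so the strand
        # cannot be inferred from the reference sequence alone.
        return "ambiguous"
    if ref == ref_base:
        return "match"
    if alt == ref_base:
        return "swap"       # same strand, REF and ALT exchanged
    if COMPLEMENT[ref] == ref_base:
        return "flip"       # reverse strand
    if COMPLEMENT[alt] == ref_base:
        return "flip+swap"  # reverse strand and REF/ALT exchanged
    return "unresolved"     # e.g. the reference base is N


def ambiguous_rsids(snps):
    """Return rsIDs of strand-ambiguous SNPs from (rsid, ref, alt, ref_base) tuples."""
    return [rsid for rsid, ref, alt, ref_base in snps
            if classify(ref, alt, ref_base) == "ambiguous"]


snps = [
    ("rs1", "A", "G", "A"),  # match
    ("rs2", "T", "C", "A"),  # flip
    ("rs3", "G", "A", "A"),  # swap
    ("rs4", "A", "T", "A"),  # ambiguous, even though REF equals the reference base
    ("rs5", "C", "G", "G"),  # ambiguous
]

print(ambiguous_rsids(snps))  # ['rs4', 'rs5']
```

Palindromic SNPs are tested first on purpose. rs4 matches the reference on the forward strand, but a reverse-strand A/T SNP with REF and ALT swapped would look exactly the same, so a match alone is not evidence of the correct strand.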