Corrupted dbsnp-vcf?

Hello,

I have a VCF file that I want to upload to the Sanger Imputation Server. The following error occurred:

--- Aborted Job ---
The input file sanity check failed, "bcftools norm -ce" exited with the following message:
Reference allele mismatch at X:3155141 .. REF_SEQ:'T' vs VCF:'G'

As suggested on the Sanger website, I wanted to solve this issue with the bcftools +fixref plugin.

All my SNPs have dbSNP IDs, so I downloaded the following file for reordering alleles: ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b151_GRCh37p13/VCF/All_20180423.vcf.gz

When I now run the command

bcftools +fixref broken.vcf -O z -o fixref.vcf -- -d -f /path/to/reference.fasta -i All_20151104.vcf.gz

the following error appears:

[E::bgzf_uncompress] Inflate operation failed: invalid distance too far back
[E::bgzf_read_block] Invalid BGZF header at offset 15203091877

It seems that the All_20151104.vcf.gz file is corrupted. I am also unable to index it with bcftools. However, another operation (subsetting it to regions) works...

Does anyone know how to solve this problem?

Best,

Andreas
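
Editor's note: the BGZF errors above suggest a damaged or incomplete download. BGZF files are gzip-compatible, so a plain gzip integrity test is a quick first check before re-downloading. The sketch below is only an illustration; the helper name check_gzip is made up for this note, and the file name is the one used in the post.

```shell
# Hypothetical helper (not part of bcftools): report whether a gzip/BGZF
# file decompresses cleanly from start to finish.
check_gzip() {
    # gzip -t reads the whole file and exits non-zero on any corruption
    # or truncation; its own error text is suppressed here.
    if gzip -t "$1" 2>/dev/null; then
        echo "OK"
    else
        echo "CORRUPT"
    fi
}

# Example with the file name from the post; skipped if the file is absent.
if [ -f All_20151104.vcf.gz ]; then
    check_gzip All_20151104.vcf.gz
fi
```

If this reports CORRUPT, downloading the file again is the likely fix. Region subsetting can still appear to work on a damaged file when the requested regions lie before the bad block.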

BCFTOOLS VCF Imputation • 184 views

hg19: chrX:3155141 is T

hg18: chrX:3155141 is G

Aren't you mixing hg* builds?


I think/hope not... everything should be hg19. Might be a stupid question, but where can I quickly check this for some SNPs?
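
Editor's note: one way to spot-check a few SNPs is to read the base at each position directly from the reference FASTA and compare it with the REF column of the VCF. The following is a minimal sketch using only awk; the function name ref_base is made up for this note, and it assumes the chromosome name passed in matches the FASTA header exactly (for example "X" versus "chrX").

```shell
# Hypothetical helper: print the reference base at a 1-based position.
# Usage: ref_base <fasta> <chromosome> <position>
ref_base() {
    awk -v chrom="$2" -v pos="$3" '
        /^>/ {
            if (found) exit             # reached the next sequence: stop
            name = substr($1, 2)        # header up to first space, without ">"
            found = (name == chrom)
            offset = 0                  # bases of this sequence read so far
            next
        }
        found {
            len = length($0)
            if (pos <= offset + len) {  # position falls on this line
                print toupper(substr($0, pos - offset, 1))
                exit
            }
            offset += len
        }
    ' "$1"
}

# Example with the position from the error message; skipped if the
# reference file is absent.
if [ -f /path/to/reference.fasta ]; then
    ref_base /path/to/reference.fasta X 3155141
fi
```

This scans the FASTA linearly, so it is slow on a whole genome. With an indexed reference, samtools faidx /path/to/reference.fasta X:3155141-3155141 returns the same base much faster.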


OK, I solved it! The problem was the following: during preprocessing, I converted PLINK files to VCFs and assumed the first allele in the .bim file to be the reference allele. However, it is not! I have now solved it by using the --ref-from-fa option when making the VCF files.
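
Editor's note: the answer does not show the exact command, so the following is only a sketch of what such a conversion might look like. It assumes PLINK 2 (binary name plink2), where to my knowledge --ref-from-fa is used together with --fa to set REF alleles from a FASTA file. The function name and all file prefixes are placeholders invented for this note.

```shell
# Hypothetical wrapper: convert a PLINK bed/bim/fam fileset to VCF while
# taking REF alleles from the reference FASTA instead of the .bim order.
# Usage: bed_to_vcf_with_ref <fileset prefix> <fasta> <output prefix>
bed_to_vcf_with_ref() {
    plink2 --bfile "$1" \
           --fa "$2" \
           --ref-from-fa \
           --export vcf \
           --out "$3"
}

# Example call (placeholder names); skipped if plink2 or the inputs are missing.
if command -v plink2 >/dev/null 2>&1 && [ -f mydata.bed ]; then
    bed_to_vcf_with_ref mydata /path/to/reference.fasta mydata_ref
fi
```

Running bcftools norm -c against the same FASTA afterwards is a reasonable way to confirm that no reference mismatches remain before uploading.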
