I'm running a germline variant calling pipeline with bcftools, and I annotate/filter variants based on allele frequency from gnomAD exomes (gnomad.exomes.r2.1.1.sites.vcf.bgz). After filtering, I get an error in IGV when loading the VCF:
Error loading features for interval: chr1:114940611-114940651 htsjdk.tribble.TribbleException$InternalCodecException: The allele with index 2 is not defined in the REF/ALT columns in the record
chr1 114940632 . C CA 115.328 PASS INDEL;IDV=32;...;AC=1,1;AN=2;... GT:PL 1/2:176,86,79,72,0,93 This genotype 1/2 suggests two alternate alleles, but only one ALT allele CA is listed.
Here are the key steps I use:
# 1. Variant calling
bcftools mpileup -Ou -f genome.fa sample.bam |
bcftools call -mv -Ou |
bcftools filter -s LowQual -e 'QUAL<30 || DP<10' -Ov -o sample_raw.vcf
# 2. Compress + index
bgzip sample_raw.vcf && tabix -p vcf sample_raw.vcf.gz
# 3. Annotate with gnomAD (AF)
bcftools annotate \
-a gnomad.exomes.r2.1.1.sites.vcf.bgz \
-c CHROM,POS,REF,ALT,INFO/AF \
sample_raw.vcf.gz -Oz -o sample_annotated.vcf.gz
tabix -p vcf sample_annotated.vcf.gz
# 4. Normalize (split multiallelics)
bcftools norm -m -any sample_annotated.vcf.gz -Oz -o sample_norm.vcf.gz
tabix -p vcf sample_norm.vcf.gz
# 5. Filter by AF and quality
bcftools view -f PASS -i 'INFO/AF<0.05' sample_norm.vcf.gz -Oz -o sample_filtered.vcf.gz
tabix -p vcf sample_filtered.vcf.gz
# 6. (Optionally) Normalize again?
bcftools norm -m -any sample_filtered.vcf.gz -Oz -o sample_final.vcf.gz
tabix -p vcf sample_final.vcf.gz
I ALREADY CHECK THIS: VCF headers are intact (#CHROM, INFO, FORMAT, etc.)
bcftools view doesn't complain about the file.
IGV loads unfiltered/raw VCFs without issues.
Manual checks show many sites with 1/2 genotypes but only one ALT allele.
Normalization (bcftools norm -m -any) doesn't seem to resolve the mismatch in some cases.
I dont know where is my mistake. Any advice is appreciated, especially from anyone who's successfully integrated gnomAD Exome annotations into a clinical/exome pipeline without IGV errors!
that would be a problem with
bcftools norm
. What is your version of bcftools ? Furthermore, show us the output ofI am using bcftools 1.19 Using htslib 1.19
the ouput
there was already a problem before bcftools nom . Do you still have two alleles in sample_raw.vcf.gz at the same position ? if true you should upgrade bcftools and if the problem persists you should submit a bug report.
by the way, when using bcftools annotate this way, you would overwrite the original AF field with the value of gnomad.
https://samtools.github.io/bcftools/bcftools.html#annotate
yes, but I think that I need it in this way because the main idea if filter by the population AF, so I dont need the AF of the sample