Hello,
I've been running into an error lately when I try to run bcftools stats but that I believe is probably an error produced in bcftools call.
I have a set of .bcf files generated with samtools mpileup. Original bams were mapped with BWA-mem, treated with AddOrReplaceReadGroups, MarkDuplicates and RealignerTargetCreator.
This is my call for the files; I run them inside a lop
samtools mpileup -Q 20 -q 20 -go %s_raw.bcf -f %s %s
where the second %s is a path to the reference and the third the name of the input bam.
Then I run bcftools call, again inside a loop:
cat chromosome.list | xargs -I {} -n 1 -P 20 sh -c 'bcftools call -cO z -r {} --format-fields GQ -o %s/%s.{}.vcf.gz %s'
where the call would look something like this:
cat chromosome.list | xargs -I {} -n 1 -P 20 sh -c 'bcftools call -cO z -r {} --format-fields GQ -o LER_ALM_1580_raw/LER_ALM_1580_raw.{}.vcf.gz LER_ALM_1580_raw.bcf'
then I concatenate all the chromosome files using bcftools concat.
Finally, I try to run bcftools stats, like so (example from one sample):
bcftools stats -F /scratch/3/msf/Reference/LER_ALM1580/LER_ALM1580.gatk.iteration4.consensus.FINAL.fa -s - LER_ALM_1580_raw.vcf.gz > LER_ALM_1580_raw_v2.vcf.gz.stats
And get this error:
[E::bcf_calc_ac] Incorrect allele ("1") in LER_ALM_1580 at 1:461
If I look it up with bcftools view:
bcftools view -r 1:461 LER_ALM_1580_raw.vcf.gz
This is what I see:
CHROM POS ID REF ALT QUAL FILTER INFO FORMAT LER_ALM_1580
1 461 . C . 3.53256 . DP=1;SGB=-0.379885;MQ0F=0;AF1=1;AC1=2;DP4=0,0,0,1;MQ=60;FQ=-29.9956 GT:PL:GQ 0/1:29:5
Which means that there is a discordance between the GT field and REF/ALT information.
I have gone back and tried different things with only one sample and one chromosome. I've realized that if I repeat the analysis without using format-fields, the output vcf file is different (REF/ALT and GT are concordant!):
bcftools call -cO z -r 1 -o LER_ALM_1580_raw_chr1.vcf.gz LER_ALM_1580_raw.bcf
bcftools index LER_ALM_1580_raw_chr1.vcf.gz
bcftools stats -F /scratch/3/msf/SpeciesForReference/LER_ALM1580/LER_ALM1580.gatk.iteration4.consensus.FINAL.fa -s - LER_ALM_1580_raw_chr1.vcf.gz > LER_ALM_1580_raw_chr1.vcf.gz.stats
bcftools view -r 1:461 LER_ALM_1580_raw_chr1.vcf.gz
CHROM POS ID REF ALT QUAL FILTER INFO FORMAT LER_ALM_1580 1 461 . C . 3.53256 . DP=1;SGB=-0.379885;MQ0F=0;AF1=1;AC1=2;DP4=0,0,0,1;MQ=60;FQ=-29.9956 GT:PL 0/0:29
But if I use --format-fields, then the same as before occurs.
Any idea why this happens? Any advice on how to proceed?
Thank you