Hello, I would like to check the accuracy of my variant calling results against the NA12878 VCF file from GIAB (Genome in a Bottle). I am using BAM files from the 1000G project, which were aligned to a reference that includes alt contigs (e.g. ref_alt.fa). The GIAB NA12878 call set that I compare against, however, uses a reference without alt contigs (e.g. ref_no_alt.fa).
My question is: which one should I use for variant calling, ref_alt.fa (used for alignment) or ref_no_alt.fa (from GIAB)? I wonder whether using different reference files for alignment and variant calling could cause problems in the results.
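To make the difference between the two references concrete, here is a minimal sketch of how the contigs in the two files could be compared. It assumes both FASTA files have been indexed (e.g. with samtools faidx) so that .fai files exist. The function names parse_fai and compare_contigs are my own, and the contig names and lengths in the example are toy values, not real reference data.

```python
def parse_fai(lines):
    """Map contig name -> length from the lines of a FASTA index (.fai).

    Each .fai line is tab-separated: NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH.
    Only the first two columns are used here.
    """
    contigs = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or not fields[0]:
            continue  # skip blank or malformed lines
        contigs[fields[0]] = int(fields[1])
    return contigs


def compare_contigs(first, second):
    """Report contigs unique to each reference and shared contigs whose lengths differ."""
    shared = set(first) & set(second)
    return {
        "only_in_first": sorted(set(first) - set(second)),
        "only_in_second": sorted(set(second) - set(first)),
        "length_mismatch": sorted(c for c in shared if first[c] != second[c]),
    }


# Toy example (hypothetical contigs); with real files one would pass
# open("ref_alt.fa.fai") and open("ref_no_alt.fa.fai") instead of these lists.
with_alt = parse_fai(["chrA\t1000\t6\t60\t61\n", "chrA_alt1\t200\t1030\t60\t61\n"])
no_alt = parse_fai(["chrA\t1000\t6\t60\t61\n"])
report = compare_contigs(with_alt, no_alt)
print(report)
```

If only_in_first contains nothing but alt contigs and length_mismatch is empty, the primary contigs at least have the same names and lengths in both files. This does not prove the sequences are identical.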
An additional problem is that if I call variants with ref_alt.fa instead of the GIAB reference (ref_no_alt.fa), the error below occurs when I compare the VCF files using Illumina/hap.py:
2021-01-26 15:51:31,788 WARNING [W] too many AD fields at chr9:70738787 max_ad = 2 retrieved: 3
2021-01-26 15:51:32,073 ERROR One of the preprocess jobs failed
2021-01-26 15:51:32,073 ERROR Traceback (most recent call last):
2021-01-26 15:51:32,073 ERROR File "./opt/hap.py/bin/hap.py", line 508, in <module>
2021-01-26 15:51:32,074 ERROR main()
2021-01-26 15:51:32,074 ERROR File "./opt/hap.py/bin/hap.py", line 363, in main
2021-01-26 15:51:32,074 ERROR "QUERY")
2021-01-26 15:51:32,074 ERROR File "/opt/hap.py/bin/pre.py", line 203, in preprocess
2021-01-26 15:51:32,074 ERROR haploid_x=gender == "male")
2021-01-26 15:51:32,074 ERROR File "/opt/hap.py/lib/python27/Haplo/partialcredit.py", line 214, in partialCredit
2021-01-26 15:51:32,074 ERROR raise Exception("One of the preprocess jobs failed")
2021-01-26 15:51:32,074 ERROR Exception: One of the preprocess jobs failed
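To narrow down the "too many AD fields" warning, a sketch like the following could list the records in the query VCF where the number of AD values does not match the number of alleles (REF plus ALT). This rests on two assumptions of mine: that AD carries one value per allele, as in the VCF specification's Number=R definition, and that the input is an uncompressed VCF read line by line. I do not know whether this warning is what actually causes the preprocess job to fail, and the function name ad_count_mismatches is hypothetical.

```python
def ad_count_mismatches(vcf_lines):
    """Return (chrom, pos, n_alleles, n_ad) for samples whose AD count != REF + ALT count."""
    mismatches = []
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no FORMAT/sample columns
        alts = [] if fields[4] == "." else fields[4].split(",")
        n_alleles = 1 + len(alts)
        keys = fields[8].split(":")
        if "AD" not in keys:
            continue
        ad_index = keys.index("AD")
        for sample in fields[9:]:
            values = sample.split(":")
            # trailing FORMAT fields may be omitted; "." means missing
            if ad_index >= len(values) or values[ad_index] == ".":
                continue
            n_ad = len(values[ad_index].split(","))
            if n_ad != n_alleles:
                mismatches.append((fields[0], int(fields[1]), n_alleles, n_ad))
    return mismatches
```

With a real file this could be called as ad_count_mismatches(open("query.vcf")), where query.vcf is a placeholder name for the uncompressed query VCF.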
Because of this error, I would like to use different reference files for alignment (ref_alt.fa) and variant calling (ref_no_alt.fa), provided that this does not affect the reliability of the results. Please let me know if there are any problems or caveats with this approach.
Thank you for reading this long question.