6.7 years ago
mpz2263
As the title says: when using the GIAB NA12878 gold-standard VCF file, do we need to normalize it before comparison, for example with vt normalize?
I found that my colleague had normalized it, but I believe GIAB NA12878 is a well-processed file that does not need normalization.
Can you elaborate on what 'vt normalize' is actually doing? If neither you nor your colleague knows, then perhaps the best thing to do is to find out why it may be needed on any VCF, and then to decide whether it's needed for 'GIAB NA12878' by inspecting the structure of the variant calls in this particular file.
Thanks for your reply. After reading this link, https://genome.sph.umich.edu/wiki/Variant_Normalization, I think I asked a stupid question. I will go along with my colleague, just in case GIAB NA12878 has multi-allelic representations.
I see, yes, depending on your downstream analysis, it can be good practice to split multi-allelic calls into multiple variants. I normalise every VCF/BCF that passes through my fingers, just so that each is starting from the same state of normalisation (minimises issues later on).
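For anyone wondering what normalization does to a single record, here is a minimal Python sketch of the left-align-and-trim procedure described on the Variant Normalization wiki page linked above. It is an illustration only, not vt's actual implementation: the function name `normalize_variant`, its arguments, and the toy reference string are all assumptions made for this example, and it handles one REF/ALT pair at a time (it does not split multi-allelic records).

```python
def normalize_variant(ref_seq, pos, ref, alt):
    """Left-align and trim one REF/ALT pair (illustrative sketch only).

    ref_seq: reference sequence of the chromosome, as a plain string
    pos:     1-based position of the first base of REF
    Returns (pos, ref, alt) in normalized form.
    """
    if ref == alt:
        raise ValueError("REF and ALT are identical; nothing to normalize")

    alleles = [ref, alt]
    changed = True
    while changed:
        changed = False
        # Drop the rightmost base while every allele ends with the same base.
        if all(alleles) and len({a[-1] for a in alleles}) == 1:
            alleles = [a[:-1] for a in alleles]
            changed = True
        # If an allele became empty, extend all alleles one base to the left.
        if any(len(a) == 0 for a in alleles):
            if pos == 1:
                raise ValueError("cannot extend left of the first base")
            pos -= 1
            left_base = ref_seq[pos - 1]  # convert 1-based pos to 0-based index
            alleles = [left_base + a for a in alleles]
            changed = True

    # Drop redundant leading bases, keeping at least one base per allele.
    while all(len(a) >= 2 for a in alleles) and len({a[0] for a in alleles}) == 1:
        alleles = [a[1:] for a in alleles]
        pos += 1

    return pos, alleles[0], alleles[1]


# Toy reference with a CA repeat; the same 2-bp deletion written three ways.
toy_ref = "GGGCACACACAGGG"
print(normalize_variant(toy_ref, 6, "CAC", "C"))      # (3, 'GCA', 'G')
print(normalize_variant(toy_ref, 3, "GCACA", "GCA"))  # (3, 'GCA', 'G')
print(normalize_variant(toy_ref, 2, "GGCA", "GG"))    # (3, 'GCA', 'G')
```

All three inputs describe the same deletion and collapse to one representation, which is why normalizing both the truth set and your own calls before comparing them avoids spurious mismatches.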