Hey everyone,
It's a simple question, but the answer might be tricky :) (or not). In my current workflow, I build a unique identifier for variants from a VCF out of these substrings: #CHROM + "_" + POS + "_" + ALT.
For my data this is unique for every variant. But that might not always be the case, might it?
There could be a deletion of the G in a GT leading to a T as ALT, but there could also be a simple mutation of that G to a T leading to the same combination of #CHROM, POS and ALT. Or would this change the POS?
So I wanted to extend the identifier with REF, giving #CHROM + "_" + POS + "_" + REF + "_" + ALT.
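To make the two schemes concrete, here is a minimal Python sketch. The records are made up for illustration (they are not from my data), the function names are my own, and it assumes one ALT allele per record, i.e. multi-allelic lines have already been split. It only shows that two deletions of different lengths anchored at the same base share CHROM, POS and ALT; it does not settle whether the longer key is unique in every possible case.

```python
from collections import Counter


def key_without_ref(chrom, pos, alt):
    """Current scheme: #CHROM + "_" + POS + "_" + ALT."""
    return f"{chrom}_{pos}_{alt}"


def key_with_ref(chrom, pos, ref, alt):
    """Extended scheme: #CHROM + "_" + POS + "_" + REF + "_" + ALT."""
    return f"{chrom}_{pos}_{ref}_{alt}"


# Hypothetical records as (CHROM, POS, REF, ALT), one ALT allele each.
records = [
    ("chr1", 100, "GT", "G"),   # deletion of one base after the anchor G
    ("chr1", 100, "GTT", "G"),  # deletion of two bases after the same anchor
    ("chr1", 100, "G", "T"),    # SNV at the same position
]

# Count how often each key occurs under both schemes.
short_keys = Counter(key_without_ref(c, p, a) for c, p, r, a in records)
long_keys = Counter(key_with_ref(c, p, r, a) for c, p, r, a in records)

# Keys that occur more than once are collisions.
duplicated_short = [k for k, n in short_keys.items() if n > 1]
duplicated_long = [k for k, n in long_keys.items() if n > 1]

print("collisions without REF:", duplicated_short)
print("collisions with REF:", duplicated_long)
```

For these three records, the two deletions collide under the shorter key, while the key including REF keeps them apart.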
So two questions:
- Is my assumption correct that #CHROM + "_" + POS + "_" + ALT might not always be unique?
- Would #CHROM + "_" + POS + "_" + REF + "_" + ALT lead to a unique string in every possible case (according to the VCF definitions)?
if both the build and the nomenclature system are promulgated