Hey everyone,
It's a simple question, but the answer might be tricky :) (or not). In my current workflow, I made a unique identifier for variants from a VCF, composed of these substrings: #CHROM + "_" + POS + "_" + ALT.
For my data this is unique for every variant. But that might not always be the case, might it?
There could be a deletion of the G in a GT leading to a T as ALT, but there could also be a simple mutation of that G to a T, leading to the same combination of #CHROM, POS and ALT. Or would this change the POS?
So I wanted to extend this with REF, giving #CHROM + "_" + POS + "_" + REF + "_" + ALT.
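To make the two schemes concrete, here is a minimal Python sketch. The records are made up for illustration, the function names are hypothetical, and it assumes each record carries a single ALT allele (multi-allelic lines would have to be split first). It uses two deletions of different length that start at the same position, which share #CHROM, POS and ALT but differ in REF:

```python
from collections import Counter

def key_without_ref(chrom, pos, ref, alt):
    """Current scheme: #CHROM + "_" + POS + "_" + ALT (REF is ignored)."""
    return f"{chrom}_{pos}_{alt}"

def key_with_ref(chrom, pos, ref, alt):
    """Extended scheme: #CHROM + "_" + POS + "_" + REF + "_" + ALT."""
    return f"{chrom}_{pos}_{ref}_{alt}"

def duplicated_keys(records, make_key):
    """Return the sorted keys produced by more than one record."""
    counts = Counter(make_key(*record) for record in records)
    return sorted(key for key, n in counts.items() if n > 1)

# Made-up records as (CHROM, POS, REF, ALT); not taken from any real VCF.
records = [
    ("1", 100, "AT", "A"),   # deletion of one T after the padding base A
    ("1", 100, "ATT", "A"),  # deletion of two Ts after the same padding base
    ("1", 200, "G", "T"),    # plain substitution
]

print(duplicated_keys(records, key_without_ref))
print(duplicated_keys(records, key_with_ref))
```

In this toy data the key without REF collides for the two deletions, while the key with REF does not. That only demonstrates the collision for these records; it does not prove the extended key is unique in every case.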
So two questions:
- Is my assumption correct that #CHROM + "_" + POS + "_" + ALT might not always be unique?
- Would #CHROM + "_" + POS + "_" + REF + "_" + ALT lead to a unique string in every possible case (according to the VCF definitions)?