I have a list of variants in HGVS notation (c. and p.) and I want to find them in VCF files. I'm looking for the best way to do so.
Since the HGVS notation can vary (e.g. NM_000059:c.1813_1814insA vs. NM_000059:c.1813dup), I thought I'd use the genomic position as a filter.
I tried using the online VEP tool to get the genomic position, with HGVSc as input. But there's a problem with indels in repetitive regions, where the same variant can be placed at several different positions.
"The standard way to report an insertion or a deletion in a VCF file is to write it in terms of the base upstream of it. HGVS works differently, they report the position of an insertion or a deletion in a repeat as the last position within the repeat. Since HGVS notation is in terms of the transcript, this means that for negative-stranded transcripts, the reported position is the same as that that would appear in VCF, but for positive-stranded transcripts, a different position is reported."
I need a way to get the position of the indels from HGVS notation, left-aligned as they would appear in a VCF file. Or should I filter in a different way, and if so, which?
Can I solve my problem using VEP? If not, how?
HGVS to find: NM_000179:c.3984_3987dup (I also have this info: p.Leu1330ValfsTer12)
VEP VCF position: 48033776 (using NM_000179:c.3984_3987dup as input)
Position in my VCF files: 48033769 (HGVS appears as NM_000179:c.3987_3988insGTCA)
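To make the left-alignment I'm after concrete, here is a minimal Python sketch on a made-up toy sequence (not the real MSH6 reference). The function names `left_align_insertion` and `to_vcf_fields` are hypothetical helpers I wrote for illustration, not part of VEP or any library. It shifts an insertion leftwards through a repeat one base at a time, then writes it VCF-style with the preceding base as anchor.

```python
def left_align_insertion(ref_seq, pos, inserted):
    """Shift an insertion as far left as the reference allows.

    ref_seq  : reference sequence (toy string here)
    pos      : 0-based index; `inserted` sits between ref_seq[pos-1] and ref_seq[pos]
    inserted : the inserted bases
    Returns the (pos, inserted) pair after left-shifting.
    """
    # While the base just left of the insertion equals the last inserted base,
    # rotate the inserted string by one and move the insertion one base left.
    # Stop at pos == 1 so an anchor base always remains to the left.
    while pos > 1 and ref_seq[pos - 1] == inserted[-1]:
        inserted = inserted[-1] + inserted[:-1]
        pos -= 1
    return pos, inserted


def to_vcf_fields(ref_seq, pos, inserted):
    """Return (POS, REF, ALT) with the preceding base as anchor.

    The anchor is ref_seq[pos-1], whose 1-based coordinate equals `pos`.
    """
    anchor = ref_seq[pos - 1]
    return pos, anchor, anchor + inserted


# Toy reference: "GTCA" repeated twice, flanked by CC and GG (assumption).
ref = "CCGTCAGTCAGG"

# Right-shifted (HGVS-like) placement: GTCA duplicated after the last repeat.
right_pos, right_ins = 10, "GTCA"

left_pos, left_ins = left_align_insertion(ref, right_pos, right_ins)
print(left_pos, left_ins)                       # 2 GTCA
print(to_vcf_fields(ref, left_pos, left_ins))   # (2, 'C', 'CGTCA')

# Both placements give the same mutated sequence.
mutated_right = ref[:right_pos] + right_ins + ref[right_pos:]
mutated_left = ref[:left_pos] + left_ins + ref[left_pos:]
print(mutated_right == mutated_left)            # True
```

In this toy case the insertion moves 8 bases to the left across the repeat. That is the same kind of shift I see between the VEP position and the position in my VCF files.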