Dear all,
For an analysis, I have gsvar files of whole-genome variants called against GRCh37.
My approach is to combine the VEP coordinates with transcript.coding_sequence
from pyensembl.
For one particular deletion:
chr start end ref obs
chr1 248616705 248616711 TGCTGCG -
The VEP column of the same variant row reads:
OR2T2:ENST00000342927:frameshift_variant:HIGH:exon1/1:c.612_618del:p.Cys204Ter:PF13853 [Olfactory receptor]
Using pyensembl, I take the coding sequence of this transcript and slice it at the VEP coordinates:
>>>seq = "ATGGGCATGGAGGGTCTTCTCCAGAACTCCACTAACTTCGTCCTCACAGGCCTCATCACCCATCCTGCCTTCCCCGGGCTTCTCTTTGCAATAGTCTTCTCCATCTTTGTGGTGGCTATAACAGCCAACTTGGTCATGATTCTGCTCATCCACATGGACTCCCGCCTCCACACACCCATGTACTTCTTGCTCAGCCAGCTCTCCATCATGGATACCATCTACATCTGTATCACTGTCCCCAAGATGCTCCAGGACCTCCTGTCCAAGGACAAGACCATTTCCTTCCTGGGCTGTGCAGTTCAGATCTTCCTCTACCTGACCCTGATTGGAGGGGAATTCTTCCTGCTGGGTCTCATGGCCTATGACCGCTATGTGGCTGTGTGCAACCCTCTACGGTACCCTCTCCTCATGAACCGCAGGGTTTGCTTATTCATGGTGGTCGGCTCCTGGGTTGGTGGTTCCTTGGATGGGTTCATGCTGACTCCTGTCACTATGAGTTTCCCCTTCTGTAGATCCCGAGAGATCAATCACTTTTTCTGTGAGATCCCAGCCGTGCTGAAGTTGTCTTGCACAGACACGTCACTCTATGAGACCCTGATGTATGCCTGCTGCGTGCTGATGCTGCTTATCCCTCTATCTGTCATCTCTGTCTCCTACACGCACATCCTCCTGACTGTCCACAGGATGAACTCTGCTGAGGGCCGGCGCAAAGCCTTTGCTACGTGTTCCTCCCACATTATGGTGGTGAGCGTTTTCTACGGGGCAGCCTTCTACACCAACGTGCTGCCCCACTCCTACCACACTCCAGAGAAAGATAAAGTGGTGTCTGCCTTCTACACCATCCTCACCCCCATGCTCAACCCACTCATCTACAGCTTGAGGAATAAAGATGTGGCTGCAGCTCTGAGGAAAGTACTAGGGAGATGTGGTTCCTCCCAGAGCATCAGGGTGGCGACTGTGATCAGGAAGGGCTAG"
>>>seq[612-1:618]
'CGTGCTG'
You can see that this sequence is not the same as the ref column in gsvar (TGCTGCG).
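
For reproducibility, the slicing step can be written as a small helper. This is only a minimal sketch: `cds_slice_from_hgvs` is a hypothetical name, it handles only the simple `c.N_Mdel` form seen here, and it assumes the HGVS coordinates are 1-based and inclusive, so they map to the Python slice `[N-1:M]`.

```python
import re

def cds_slice_from_hgvs(hgvs_c):
    """Turn a 'c.N_Mdel' string into a 0-based, end-exclusive slice.

    Hypothetical helper: only plain multi-base deletions inside the CDS
    are supported (no intronic offsets, no UTR positions).
    """
    match = re.fullmatch(r"c\.(\d+)_(\d+)del", hgvs_c)
    if match is None:
        raise ValueError(f"unsupported HGVS coding notation: {hgvs_c}")
    first, last = int(match.group(1)), int(match.group(2))
    return slice(first - 1, last)

# The VEP annotation from the variant row above.
annotation = ("OR2T2:ENST00000342927:frameshift_variant:HIGH:exon1/1:"
              "c.612_618del:p.Cys204Ter:PF13853 [Olfactory receptor]")

# Pick the colon-separated field that carries the coding-level change.
hgvs_c = next(field for field in annotation.split(":") if field.startswith("c."))
cds_slice = cds_slice_from_hgvs(hgvs_c)
print(hgvs_c, cds_slice.start, cds_slice.stop)
```

Applying `seq[cds_slice]` to the coding sequence is then equivalent to `seq[612-1:618]`.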
Has anyone encountered the same case?
I would be grateful for any ideas on how to resolve such cases.
Thank you and keep safe!
Damianos
Thank you very much for the thorough explanation!
I now understand the phenomenon :)
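
For anyone hitting the same mismatch, here is a minimal sketch of how to check it. It assumes the difference comes from HGVS 3' shifting: in a repetitive stretch, the c. notation describes a deletion at its most 3' position, while the gsvar row appears to report the left-aligned position. It also assumes the transcript is on the forward strand, which fits the fact that the ref allele appears unchanged in the coding sequence. `shift_deletion_3prime` is a hypothetical helper name, and `context` is an excerpt copied from the coding sequence above.

```python
# Excerpt of the OR2T2 coding sequence around the deletion.
context = "ATGTATGCCTGCTGCGTGCTGATGCTGCTT"
ref = "TGCTGCG"  # ref allele from the gsvar row

def shift_deletion_3prime(seq, start, length):
    """Slide a deletion window to the right as far as the result stays identical.

    The window can move one base to the right whenever the first deleted
    base equals the first base after the window. `start` is 0-based.
    """
    while start + length < len(seq) and seq[start] == seq[start + length]:
        start += 1
    return start

left_start = context.find(ref)  # left-aligned position in the excerpt
right_start = shift_deletion_3prime(context, left_start, len(ref))

offset = right_start - left_start
shifted_allele = context[right_start:right_start + len(ref)]

# Delete each window and compare the resulting sequences.
after_left = context[:left_start] + context[left_start + len(ref):]
after_right = context[:right_start] + context[right_start + len(ref):]

print(offset)                     # 5
print(shifted_allele)             # CGTGCTG
print(after_left == after_right)  # True
```

Under these assumptions, the gsvar allele TGCTGCG corresponds to c.607_613 and the VEP notation c.612_618del to CGTGCTG, and deleting either window yields the same sequence, so both rows describe the same variant.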