Question: Question about the consequence types from VEP
2.8 years ago
United Kingdom
vasilislenis100 wrote:


I'm really new to the field, so my question may be a little naive.

I'm trying to annotate my SNPs using VEP. My goal is to find the nonsense, synonymous and non-synonymous SNPs in each genome. The organism I'm working on is sheep. What confuses me a little is the tag system that Ensembl uses. For example, "synonymous_variant" is for the synonymous SNPs, but I'm not so sure about the non-synonymous and the nonsense ones. I'm taking "coding_sequence_variant" and "stop_gained", respectively. Am I right? Also, I cannot identify the CNVs. Is there a particular tag for them?

A second issue is that for some gene IDs there is no information about the name of the gene (the symbol tag). Is there any way to take a list of these IDs and look up the names of the genes?

Thank you very much in advance, and I'm really sorry for bombarding you with questions.

written 2.8 years ago by vasilislenis100
2.8 years ago
UK, Hinxton, EMBL-EBI
Denise - Open Targets wrote:

The tag system is based on the Sequence Ontology (SO) consequence terms. Non-synonymous is not an SO term; according to SO, these variants should be referred to as missense_variant. Check the SO definitions and the diagram showing the location of each variant type on the Ensembl "Calculated variant consequences" page. Nonsense variants are known as stop_gained. If you annotate CNVs (larger insertions or deletions, for example), you will get the same SO consequence terms. SO lists some of the consequences for copy_number_variation.

As for your second issue, I'd guess there is no gene name for the sheep gene, but you will still have the Ensembl stable ID, e.g. ENSOARG00000005819. If you send some examples, it will be easier to help.
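
For the consequence terms discussed above, here is a minimal Python sketch of how the SO terms could be mapped back to the informal classes from the question. It assumes VEP's default tab-delimited output, where the Consequence value is the seventh column and may hold several comma-separated terms. The function names and the example rows are made up for illustration. Note that it counts output lines (variant/transcript pairs), not unique variants.

```python
import io
from collections import Counter

# SO consequence terms named in this thread, mapped to the informal classes.
SO_TO_CLASS = {
    "synonymous_variant": "synonymous",
    "missense_variant": "non-synonymous",
    "stop_gained": "nonsense",
}


def classify(consequence_field):
    """Return the informal classes for one Consequence value.

    VEP can report several comma-separated SO terms on one line.
    """
    terms = consequence_field.split(",")
    return {SO_TO_CLASS[t] for t in terms if t in SO_TO_CLASS}


def count_classes(handle, consequence_col=6):
    """Count classes per output line (assumed default VEP tab layout)."""
    counts = Counter()
    for line in handle:
        # Skip blank lines and the '#'-prefixed header lines.
        if not line.strip() or line.startswith("#"):
            continue
        columns = line.rstrip("\n").split("\t")
        counts.update(classify(columns[consequence_col]))
    return counts


# Hypothetical rows, truncated to the first seven columns for brevity.
EXAMPLE = (
    "#Uploaded_variation\tLocation\tAllele\tGene\tFeature\tFeature_type\tConsequence\n"
    "snp1\t1:100\tA\tGENE_A\tTRANSCRIPT_A\tTranscript\tmissense_variant\n"
    "snp2\t1:200\tG\tGENE_A\tTRANSCRIPT_A\tTranscript\tsynonymous_variant\n"
    "snp3\t1:300\tT\tGENE_B\tTRANSCRIPT_B\tTranscript\tstop_gained,splice_region_variant\n"
    "snp4\t1:400\tC\tGENE_B\tTRANSCRIPT_B\tTranscript\tintron_variant\n"
    "snp5\t1:500\tA\tGENE_B\tTRANSCRIPT_B\tTranscript\tmissense_variant\n"
)

if __name__ == "__main__":
    print(count_classes(io.StringIO(EXAMPLE)))
```

With a real file, `count_classes(open("vep_output.txt"))` would do the same job, where the file name is a placeholder.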

written 2.8 years ago by Denise - Open Targets

Thank you very much for your help :)

Here are some examples of gene IDs for which the --symbol flag did not give me a gene name.

ENSOARG00000000134 ENSOARG00000000154 ENSOARG00000000161

written 2.8 years ago by vasilislenis100

ENSOARG00000000134, ENSOARG00000000154 and ENSOARG00000000161 are all uncharacterised proteins with no gene name, unlike ENSOARG00000019179, which is named CD96.

written 2.8 years ago by Denise - Open Targets

Thank you very much for your help! So, I will leave them as unknown genes in my code.
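
One possible way to handle this in code is sketched below. It assumes VEP's default output, where `--symbol` adds a `SYMBOL=...` entry to the semicolon-separated Extra column. The helper names are hypothetical, and the fallback here is the Ensembl stable ID rather than a literal "unknown", so that unnamed genes stay distinguishable from each other.

```python
def parse_extra(extra):
    """Turn an Extra value such as 'IMPACT=LOW;SYMBOL=CD96' into a dict."""
    fields = {}
    for item in extra.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            fields[key] = value
    return fields


def gene_label(gene_id, extra):
    """Return the gene symbol if VEP reported one, else the stable ID."""
    symbol = parse_extra(extra).get("SYMBOL", "")
    return symbol if symbol else gene_id


if __name__ == "__main__":
    # The Extra values below are made up for illustration.
    print(gene_label("ENSOARG00000019179", "IMPACT=MODERATE;SYMBOL=CD96"))
    print(gene_label("ENSOARG00000000134", "IMPACT=MODERATE"))
```

Replacing `gene_id` in the last line of `gene_label` with a fixed string such as "unknown" would give the behaviour described above instead.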

written 2.8 years ago by vasilislenis100
