Question about the consequence types from VEP
7.8 years ago
vasilislenis ▴ 150

Hello,

I'm really new to the field, so my question may be a little naive.

I'm trying to annotate my SNPs with VEP in order to find the nonsense, synonymous and non-synonymous SNPs in each genome. The organism I'm working on is sheep. What confuses me a little is the tag system that Ensembl uses. For example, "synonymous_variant" is for synonymous SNPs, but I'm not so sure about the non-synonymous and nonsense ones. I'm taking "coding_sequence_variant" and "stop_gained", respectively. Am I right? Also, I cannot identify the CNVs. Is there a particular tag for them?

A second issue is that for some gene IDs there is no information about the name of the gene (the symbol tag). Is there any way to take a list of these IDs and find the names of the genes?

Thank you very much in advance, and I'm really sorry for the "bombing" of questions.

VEP SNPs annotation Consequence types • 3.3k views
7.8 years ago
Denise CS ★ 5.2k

The tag system is based on the Sequence Ontology (SO) consequence terms. Non-synonymous is not an SO term; according to SO, these variants should be referred to as missense_variant. Check the SO definitions and a diagram showing the location of variants on the Calculated consequence variants page. Nonsense is known as stop_gained.

If you annotate CNVs (larger insertions or deletions, for example), you will have the same SO consequence terms. These are some of the consequences for copy_number_variation according to SO.

As for your second issue, I'd guess there is no gene name for the sheep gene, but you will have the Ensembl stable ID, e.g. ENSOARG00000005819. If you send some examples, it'd be easier to help.
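The mapping from SO terms to the three categories in the question can be sketched in Python as follows. This is a minimal sketch, not an official Ensembl tool: it assumes VEP's default tab-delimited output, where the Consequence value is the seventh column and may hold several comma-separated SO terms. The function names and category labels are made up for illustration.

```python
from collections import Counter

# SO consequence terms mapped to the informal categories used in the question.
CATEGORY_BY_TERM = {
    "synonymous_variant": "synonymous",
    "missense_variant": "missense",
    "stop_gained": "nonsense",
}

# Assumption: default VEP tab-delimited output, Consequence in column 7.
CONSEQUENCE_COLUMN = 6


def classify_consequence(consequence_field):
    """Return the sorted categories found in one Consequence value.

    A single value can list several SO terms separated by commas,
    e.g. "missense_variant,splice_region_variant".
    """
    terms = consequence_field.split(",")
    return sorted({CATEGORY_BY_TERM[t] for t in terms if t in CATEGORY_BY_TERM})


def count_categories(lines):
    """Count categories over the lines of a VEP output file.

    Each line is one variant/transcript pair, so a variant that overlaps
    several transcripts is counted once per transcript.
    """
    counts = Counter()
    for line in lines:
        # Skip the "##" metadata lines, the "#" header line and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        columns = line.rstrip("\n").split("\t")
        for category in classify_consequence(columns[CONSEQUENCE_COLUMN]):
            counts[category] += 1
    return counts
```

To use it on a real file, pass the open file handle, e.g. `with open("vep_output.txt") as handle: print(count_categories(handle))`, where the file name is a placeholder. To count unique variants rather than variant/transcript pairs, one option is to run VEP with `--pick` or `--most_severe` first.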


Thank you very much for your help :)

Here are some examples of gene IDs for which the --symbol flag did not give me a gene name:

ENSOARG00000000134 ENSOARG00000000154 ENSOARG00000000161


ENSOARG00000000134, ENSOARG00000000154 and ENSOARG00000000161 are all uncharacterised proteins with no gene name, unlike ENSOARG00000019179, for example, which is named CD96.


Thank you very much for your help! So, I will leave them as unknown genes in my code.
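One way to handle the unnamed genes in code is to fall back to the Ensembl stable ID when no symbol is present. The sketch below assumes VEP's default tab-delimited output, where `--symbol` adds a `SYMBOL=...` entry to the semicolon-separated Extra column. The helper names `parse_extra` and `gene_label` are hypothetical.

```python
def parse_extra(extra_field):
    """Turn an Extra value such as "IMPACT=MODERATE;SYMBOL=CD96" into a dict."""
    pairs = {}
    # VEP writes "-" for empty fields.
    if extra_field in ("", "-"):
        return pairs
    for item in extra_field.split(";"):
        key, separator, value = item.partition("=")
        if separator:
            pairs[key] = value
    return pairs


def gene_label(gene_id, extra_field, placeholder=None):
    """Return the gene symbol if VEP reported one, otherwise a fallback.

    The fallback is the stable gene ID, unless a placeholder such as
    "unknown" is passed explicitly.
    """
    symbol = parse_extra(extra_field).get("SYMBOL")
    if symbol:
        return symbol
    return placeholder if placeholder is not None else gene_id
```

For example, `gene_label("ENSOARG00000019179", "IMPACT=MODERATE;SYMBOL=CD96")` gives the symbol, while `gene_label("ENSOARG00000000134", "IMPACT=MODERATE")` returns the stable ID. Keeping the ID rather than a generic "unknown" label keeps distinct uncharacterised genes distinguishable.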
