Understanding Variant Effect Predictor results
17 months ago
caro-ca ▴ 20

I am studying the effect of transposable elements in Saccharomyces cerevisiae populations. The output I get from Variant Effect Predictor is as follows:

    Category                         Count
    Variants processed               414
    Overlapped genes                 1553
    Overlapped transcripts           1553
    Overlapped regulatory features   -


What is the difference between overlapped genes and overlapped transcripts? If a transposon overlaps a gene, I might get no transcript at all or a different transcript, depending on where the transposon is located. Also, I know from the literature that transposons can interrupt regulatory elements. Does your database have regulatory annotations for yeast?

For all consequences predicted:

    Upstream gene variant     49
    Downstream gene variant   42
    Intergenic variant        4
    Transcript ablation       3
    Coding sequence variant   1
    Feature elongation        1
    3' UTR variant            1


According to the summary above, 1553 genes overlap a transposon. How can so many genes be affected, given the consequences listed here?

And finally, the information for the consequences on a protein sequence:

    Stop codon lost           16
    Coding sequence variant   84


In the previous list, coding sequence variants represent 1% of the consequences. How is it possible that here they account for 84%?

I hope you can help me out. Thank you in advance for your time.

I am still trying to understand the results. S. cerevisiae has ~6000 genes and, according to the summary statistics, 1553 genes overlap a transposon sequence. That is approximately 25% of genes being affected by transposons. How can so many genes be affected if most of the impact is in the upstream/downstream regions of genes? And how can so many genes be affected by just 414 variants?

I really hope you can help me out. Thank you in advance.

The effect on the gene is an upstream/downstream gene variant, i.e. there is a gene within 5 kb of the variant. This means that (49 + 42)% of the 1553 genes listed as being affected, i.e. 1413 genes, have a variant within 5 kb upstream or downstream of them. That is all that it means.
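
To make the 5 kb window concrete, here is a minimal Python sketch of how one variant can be counted against several genes at once, which is how a few hundred variants can touch many genes in a compact genome. The gene names and coordinates are hypothetical, and `genes_near` is an illustrative helper, not part of VEP. The last lines reproduce the (49 + 42)% arithmetic from above.

```python
# Hypothetical gene spans (start, end) on one chromosome; not real yeast annotation.
GENES = {
    "geneA": (1_000, 2_500),
    "geneB": (4_000, 6_000),
    "geneC": (9_000, 10_500),
    "geneD": (30_000, 31_000),
}

WINDOW = 5_000  # upstream/downstream distance, matching the 5 kb described above


def genes_near(position, genes=GENES, window=WINDOW):
    """Return genes whose span, extended by `window` on both sides, contains `position`."""
    return sorted(
        name
        for name, (start, end) in genes.items()
        if start - window <= position <= end + window
    )


# A single variant lying between geneB and geneC falls within 5 kb of three genes.
nearby = genes_near(7_000)
print(nearby)

# The arithmetic from the reply: (49 + 42)% of the 1553 genes.
flanking_genes = round((49 + 42) / 100 * 1553)
print(flanking_genes)
```

In this toy layout the variant at position 7,000 overlaps no gene body, yet it is reported against geneA, geneB and geneC because each lies within the window.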

Additionally, my VCF input file contained 779 variants, but your summary table reports 414 variants processed. How does this work? I thought these two values were supposed to be the same.

It's possible that some of them failed. To find out more you'd need to send your list to helpdesk@ensembl.org.
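
Before sending the list, it may help to confirm how many records the input file really contains. The sketch below is one way to do that in Python, assuming an uncompressed VCF; `count_vcf_records` is an illustrative helper and the records shown are made up. It only counts input lines and says nothing about why a given record might have been skipped.

```python
import io


def count_vcf_records(handle):
    """Count data lines in a VCF stream, skipping '#' header lines and blank lines."""
    return sum(1 for line in handle if line.strip() and not line.startswith("#"))


# Tiny in-memory VCF with made-up records, standing in for the real input file.
example = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "I\t1000\t.\tA\tT\t.\t.\t.\n"
    "I\t2000\t.\tG\t<DEL>\t.\t.\tSVTYPE=DEL;END=2500\n"
)
print(count_vcf_records(example))
```

For a file on disk, the same function can be called on `open(path)`, where `path` is the location of your own VCF.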

Thank you, I will send an email. On a separate note, I was looking in your genome browser at the position of a gene affected by a deletion in a transposable element (TE), while also viewing it in IGV. How can I see my deletion in your genome browser? I can see the coding sequence variant, and I noticed that TEs are labelled in blue, but I assume these come from the reference genome in the Saccharomyces Genome Database (SGD).

17 months ago
Emily 23k
1. A gene may have multiple transcripts and a variant may not overlap all transcripts of a gene. That is why there are two counts and they are often different.

2. The first pie chart shows each consequence as a percentage of all consequences. The second pie chart shows how that 1% breaks down.
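
The two charts differ only in their denominator, which a short Python sketch can show. The tallies below are hypothetical and chosen purely for illustration, and the membership of the `CODING` set is an assumption about which consequences go into the second chart.

```python
from collections import Counter

# Hypothetical consequence tallies, chosen only to illustrate the two denominators.
counts = Counter({
    "upstream_gene_variant": 490,
    "downstream_gene_variant": 420,
    "intergenic_variant": 40,
    "stop_lost": 8,
    "coding_sequence_variant": 42,
})

# Assumed membership of the coding-consequences chart.
CODING = {"stop_lost", "coding_sequence_variant"}


def percentages(tallies):
    """Express each tally as a percentage of the total of the tallies passed in."""
    total = sum(tallies.values())
    return {name: 100 * n / total for name, n in tallies.items()}


all_pct = percentages(counts)  # denominator: every consequence
coding_pct = percentages({k: v for k, v in counts.items() if k in CODING})  # denominator: coding only

print(round(all_pct["coding_sequence_variant"], 1))
print(round(coding_pct["coding_sequence_variant"], 1))
```

The same count of coding sequence variants is a small share of all consequences but a large share of the coding subset, so the two percentages are not in conflict.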

Thank you for your response. Do you have annotations for regulatory elements in yeast?

No, only human and mouse.