Question

Is SNPeff still the standard for variant effect prediction?

4

6.8 years ago

Lauren ▴ 70

I'm fairly new to this space. A friend of mine says he uses SnpEff for all his exome annotations, and he doesn't know of any other popular tools for this purpose.

I'm annotating some human exomes and I'm curious about what else is out there. A search gave me a lot of answers, but I don't know which tools are popular in the community. Are there gaps that SnpEff leaves that other effect predictors fill? Thank you so much for reading my post!

snpeff annotation exome variant • 7.6k views

ADD COMMENT • link updated 4.9 years ago by Shicheng Guo ★ 9.4k • written 6.8 years ago by Lauren ▴ 70

2

SnpEff is good; also try VEP.

ADD REPLY • link 6.8 years ago by Medhat 9.7k

1

4.9 years ago

Shicheng Guo ★ 9.4k

The State of Variant Annotation: A Comparison of AnnoVar, snpEff and VEP

http://blog.goldenhelix.com/goldenadmin/the-sate-of-variant-annotation-a-comparison-of-annovar-snpeff-and-vep/

ADD COMMENT • link 4.9 years ago by Shicheng Guo ★ 9.4k

score 8 · Accepted Answer · 2017-07-21

8

6.8 years ago

Samuel Brady ▴ 330

The tools I hear mentioned most often are SnpEff, VEP, and ANNOVAR. This paper (Table 1) shows a comparison of the three tools.

SnpEff tends to be robust, and I personally use it the most. Notably, SnpEff can annotate structural variants and long indels in addition to the usual smaller variants. I've used ANNOVAR once or twice and ran into a few strange bugs, although its developer maintains it well and provides a lot of documentation. VEP seems quite popular, but it is the one I personally have the least experience with.

ADD COMMENT • link 6.8 years ago by Samuel Brady ▴ 330

1

I like the ClinVar annotations in ANNOVAR, but I believe you can use SnpEff for custom annotations from .bed files as well.

You can also use custom annotations in ANNOVAR, which is what I did for GWAS Catalog associations.

Other than that, I think the answer depends on your question. For example, I wouldn't use protein function predictions alone to call a candidate variant damaging (and you would want to check for premature stop codons and other loss-of-function variants). In practice, I would probably also use population frequencies (from 1000 Genomes, gnomAD, etc.), but it would really be best if normal controls were matched by experimental protocol and bioinformatics processing.

ADD REPLY • link 4.9 years ago by Charles Warden 8.2k

0

Hi Charles,

What's the best way to define or identify all the loss-of-function variants in the human genome?
In practice, how do you use population frequencies in gnomAD to define damaging variants?

Thanks.

ADD REPLY • link 4.9 years ago by Shicheng Guo ★ 9.4k

1

I'm not sure I know the "best" way to identify loss-of-function variants. A premature stop codon towards the beginning of the gene is probably a valid call (unless the gene can and does undergo alternative splicing), but I think there are some recommendations here: https://github.com/konradjk/loftee

In my opinion, access to specialized information on previous disease associations is the right solution if you are trying to analyze your own data, but ClinVar is currently the best resource I can think of for that.
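
As a rough first pass (and not a substitute for the LOFTEE filters linked above), you could pull putative loss-of-function consequences out of SnpEff's `ANN` INFO field. This is only a sketch: the set of Sequence Ontology terms treated as loss-of-function is my assumption, the function name is hypothetical, and the example INFO string uses made-up gene and transcript names.

```python
# Sequence Ontology terms often treated as putative loss-of-function.
# This particular set is an assumption for illustration, not an official list.
LOF_TERMS = {
    "stop_gained",
    "frameshift_variant",
    "splice_acceptor_variant",
    "splice_donor_variant",
}


def putative_lof_genes(info):
    """Return gene names with a putative LoF consequence in one VCF INFO string."""
    genes = set()
    for entry in info.split(";"):
        if not entry.startswith("ANN="):
            continue
        # ANN holds one comma-separated annotation per allele/transcript pair.
        for ann in entry[len("ANN="):].split(","):
            fields = ann.split("|")
            if len(fields) < 4:
                continue  # malformed annotation, skip it
            # Field 2 is the consequence (several may be joined by "&"),
            # field 4 is the gene name.
            if set(fields[1].split("&")) & LOF_TERMS:
                genes.add(fields[3])
    return genes


# Made-up INFO string with placeholder gene and transcript names.
example_info = (
    "DP=40;ANN="
    "T|stop_gained|HIGH|GENE_A|GENE_A_ID|transcript|TX_A1|protein_coding"
    "|2/10|c.100C>T|p.Gln34*||||,"
    "T|missense_variant|MODERATE|GENE_B|GENE_B_ID|transcript|TX_B1|protein_coding"
    "|3/8|c.50A>G|p.Lys17Arg||||"
)
print(putative_lof_genes(example_info))
```

This does not consider where the variant falls in the gene or whether the affected transcript is actually expressed, so anything it flags is a candidate at best.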

In terms of getting ideas from current specialized databases, I think these may be some examples to consider:

https://brcaexchange.org/

https://www.cftr2.org/

I think 0.01 or 0.05 would be common allele-frequency cutoffs for defining rare variants. However, I usually check frequencies from multiple sources. If you see very different frequencies with different reference sets, I would guess the pre-processing and/or variant calling could be a factor.
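
To make that concrete, here is a small sketch of applying a frequency cutoff across several reference sets and flagging variants whose frequencies disagree a lot between sets. The function names, the 10-fold discordance cutoff, and the example frequencies are all assumptions for illustration, not recommended values.

```python
def is_rare(freqs, threshold=0.01):
    """True if every reported allele frequency is below the threshold.

    freqs maps a reference-set name to an allele frequency, or None when
    the variant is absent from that set. A variant seen in no set counts
    as rare here, which is itself a judgment call.
    """
    observed = [f for f in freqs.values() if f is not None]
    return all(f < threshold for f in observed)


def is_discordant(freqs, fold=10.0, floor=1e-4):
    """True if the highest and lowest reported frequencies differ by >= fold.

    Frequencies are floored to avoid dividing by zero. Large disagreement
    may point to differences in pre-processing or variant calling.
    """
    observed = [max(f, floor) for f in freqs.values() if f is not None]
    if len(observed) < 2:
        return False
    return max(observed) / min(observed) >= fold


# Made-up frequencies for two hypothetical variants.
variant_1 = {"gnomAD": 0.002, "1000G": 0.004}
variant_2 = {"gnomAD": 0.002, "1000G": 0.03}

for name, freqs in [("variant_1", variant_1), ("variant_2", variant_2)]:
    print(name, "rare:", is_rare(freqs), "discordant:", is_discordant(freqs))
```

Requiring the variant to be rare in every set is the conservative choice; you could instead require it in only one set, depending on how much you trust each reference.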

In general, for discovery, I think you will probably see a higher frequency of false positives. Once you have narrowed things down to a few possibly important mutations per sample, visualizing the alignments is very important.

Otherwise, if you have your own set of cases and controls (collected and processed the same way), you can test for differences across all variants and then check for enrichment of variant categories (like loss-of-function). I've also seen people summarize counts per gene (for a particular method of annotating variants) and then compare frequencies at the gene level (between their own cases and controls). In that situation, if you know the gene involved, you could try various annotation methods to see which one gives your gene a high ranking (for your particular disease / project).
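
A minimal sketch of that gene-level comparison, assuming you already have, for each gene, the number of cases and controls carrying at least one qualifying variant. It uses a two-sided Fisher's exact test written with the standard library only; the function names and all counts are made up for illustration, and a real analysis would also need multiple-testing correction.

```python
from math import comb


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carriers in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables at least as unlikely as the observed one
    # (with a small tolerance for floating-point ties).
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def rank_genes(carriers, n_cases, n_controls):
    """Rank genes by p-value; carriers maps gene -> (case_carriers, control_carriers)."""
    results = []
    for gene, (case_c, ctrl_c) in carriers.items():
        p = fisher_two_sided(case_c, n_cases - case_c, ctrl_c, n_controls - ctrl_c)
        results.append((gene, p))
    return sorted(results, key=lambda item: item[1])


# Made-up carrier counts for 20 cases and 20 controls.
carrier_counts = {"GENE_A": (8, 1), "GENE_B": (5, 5)}
ranking = rank_genes(carrier_counts, n_cases=20, n_controls=20)
for gene, p in ranking:
    print(gene, round(p, 4))
```

Re-running this with carrier counts derived from different annotation methods would let you compare how each method ranks a gene you already expect to be involved.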

ADD REPLY • link 4.9 years ago by Charles Warden 8.2k