Question

What is SIFT-score?

2

9.4 years ago

mangfu100 ▴ 800

Hi.

I am wondering about the term "SIFT score".

I think SIFT refers to some measurement of SNPs. While reading the ANNOVAR paper, I came across the following sentence:

Finally, Annovar can filter specific variants such as SNPs with >1% frequency in the 1000 Genomes Projects, or non-synonymous SNPs with SIFT scores > 0.05.

Regarding the above sentence, I have two questions.

I think that 1% is a rather low allele frequency. Does a cutoff like that actually help filter out irrelevant SNP variants? I don't think so.
The SIFT score threshold in the above sentence is 0.05. What does SIFT mean, and how would a threshold of 0.05 affect variant filtering?

sequencing alignment next-gen • 26k views

ADD COMMENT • link updated 2.2 years ago by Ram 43k • written 9.4 years ago by mangfu100 ▴ 800

7

9.4 years ago

Steve Lianoglou 5.2k

The first line from the SIFT website says:

SIFT predicts whether an amino acid substitution affects protein function.

It is a method to help curate "variants of interest" in coding regions.

Other such tools include:

PolyPhen
Variant Effect Predictor (combines the two, among others)
Condel is another, but its website isn't loading for me at the moment

Googling around will uncover others ...
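As for the 0.05 threshold in the quoted sentence: SIFT scores range from 0 to 1, and by convention a substitution scoring at or below 0.05 is predicted deleterious, while one scoring above 0.05 is predicted tolerated. So filtering out SNPs with "SIFT scores > 0.05" removes the ones predicted to be tolerated. A minimal Python sketch of that rule follows; the function name and return labels are my own, not anything from SIFT or ANNOVAR.

```python
# Illustrative sketch only: classify_sift and its labels are hypothetical
# names, not part of SIFT or ANNOVAR.
SIFT_CUTOFF = 0.05  # conventional cutoff; SIFT scores lie between 0 and 1


def classify_sift(score: float, cutoff: float = SIFT_CUTOFF) -> str:
    """Label a SIFT score using the conventional cutoff."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"SIFT scores lie in [0, 1], got {score}")
    # Low scores mean the substitution is predicted to affect function.
    return "deleterious" if score <= cutoff else "tolerated"


if __name__ == "__main__":
    for s in (0.01, 0.05, 0.30):
        print(s, classify_sift(s))
```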

ADD COMMENT • link updated 2.2 years ago by Ram 43k • written 9.4 years ago by Steve Lianoglou 5.2k

0

An update: VEP is not in the same category as SIFT or PolyPhen-2. The latter are predictors, whereas VEP is an annotator. It uses SIFT and PolyPhen-2 (uses, not combines) to annotate a nucleotide change in a protein-coding context with the predicted effect of the resulting amino acid change. Condel (CONsensus DELeteriousness) combines the scores of multiple tools, and something like PredictSNP adds its own score on top of a bunch of tools it runs in the background.

ADD REPLY • link 2.2 years ago by Ram 43k

1

8.4 years ago

r0ntu ▴ 50

If you're looking for the effect of SNPs on protein function, you can probably use CRAVAT. It provides a VEST pathogenicity score that quantifies a variant's functional impact. Refer to this post for more details about the tool.

ADD COMMENT • link 8.4 years ago by r0ntu ▴ 50

score 8 · Accepted Answer · 2014-11-25

SIFT and PolyPhen are the two most commonly used algorithms for predicting whether a SNP has a (generally negative) effect on protein structure. Because the genetic code is redundant, many SNPs never translate into any change in the protein. This happens far more often than you would expect by chance, because variations that affect protein sequence are usually under negative selection pressure. SIFT/PolyPhen can therefore be used to weed out a lot of irrelevant stuff from a very large list of candidate variations.

If I'm not mistaken (and it's been a long time since I used either, so I might be making this up), SIFT's algorithm gives more weight to variations that change the net charge of the protein, while PolyPhen uses amino acid or base conservation to determine relative importance. Both obviously rank premature stop variants and other nonsense variants very highly, so there is often a lot of overlap.

Again, it's been a long time since I used either, so I might have gotten that the wrong way around. But I can tell you this: I spent 3 years studying single-base-pair exon variants in consanguineous families with a known phenotype, and very, very rarely did SIFT or PolyPhen correctly guess the variant from a list. I wouldn't say they are junk, because they're not, but variants that cause transcription factor non-specificity, splicing variants, RNAPol destabilisation, etc., are completely ignored. Do not rely on SIFT and PolyPhen for anything other than ordering a candidate list for follow-up analysis :)
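To make "ordering a candidate list" concrete, here is a rough Python sketch. It assumes each candidate is a hypothetical (name, sift_score) pair, where the score is None for variants SIFT cannot score (splice-site changes, for instance). Lower SIFT scores are more likely to be deleterious, so they sort first; unscored variants go to the end but stay in the list, since dropping them is exactly the trap described above.

```python
from typing import List, Optional, Tuple

# Hypothetical record layout for this sketch: (variant name, SIFT score or None).
Candidate = Tuple[str, Optional[float]]


def rank_candidates(candidates: List[Candidate]) -> List[Candidate]:
    """Order candidates for follow-up without discarding any of them."""
    return sorted(
        candidates,
        # Scored variants first (ascending score), unscored variants last.
        key=lambda c: (c[1] is None, c[1] if c[1] is not None else 0.0),
    )


if __name__ == "__main__":
    # Made-up example values, purely for illustration.
    example = [("missense_A", 0.40), ("splice_B", None), ("missense_C", 0.01)]
    for name, score in rank_candidates(example):
        print(name, score)
```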

score 4 · Accepted Answer · 2014-11-25

To answer your first question, 1% is the standard cutoff used to separate "common" from "rare" variants. Depending on your study, you might want to change that. For example, in a GWAS for a common trait, you might be interested only in variants above a certain frequency in the population, whereas if you're looking at rare Mendelian traits you might want only very low frequency variants. You may also want to narrow this down to a specific population, e.g. for a GWAS in African Americans, you would be interested in variants common in African populations. Steve mentioned the VEP, which lets you filter variants by frequency: you choose your own cutoff, the direction (> or <), and a population of interest.
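For a rare-disease style analysis, the two filters from the quoted ANNOVAR sentence could be sketched in Python roughly as below. This is not how ANNOVAR or the VEP implement it; the Variant record, its field names, and the default cutoffs are assumptions made up for this example, and you would tune both cutoffs to your own study.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Variant:
    # Hypothetical record for this sketch; field names are not from any tool.
    name: str
    allele_freq: Optional[float]  # population frequency, None if never observed
    sift: Optional[float]         # SIFT score, None if the variant has no score


def keep_candidate(v: Variant, max_freq: float = 0.01,
                   sift_cutoff: float = 0.05) -> bool:
    """Return False for common variants and for SIFT-tolerated variants."""
    if v.allele_freq is not None and v.allele_freq > max_freq:
        return False  # common in the reference population (> 1% by default)
    if v.sift is not None and v.sift > sift_cutoff:
        return False  # predicted tolerated by SIFT
    return True       # rare (or unobserved) and not predicted tolerated


def filter_variants(variants: List[Variant], **cutoffs: float) -> List[Variant]:
    return [v for v in variants if keep_candidate(v, **cutoffs)]


if __name__ == "__main__":
    # Made-up example values, purely for illustration.
    demo = [
        Variant("common_snp", 0.20, 0.01),
        Variant("rare_tolerated", 0.001, 0.60),
        Variant("rare_damaging", 0.001, 0.02),
        Variant("novel_unscored", None, None),
    ]
    print([v.name for v in filter_variants(demo)])
```

Passing a different max_freq (or a frequency taken from a specific population) is how you would adapt this to the study designs mentioned above.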