Question: 1000 Genomes SNP Annotation
6.6 years ago
Because the 1000 Genomes data and the latest dbSNP builds contain so many new SNPs, many websites, such as F-SNP and the function prediction module of the SNP Info Web Server, have no functional annotation for them. Going beyond PolyPhen or SIFT predictions and the basic synonymous/non-synonymous/intronic/intergenic classification, is there an integrated pipeline that can report details such as: Is the SNP in a TF binding site?

Splice site?

miRNA binding site?

Conserved region?

Other regulatory region?

An eQTL for any gene?

The last of these is probably unlikely to be available. F-SNP only covers up to dbSNP build 126, and the SNP Info Web Server seems to cover only the later HapMap builds.

What is the most current integrated pipeline for annotating the function of 1000 Genomes variants that have rs numbers? Is there one, or must one combine a collection of tools? Those of you with long lists of SNPs from the more recent dbSNP builds: what do you use to annotate them?

6.6 years ago
Santiago de Compostela, Spain
There are several publicly available annotation tools that cover all of these needs: snpEff, Variant Effect Predictor, SeattleSeq Annotation, ANNOVAR, ... Please refer to this previous question for a broad description of some of them.
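Most of these tools work from VCF input (snpEff and Variant Effect Predictor read it directly, and ANNOVAR ships a converter for it). As a rough sketch, assuming you already know the chromosome, position, reference and alternate allele for each rs number, a minimal sites-only VCF can be written like this. The `Variant` record and the `to_minimal_vcf` helper are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Variant:
    """One biallelic SNP (hypothetical record type for illustration)."""
    chrom: str
    pos: int    # 1-based position, as VCF expects
    rsid: str   # rs number, or "." if unknown
    ref: str
    alt: str


def to_minimal_vcf(variants: Iterable[Variant]) -> str:
    """Render variants as minimal VCFv4.1 text with no sample columns."""
    lines: List[str] = [
        "##fileformat=VCFv4.1",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    ]
    # Sort by (chromosome, position) for deterministic output.
    # Note: chromosome names are compared as strings here ("10" < "2").
    for v in sorted(variants, key=lambda x: (x.chrom, x.pos)):
        # QUAL, FILTER and INFO are left as "." (missing values).
        lines.append("\t".join([v.chrom, str(v.pos), v.rsid, v.ref, v.alt, ".", ".", "."]))
    return "\n".join(lines) + "\n"
```

The output can be saved to a `.vcf` file and passed to whichever annotator you choose; check each tool's documentation for its exact input expectations (for example, chromosome naming with or without a `chr` prefix).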

As a straight answer, I would definitely go for ANNOVAR: it already has an embedded pipeline that provides all the things you are asking for and generates a single CSV file that Excel can open. The script works pretty well, since it queries more than 10 databases that you have previously downloaded locally and summarizes them all into one very convenient file.
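For region-type questions (TF binding site, conserved region, other regulatory region), the "query several local databases, then summarize into one file" idea boils down to an interval-overlap lookup per SNP followed by a single CSV write. The sketch below is not ANNOVAR's implementation; the in-memory track dictionary and its names are assumptions, and real tracks would be loaded from BED or similar files. Intervals follow BED conventions (0-based, end-exclusive) while SNP positions are 1-based as in VCF.

```python
import csv
import io
from typing import Dict, List, Tuple

# (chrom, start, end) using BED conventions: 0-based, end-exclusive.
Interval = Tuple[str, int, int]


def overlaps(track: List[Interval], chrom: str, pos1: int) -> bool:
    """True if the 1-based SNP position pos1 falls inside any interval of the track."""
    pos0 = pos1 - 1  # convert a VCF-style 1-based position to BED-style 0-based
    # A linear scan keeps the sketch short; large tracks would need an
    # interval index (sorted starts, an interval tree, or tabix).
    return any(c == chrom and start <= pos0 < end for c, start, end in track)


def summarize(snps: List[Tuple[str, str, int]], tracks: Dict[str, List[Interval]]) -> str:
    """Return CSV text: one row per (id, chrom, pos) SNP, one yes/no column per track."""
    names = sorted(tracks)
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["id", "chrom", "pos", *names])
    for snp_id, chrom, pos in snps:
        flags = ["yes" if overlaps(tracks[n], chrom, pos) else "no" for n in names]
        writer.writerow([snp_id, chrom, pos, *flags])
    return buf.getvalue()
```

For example, `summarize(snps, {"tfbs": tfbs_intervals, "conserved": conserved_intervals})` would produce one table with a column per track; `tfbs` and `conserved` are made-up track names standing in for whatever sources you download.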

6.6 years ago
Cambridge UK
The 1000 Genomes Project provides coding and non-coding annotation for its Phase 1 variants.

You can find both the annotation used and the annotated VCFs on the FTP site.
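Once one of those annotated VCFs is downloaded, pulling the annotation for your own rs numbers is mostly a matter of reading the ID and INFO columns. A minimal sketch, assuming plain-text VCF lines; the INFO key names depend on the actual files, so read their `##INFO` header lines rather than assuming any particular key. The function names `parse_info` and `annotations_for` are hypothetical.

```python
from typing import Dict, Iterable, Set


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO column ("K1=V1;FLAG;K2=V2") into a dict; flags map to ""."""
    if info == ".":
        return {}
    fields: Dict[str, str] = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def annotations_for(lines: Iterable[str], wanted: Set[str]) -> Dict[str, Dict[str, str]]:
    """Map each wanted rs ID found in the VCF to its parsed INFO fields."""
    found: Dict[str, Dict[str, str]] = {}
    for line in lines:
        if line.startswith("#"):  # skip meta-information and header lines
            continue
        cols = line.rstrip("\n").split("\t")
        # Fixed VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO ...
        # The ID column may hold several identifiers separated by ';'.
        for vid in cols[2].split(";"):
            if vid in wanted:
                found[vid] = parse_info(cols[7])
    return found
```

For a bgzipped `.vcf.gz` file, `gzip.open(path, "rt")` from the standard library can be passed as `lines`.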

