Question: Batch Search for the Effect of a SNP on a Protein (Conserved) Domain
I have a list of nonsynonymous SNVs.

I would like to batch search them all to see whether any of the variants falls in a conserved domain/motif/active site. None of the databases I have found accept a position or mutation position as input.

For each variant I have the chromosomal position, the gene name, and the amino acid change in the gene, like this:



Is there a way to batch search a list of mutations against a protein domain database?

I am not aware of any resource that can do that analysis for you.

What I would do is:

  1. Use the Ensembl BioMart to fetch the amino acid sequence for the protein product of each of your Ensembl transcripts.
  2. Map protein domains onto these by submitting the sequences to InterPro, Pfam, and/or SMART.
  3. Write a small script to check if the amino acid changes caused by the SNVs fall inside predicted protein domains.
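Step 3 could look roughly like the following minimal Python sketch. It assumes the amino acid changes are written either as `R123W` or HGVS-style `p.Arg123Trp`, and that the domain predictions have already been parsed (e.g. from InterPro, Pfam, or SMART results) into records of protein ID, domain name, and 1-based inclusive start/end residues. The `Domain` record, the function names, and the example data are all hypothetical.

```python
import re
from typing import List, NamedTuple

# Accepts "R123W", "p.R123W", "p.Arg123Trp", "p.Arg123Ter", "p.R123*" (assumed input formats).
AA_CHANGE = re.compile(r"^(?:p\.)?([A-Za-z*]+?)(\d+)([A-Za-z*=]+)$")


class Domain(NamedTuple):
    protein: str  # ID of the protein/transcript the domain was predicted on
    name: str     # domain name or accession
    start: int    # 1-based, inclusive residue
    end: int      # 1-based, inclusive residue


def aa_position(change: str) -> int:
    """Extract the residue number from an amino acid change string."""
    m = AA_CHANGE.match(change.strip())
    if m is None:
        raise ValueError(f"unrecognised amino acid change: {change!r}")
    return int(m.group(2))


def domains_hit(protein: str, change: str, domains: List[Domain]) -> List[Domain]:
    """Return the domains on this protein whose range contains the changed residue."""
    pos = aa_position(change)
    return [d for d in domains if d.protein == protein and d.start <= pos <= d.end]


if __name__ == "__main__":
    # Hypothetical domain calls and variants, for illustration only.
    domains = [Domain("PROT1", "domain_A", 50, 300), Domain("PROT1", "domain_B", 320, 400)]
    variants = [("PROT1", "R123W"), ("PROT1", "p.Gly310Asp"), ("PROT1", "L350P")]
    for protein, change in variants:
        hits = domains_hit(protein, change, domains)
        print(protein, change, [d.name for d in hits] or "no domain")
```

The residue numbers in your variant list have to refer to the same protein isoform whose sequence you submitted for domain prediction; otherwise the comparison is meaningless.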
I don't like programming :P, but thanks for the idea.

Here are some tools that I have used for batch querying:

  • PolyPhen
  • PMut
  • SIFT

Relevant discussion thread: Algorithms Predicting Effects Of SNPs / AA Substitution On Protein

Please add links for those tools.

There are some tools available, but I do not know which one is currently the best.

You can read Ng and Henikoff (2006) for a review, and Burke DF et al. (2007) for an example of applying different tools. When we wrote the computational section of the open collaborative paper on interpreting post-GWAS results, we collected some tools for the analysis you are asking about: have a look at it.

Thanks, nice article, lots of tools to play with.

You can use PolyPhen to see whether a SNP is likely pathogenic or benign to protein structure. For large batch queries you will need a local installation, which is huge, and preparing its databases takes a long time. I've never used SIFT, but I have heard that you can get precomputed SIFT scores, so you might be able to use those instead.
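If you do get hold of precomputed SIFT scores, batch screening reduces to a table lookup. This is a minimal Python sketch that assumes a hypothetical tab-separated file with `variant_id` and `sift_score` columns (the real layout depends on where the scores come from). SIFT scores range from 0 to 1, and low scores are predicted deleterious; the sketch uses the conventional 0.05 cutoff, but check how your score source treats the boundary.

```python
import csv
import io
from typing import Dict, Iterable, TextIO

SIFT_CUTOFF = 0.05  # conventional threshold: lower scores are predicted deleterious


def load_scores(handle: TextIO) -> Dict[str, float]:
    """Read a (hypothetical) TSV with columns variant_id and sift_score."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["variant_id"]: float(row["sift_score"]) for row in reader}


def flag_deleterious(variants: Iterable[str], scores: Dict[str, float],
                     cutoff: float = SIFT_CUTOFF) -> Dict[str, str]:
    """Label each variant using its precomputed score, if one exists."""
    result = {}
    for v in variants:
        s = scores.get(v)
        if s is None:
            result[v] = "no precomputed score"
        elif s < cutoff:
            result[v] = "predicted deleterious"
        else:
            result[v] = "predicted tolerated"
    return result


if __name__ == "__main__":
    # Hypothetical score table held in memory; in practice open the real file.
    table = io.StringIO("variant_id\tsift_score\nvar1\t0.01\nvar2\t0.40\n")
    print(flag_deleterious(["var1", "var2", "var3"], load_scores(table)))
```

Note that, as the next paragraph points out, a SIFT score says nothing about which domain a variant falls in.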

But PolyPhen and SIFT look at the effect on the protein as a whole. To see whether a SNP affects a particular protein domain, I think you might have to do what Lars suggested and write a script. A slight variation on his excellent idea is to use the Ensembl Perl API: create a sequence slice for each SNP, get the transcripts that overlap this slice, then get the domains for each transcript and check whether any domain overlaps your SNP. I don't know the API well enough to say whether you can get the protein domains overlapping the SNP directly. But anyhow, I'm sure you get the point.
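The core of the slice-based approach is turning a chromosomal position into a residue number on a given transcript's protein and comparing it with the domain coordinates. The Ensembl API has its own genome-to-protein coordinate mapping, so in practice you would likely rely on that; this Python sketch (not Perl, and not the Ensembl API) only illustrates the arithmetic. It assumes you already have the transcript's strand and its CDS segments in 1-based genomic coordinates, that the first coding base is the first base of the start codon, and it does not handle incomplete CDSs. Function names and example coordinates are hypothetical.

```python
from typing import List, Optional, Tuple

Segment = Tuple[int, int]  # (start, end), 1-based inclusive genomic coordinates


def genomic_to_residue(pos: int, cds_segments: List[Segment], strand: int) -> Optional[int]:
    """Map a genomic position to a 1-based residue number, or None if outside the CDS.

    Assumes non-overlapping segments covering only coding bases, with the first
    coding base (in transcription order) being the first base of the start codon.
    """
    if strand not in (1, -1):
        raise ValueError("strand must be 1 or -1")
    # Transcription order: ascending on the plus strand, descending on the minus strand.
    ordered = sorted(cds_segments, reverse=(strand == -1))
    offset = 0  # coding bases in the segments already passed
    for start, end in ordered:
        if start <= pos <= end:
            within = pos - start if strand == 1 else end - pos
            return (offset + within) // 3 + 1
        offset += end - start + 1
    return None


def domains_at(residue: int, domains: List[Tuple[str, int, int]]) -> List[str]:
    """Names of domains whose 1-based inclusive residue range contains the residue."""
    return [name for name, start, end in domains if start <= residue <= end]


if __name__ == "__main__":
    # Hypothetical minus-strand transcript with two CDS segments and one domain.
    cds = [(1000, 1011), (2000, 2008)]
    domains = [("domain_A", 2, 4)]
    for snv in (2008, 2002, 1011, 1500):
        residue = genomic_to_residue(snv, cds, strand=-1)
        if residue is None:
            print(snv, "not in CDS")
        else:
            print(snv, "residue", residue, domains_at(residue, domains) or "no domain")
```

Running this per SNP and per overlapping transcript gives one row for each transcript the variant hits, which is usually what you want, since domain annotations can differ between isoforms.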

