I am working with some mouse exome capture data, and I would like to provide functional annotations to variants (SNPs and small indels).

For human data, I would typically use ANNOVAR. It looks like ANNOVAR allows users to create custom annotations (which could be applied to anything, including mouse data), but it seems to specialize in human genome annotations (for example, wANNOVAR doesn't appear to work with anything except hg18 or hg19).

I've done a quick search and found some other options (SnpEff, Mouse SNP Miner, VEP in Ensembl, etc.), but I don't really know what is commonly used in the mouse genomics community. Any suggestions? If ANNOVAR is still the best option for mice, what are the most useful databases to download?

I have used SnpEff and ANNOVAR a lot. Both are great and give consistent results, though SnpEff is easier and faster to use (my personal feeling). VEP should be great too, as the Sanger group used it to annotate the variants generated as part of the Mouse Genome Project. The VEP web interface is good for variant annotation if you have a handful of variants. I tried setting up the standalone Perl script version a few months ago and was not successful, so I ultimately ended up using SnpEff. But ANNOVAR should be fine too. Other than the compulsory gene information (RefSeq GFF/GTF or Ensembl GFF/GTF), you can also provide:

1) A dbSNP file, which will tell you whether the genomic variant (SNP or indel) is already known in dbSNP.

2) Cis-element or regulation information from the mouse ENCODE project, to annotate non-coding variants.

Ensembl is a great place to get this data.
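
To illustrate point 1, here is a minimal sketch of what "already known in dbSNP" amounts to: matching each sample variant against the dbSNP records on chromosome, position, reference allele and alternate allele. It is not how SnpEff or ANNOVAR implement it. The function names are made up for this example, and it assumes that both inputs are plain tab-separated VCF lines and that both files use the same chromosome naming.

```python
def variant_keys(vcf_line):
    """Return one (chrom, pos, ref, alt) key per alternate allele of a VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alts = fields[:5]
    return [(chrom, int(pos), ref, alt) for alt in alts.split(",")]


def load_known(dbsnp_lines):
    """Build a lookup from variant key to dbSNP ID (third VCF column)."""
    known = {}
    for line in dbsnp_lines:
        # Skip VCF header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        rsid = line.split("\t")[2]
        for key in variant_keys(line):
            known[key] = rsid
    return known


def flag_known(sample_lines, known):
    """Yield (key, dbSNP ID or None) for every allele in the sample VCF lines."""
    for line in sample_lines:
        if line.startswith("#") or not line.strip():
            continue
        for key in variant_keys(line):
            yield key, known.get(key)
```

In practice you would pass open file handles for the dbSNP VCF and your own VCF. Variants that come back with None are the candidates for being novel. Indels may be represented differently in the two files, so an exact match like this one can miss some known indels.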

Ok - I was leaning towards SnpEff as an ANNOVAR alternative, but I wanted to see what other people thought. It seems to provide a lot of additional and potentially useful information, so that is also a plus.

For humans, it looks like there is an hg19 SnpEff database, but I don't see either mm9 or mm10 for mouse. For both human and mouse, there are NCBI-based databases, but they seem pretty specific. For example, I have an mm9 alignment. SnpEff has databases for NCBIM37.64, NCBIM37.65, and NCBIM37.66.

Do you know which one most precisely matches UCSC mm9?

NCBIM37* represents mm9, and GRCm38* represents mm10. In either case you should choose the latest one. Basically, it depends on which version of the reference sequence was used to align the reads. Another thing is that these databases will not include the dbSNP and regulation files.
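
If you want to script the "choose the latest one" rule, something like the following would do it. This is only a sketch: the helper name is made up, and it assumes the database names look like an assembly prefix, a dot, and a numeric release (as in NCBIM37.64). You would have to supply the list of available names yourself.

```python
# UCSC build name -> assembly prefix used in the SnpEff database names,
# as described in this thread.
BUILD_TO_PREFIX = {"mm9": "NCBIM37", "mm10": "GRCm38"}


def latest_database(ucsc_build, available):
    """Pick the highest-numbered database matching the given UCSC build."""
    prefix = BUILD_TO_PREFIX[ucsc_build]
    candidates = []
    for name in available:
        head, _, release = name.partition(".")
        # Keep only names of the assumed form "<prefix>.<integer release>".
        if head == prefix and release.isdigit():
            candidates.append((int(release), name))
    if not candidates:
        raise ValueError("no database found for build " + ucsc_build)
    # Compare releases as integers, so that e.g. 100 would sort above 66.
    return max(candidates)[1]


# The three mm9-compatible names mentioned earlier in this thread.
names = ["NCBIM37.64", "NCBIM37.65", "NCBIM37.66"]
print(latest_database("mm9", names))
```

Whichever release you pick, the assembly has to match the reference your reads were aligned to.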

