My go-to is Annovar, because there are tons of databases available and it's fairly flexible and powerful. Since you're interested in clinical variant annotations (presumably ClinVar), it's worth pointing out that you can create your own up-to-date ClinVar databases for Annovar. The Annovar documentation says as much:
Although I periodically update ClinVar database in ANNOVAR for help
users perform annotation, due to the frequent update schedule of
ClinVar, users are advised to create a database yourself using the
prepare_annovar_user.pl tool. An example procedure is given below[...]
See the ClinVar FTP server: ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/
This is important because ClinVar can change within a few weeks, so running a ClinVar database that is a few years old may leave you with incomplete information. There are also other Annovar databases that might be clinically relevant, such as gnomAD/ExAC for population frequencies, or COSMIC for somatic cancer mutations.
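To make the staleness point concrete, here is a minimal sketch that reads the release date from a ClinVar VCF header and reports how old the file is. It assumes the file carries a `##fileDate=YYYY-MM-DD` header line; the function names and the toy header are made up for illustration.

```python
import io
from datetime import date

def clinvar_file_date(handle):
    """Return the date from the ##fileDate= header line, or None if absent."""
    for line in handle:
        if not line.startswith("##"):
            break  # meta-information lines are over
        if line.startswith("##fileDate="):
            return date.fromisoformat(line.strip().split("=", 1)[1])
    return None

def age_in_days(file_date, today=None):
    """Number of days between the file date and today (or a given date)."""
    today = today or date.today()
    return (today - file_date).days

# Toy header standing in for a real file (assumed layout).
header = io.StringIO(
    "##fileformat=VCFv4.1\n"
    "##fileDate=2019-07-22\n"
    "##source=ClinVar\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
)
released = clinvar_file_date(header)
print(released, age_in_days(released, today=date(2019, 8, 21)))
```

For a real download you would pass something like `gzip.open("clinvar.vcf.gz", "rt")` (a hypothetical local path) instead of the `StringIO` object.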
Ensembl VEP is also quite popular and supports ClinVar I believe, but I haven't used it: https://uswest.ensembl.org/info/docs/tools/vep/vep_formats.html
You may also want to combine your input variants, in VCF format, with external VCF files (which, incidentally, COSMIC, ClinVar, and gnomAD all provide) using an annotation tool such as vcfanno. From the vcfanno paper:
[...]we describe vcfanno, which flexibly extracts and summarizes
attributes from multiple annotation files and integrates the
annotations within the INFO column of the original VCF file.
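To illustrate the general idea of pulling an attribute from an annotation VCF into the INFO column of your own VCF, here is a rough pure-Python sketch. This is not vcfanno and does not use its API or config format; it only matches on exact chromosome, position, REF and ALT, ignores multi-allelic records in the query, and the function names and toy records are made up. `CLNSIG` is used as the example field, assuming your ClinVar VCF has it.

```python
def load_annotations(lines, field):
    """Map (chrom, pos, ref, alt) -> value of `field` from an annotation VCF."""
    table = {}
    for line in lines:
        if line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt, _qual, _filt, info = line.rstrip("\n").split("\t")[:8]
        for entry in info.split(";"):
            key, _, value = entry.partition("=")
            if key == field:
                for allele in alt.split(","):  # one entry per ALT allele
                    table[(chrom, pos, ref, allele)] = value
    return table

def annotate(lines, table, name):
    """Return query VCF lines with `name=value` appended to INFO on a match."""
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#CHROM"):
            # Declare the new INFO field just before the column header line.
            out.append(f'##INFO=<ID={name},Number=1,Type=String,Description="Copied annotation">')
        if line.startswith("#"):
            out.append(line)
            continue
        cols = line.split("\t")
        value = table.get((cols[0], cols[1], cols[3], cols[4]))
        if value is not None:
            tag = f"{name}={value}"
            cols[7] = tag if cols[7] in (".", "") else f"{cols[7]};{tag}"
        out.append("\t".join(cols))
    return out

# Toy records, not real variants.
columns = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"
clinvar = [columns, "1\t100\t.\tA\tG\t.\t.\tCLNSIG=Pathogenic"]
query = [columns, "1\t100\t.\tA\tG\t50\tPASS\tDP=30", "1\t200\t.\tC\tT\t50\tPASS\t."]

annotated = annotate(query, load_annotations(clinvar, "CLNSIG"), "clinvar_sig")
print("\n".join(annotated))
```

A dedicated tool is still the better choice for real data, since it handles indexing, compressed files and multiple annotation sources for you.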
Once in a while (when all else fails, or I'm in a hurry), I simply do it manually, using either bash scripting or R to mash non-standard annotations (for example, manually curated phenotypes) into their own column of a tab-delimited version of the VCF file. I would still recommend using some standard tool if possible, but sometimes it's not worth it, or it's just quicker to do it that way.
Thanks a lot, Manuel! So do you also use manual scripts to match variants with phenotypes? Are these from ClinVar, or do you rely on other sources?
Thanks a lot again!
Sometimes, yes. There are phenotypes in ClinVar, but keep in mind that the reporting can be somewhat biased. For example, if you study a gene that's involved both in a common cancer and a rare neurological disorder, it's likely that you'll have a lot more sequencing/cases of cancer as opposed to the neurological disorder, even if a variant is implicated in both types of cases. Unfortunately for rare variants, I often have to gather the data manually from the literature/paper supplementary tables (which makes sense, since they're rare). My approach then is to reduce each report to genomic coordinates (TransVar is awesome for that!) and then join on coordinates and base change.
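The join on coordinates and base change could look roughly like the sketch below. It assumes the literature reports have already been converted to genomic coordinates (for example with TransVar) and saved as a tab-delimited table; the column names, the `curated_phenotype` column and the toy rows are all made up for illustration. It does not normalize indels, so those would need left-alignment before the keys can be trusted to match.

```python
import csv
import io
from collections import defaultdict

def variant_key(chrom, pos, ref, alt):
    """Normalize a variant to a (chrom, pos, ref, alt) join key."""
    chrom = chrom[3:] if chrom.lower().startswith("chr") else chrom
    return (chrom, int(pos), ref.upper(), alt.upper())

def index_reports(rows):
    """Collect the distinct phenotypes reported for each variant key."""
    by_key = defaultdict(list)
    for row in rows:
        key = variant_key(row["chrom"], row["pos"], row["ref"], row["alt"])
        if row["phenotype"] not in by_key[key]:
            by_key[key].append(row["phenotype"])
    return by_key

def add_phenotypes(variant_rows, by_key):
    """Yield each variant row with an extra curated_phenotype column."""
    for row in variant_rows:
        key = variant_key(row["CHROM"], row["POS"], row["REF"], row["ALT"])
        phenotypes = ";".join(by_key.get(key, [])) or "."
        yield {**row, "curated_phenotype": phenotypes}

# Toy tables, not real data.
reports_tsv = (
    "chrom\tpos\tref\talt\tphenotype\n"
    "chr1\t100\ta\tg\tphenotype_A\n"
    "1\t100\tA\tG\tphenotype_B\n"
)
variants_tsv = "CHROM\tPOS\tREF\tALT\n1\t100\tA\tG\n1\t200\tC\tT\n"

reports = csv.DictReader(io.StringIO(reports_tsv), delimiter="\t")
variants = csv.DictReader(io.StringIO(variants_tsv), delimiter="\t")
joined = list(add_phenotypes(variants, index_reports(reports)))
for row in joined:
    print(row["CHROM"], row["POS"], row["REF"], row["ALT"], row["curated_phenotype"])
```

Normalizing the key first (stripping a `chr` prefix, upper-casing the alleles) matters because papers are rarely consistent about either.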
I guess it's partially a question of throughput: if I'm only interested in specific variants, I can afford the time to dig out whatever I can about them. I like to look these up in something like VarSome, just to make sure I didn't miss anything.
I should also mention Phenocarta (which my lab maintains), a database that consolidates information on genes and phenotypes across multiple resources like OMIM, SFARI, etc. You can search by gene or phenotype and see what matches to help with your annotation.