I have been working on something similar (using genomes).
Here is what I've learned so far:
-This is unfortunately more difficult than I originally hoped. For example, the common assumption is that a variant found in COSMIC will also be somatic in other samples, but in practice I've flagged many variants as somatic for that reason, only to find later that they are germline. One thing I've found relatively useful when comparing against COSMIC is to limit the trustworthy somatic calls to those reported in a minimum number of studies (lots are found in only 1 study). Overall, though, using a database of somatic calls to pull the somatic calls out of a germline call set has not been very successful for me.
-Filtering out all the variants listed in dbSNP 144 (the latest on hg19) is very helpful. This release now includes data from 1000 Genomes as well as ExAC, both rich germline data sets. In my experience you need to be careful about filtering out every variant seen in ExAC; it's better not to filter those at really low frequencies.
-Be careful with the dbSNP filtering. There are many real somatic variants in there. For example, it seems all somatic variants found in COLO-829 have been flagged as somatic in dbSNP (using the SAO field). Unfortunately, somatic variants found outside of published cell lines are not as likely to be marked as somatic in dbSNP. In fact, I did my initial testing on COLO-829, only to learn later that although dbSNP is very precise in its somatic annotations of COLO-829 variants, it is very hit or miss (mostly miss) for somatic variants identified in real cancer samples.
-Be careful with over-filtering. I have found that the germline filtering works relatively well, but there are many cases where a known hotspot mutation (in PIK3CA or BRCA2, for example) is listed in dbSNP and not marked as somatic.
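As a rough illustration, the rules above could be combined along these lines. This is only a sketch under assumptions: the `Variant` record, its field names, and both thresholds (`MIN_COSMIC_STUDIES`, `MAX_EXAC_AF`) are made up for the example and would need tuning on real data. The only thing taken from dbSNP itself is the meaning of the SAO codes (0 unspecified, 1 germline, 2 somatic, 3 both).

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds, chosen only for illustration.
MIN_COSMIC_STUDIES = 2   # ignore COSMIC entries seen in fewer studies
MAX_EXAC_AF = 1e-4       # below this ExAC frequency, do not filter


@dataclass
class Variant:
    """Hypothetical pre-annotated variant record."""
    in_dbsnp: bool = False
    dbsnp_sao: int = 0               # dbSNP SAO: 0 unspecified, 1 germline, 2 somatic, 3 both
    exac_af: Optional[float] = None  # ExAC allele frequency, None if absent
    cosmic_studies: int = 0          # number of COSMIC studies reporting it


def classify(v: Variant) -> str:
    """Return 'somatic' or 'germline' using the heuristics from the list."""
    # dbSNP explicitly flags the variant as somatic (or both).
    if v.in_dbsnp and v.dbsnp_sao in (2, 3):
        return "somatic"
    # Rescue likely hotspots: COSMIC support from enough independent studies.
    if v.cosmic_studies >= MIN_COSMIC_STUDIES:
        return "somatic"
    # ExAC evidence: filter only when the variant is not extremely rare.
    if v.exac_af is not None:
        return "germline" if v.exac_af >= MAX_EXAC_AF else "somatic"
    # Any other dbSNP entry is treated as germline.
    if v.in_dbsnp:
        return "germline"
    return "somatic"
```

The COSMIC check is placed before the germline filters so that a hotspot listed in dbSNP without a somatic flag is not thrown away; reordering the rules trades somatic sensitivity against germline contamination.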
Throwing everything together, I'm able to get about 80% sensitivity and 20% specificity in classifying a set of (coding) variants as germline or somatic.
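For anyone wanting to compute the same two numbers on their own calls, here is a minimal sketch. It assumes "somatic" is treated as the positive class (the post does not say which class was used), and the function name and toy labels are made up for illustration.

```python
def sensitivity_specificity(truth, predicted, positive="somatic"):
    """Return (sensitivity, specificity) for paired truth/predicted labels."""
    tp = fn = tn = fp = 0
    for t, p in zip(truth, predicted):
        if t == positive:
            if p == positive:
                tp += 1   # true somatic, called somatic
            else:
                fn += 1   # true somatic, called germline
        else:
            if p == positive:
                fp += 1   # true germline, called somatic
            else:
                tn += 1   # true germline, called germline
    # Guard against a class being absent from the truth set.
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# Toy labels only, not real data.
truth = ["somatic"] * 5 + ["germline"] * 5
calls = ["somatic"] * 4 + ["germline"] + ["somatic"] * 4 + ["germline"]
print(sensitivity_specificity(truth, calls))
```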