I need a data set that includes frequency and pathogenicity labels for splice site mutations. I tried selecting these features from the dbSNP, ClinVar, ESP, and PhenCode data sets via Ensembl BioMart, but none of the results actually contain frequency or pathogenicity labels. Any ideas where else to look? I cannot afford a 5k subscription to HGMD Pro!
BioMart sometimes doesn't work well with genome-wide queries such as this.
You could grep our VCF dump files instead; this command finds splice-site-related variants that have any clinical significance status: