I have been trying to download the splice site mutations in dbSNP along with their genomic coordinates and nucleotide changes. I have spent way too much time trying to build a local MySQL instance of dbSNP from this site: https://cgsmd.isi.edu/dbsnpq/downloads.php#dbSNPdownloadsTables . The SQL dumps and .txt dumps do not appear to be UTF-8 encoded, and I cannot find an encoding that renders them as readable text.
Any suggestions on ways to get this data from dbSNP? Any help would be much appreciated!
Thanks Michael and Emily! This has made this process so much easier. I still need the genomic reference nucleotide for each variant. I understand that to gain a true reference, I would need to specify populations and look into allele frequencies. Would using the "ancestral allele" serve as a good surrogate?
To provide some context: I am trying to obtain a set of benign splice site mutations. To do this I set a filter for a minor allele frequency > 0.05. I need the reference and alternate nucleotides at each position so that I can gauge the effectiveness of programs that predict the pathogenicity of splice site mutations. I believe the ancestral allele should be a decent surrogate because I assume the most common allele in the chimp genome is reported, but I still have some reservations.
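For concreteness, here is a minimal sketch of the filter I have in mind, assuming the variants are available as tab-separated VCF-style records (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). In VCF the REF column holds the reference genome base, which may be enough for my purposes. The function names, the `Variant` tuple, and the `AF` frequency key are assumptions for illustration only; a real file may store frequencies under a different INFO key, so check its header. The sketch keeps biallelic records only and does not select splice sites itself.

```python
from typing import Dict, Iterable, Iterator, NamedTuple


class Variant(NamedTuple):
    """Hypothetical container for one filtered record."""
    chrom: str
    pos: int
    rsid: str
    ref: str
    alt: str
    maf: float


def parse_info(info: str) -> Dict[str, str]:
    """Turn 'K1=V1;K2=V2;FLAG' into a dict; flags map to an empty string."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def common_variants(lines: Iterable[str],
                    min_maf: float = 0.05,
                    freq_key: str = "AF") -> Iterator[Variant]:
    """Yield biallelic records whose minor allele frequency exceeds min_maf.

    freq_key is an assumption: it names the INFO key that holds the
    alternate allele frequency in the file being read.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:
            continue  # not a full VCF-style record
        chrom, pos, rsid, ref, alt_field = cols[:5]
        info = cols[7]
        if "," in alt_field:
            continue  # multiallelic site; skipped in this sketch
        try:
            alt_freq = float(parse_info(info).get(freq_key, ""))
        except ValueError:
            continue  # frequency missing or not a single number
        # The minor allele is whichever of ref/alt is rarer.
        maf = min(alt_freq, 1.0 - alt_freq)
        if maf > min_maf:
            yield Variant(chrom, int(pos), rsid, ref, alt_field, maf)


if __name__ == "__main__":
    example = [
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "1\t1000\trs1\tG\tA\t.\t.\tAF=0.12",
        "1\t2000\trs2\tC\tT\t.\t.\tAF=0.01",
    ]
    for variant in common_variants(example):
        print(variant)
```

The made-up records in the `__main__` block are only there to show the call; in practice the lines would come from an open file handle.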