I'm trying to build a database that will link SNPs (via rs-IDs) to genes with their respective Ensembl ID, Entrez/NCBI ID, and HUGO gene name based on their chromosomal position. I need these to do some pathway analysis as suggested by answers to this question.
from snp132 as S
left join knownGene as K on
(S.chrom=K.chrom and not(K.txEnd+60000<S.chromStart or S.chromEnd+60000<K.txStart))
left join knownToEnsembl as K2G on
left join ensGtp as E on
left join kgXref as X on
left join refLink as R on
S.name in ("rs25","rs100")
mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 -E < query.sql
Try using Galaxy for points 1-3 above. Note that this is not the only way to go about this process, but I point it out for the less command-line-inclined folks. For the more command-line inclined, downloading tab-delimited text files will suffice.
Use Get Data --> UCSC Main to pull in snp132 data as a bed file
Use Get Data to pull in ensembl, refseq, uscs known gene, etc. as bed files.
Use Operate on Genomic Intervals --> Join to join datasets from step 2 with datasets from step 1
Download the HGNC data from here that links MANY identifiers to each other. You can use Galaxy Join, Subtract, and Group --> join two datasets to merge the HGNC data with columns from the output of step 3.
For point 4 in the original post, that one has been answered a few times on this list, but SIFT, annovar, snpEff, Ensembl Variant Annotator, or the UCSC snp132CodingDbSNP table (and others) can all provide some insight for rs numbers. With a little creative genomic region conversions, you can probably come up with a set of tables yourself.
To use BioMart to link genes with SNPs, it's necessary to start with Database: Ensembl Variation. Dataset can be Homo sapiens variation. Filters: General variation filters: Filter by Variation ID. Attributes are Ensembl Gene ID.