I am looking at a bunch of SNPs. Some of them are part of genes, but other are not. I am interested to look up +60KB or -60KB of those SNPs to get details about some nearby genes. Please share your experience in dealing with such a situation or thoughts on any methods that can do this. Thanks in advance.
~> mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 -e '
from snp130 as S
left join knownGene as K on
(S.chrom=K.chrom and not(K.txEnd+60000<S.chromStart or S.chromEnd+60000<K.txStart))
S.name in ("rs25","rs100","rs75","rs9876","rs101")'
I imagine you have this sorted out by now, but I thought I'd add that assuming your SNPs are in BED/ROD or VCF format and your genes are in BED12 or GFF format, then you can easily do this with a single command in my BEDTools suite. The relevant command would be:
windowBed -a snps.bed -b genes.bed -w 60000 > snps.withGenesWithin60Kb.bed
I just checked it works with that size flanks. The result is a FASTA file.
On the other hand, if you have a SNP position it could be easier to simply search for the genes directly in the genome annotation. Get a genome annotation as a tab separated file from e.g. BioMart and search for the genes with start/stop positons within this range.
If you just want gene names within 60kb of the SNP:
If you want their locations, also:
These are operations to retrieve genes within 60kb of a SNP.
If you want the very nearest gene to a SNP, you can use
This gives you, for each SNP, its nearest gene and the distance value in bases.
These and other genomic set operation tools are part of the BEDOPS toolkit.