Question: How To Map A Snp To A Gene Around +/- 60Kb ?
6
gravatar for Khader Shameer
4.1 years ago by
Rochester, MN
Khader Shameer14k wrote:

I am looking at a bunch of SNPs. Some of them are part of genes, but other are not. I am interested to look up +60KB or -60KB of those SNPs to get details about some nearby genes. Please share your experience in dealing with such a situation or thoughts on any methods that can do this. Thanks in advance.

ADD COMMENTlink written 4.1 years ago by Khader Shameer14k
17
gravatar for Pierre Lindenbaum
4.1 years ago by
France
Pierre Lindenbaum58k wrote:

Using the UCSC mysql server:

~> mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 -e '
select
 K.proteinID,
 K.name,
 S.name,
 S.avHet,
 S.chrom,
 S.chromStart,
 K.txStart,
 K.txEnd
from snp130 as S
left join knownGene as K on
 (S.chrom=K.chrom and not(K.txEnd+60000<S.chromStart or S.chromEnd+60000<K.txStart))
where
 S.name in ("rs25","rs100","rs75","rs9876","rs101")'

result:

+-----------+------------+--------+----------+-------+------------+----------+----------+
| proteinID | name       | name   | avHet    | chrom | chromStart | txStart  | txEnd    |
+-----------+------------+--------+----------+-------+------------+----------+----------+
| NULL      | NULL       | rs100  |        0 | chr7  |   24438348 |     NULL |     NULL |
| NULL      | NULL       | rs101  |        0 | chr7  |   24438147 |     NULL |     NULL |
| NP_056019 | uc003ssf.3 | rs25   | 0.499586 | chr7  |   11584141 | 11414172 | 11871824 |
| NP_056019 | uc003ssf.3 | rs75   | 0.241967 | chr7  |   11613691 | 11414172 | 11871824 |
| B2RNV1    | uc003tnv.2 | rs9876 | 0.426096 | chr7  |   47315290 | 47314752 | 47579199 |
| B2RNV1    | uc003tnw.2 | rs9876 | 0.426096 | chr7  |   47315290 | 47314752 | 47621742 |
+-----------+------------+--------+----------+-------+------------+----------+----------+
ADD COMMENTlink written 4.1 years ago by Pierre Lindenbaum58k
3

Pierre, this is great solution. I completely forgot about mysql server. However, I'd recommend to use refFlat table instead of knownGene and K.geneName instead of K.proteinID. In this case you will get HUGO gene name. Anyway, it's up to OP.

ADD REPLYlink written 4.1 years ago by Yuri1.1k
1

Very nice solution - anything that takes us away from ad hoc Perl scripts is good!

ADD REPLYlink written 4.1 years ago by Mikael Huss3.4k
1

Recently, a question similar to this question was asked in BioBB. URL : http://www.bioinformatics.org/pipermail/bbb/2010-April/004849.html

ADD REPLYlink written 4.0 years ago by Khader Shameer14k

Excellent ! I was looking for something like this. Thanks a lot Pierre.

ADD REPLYlink written 4.1 years ago by Khader Shameer14k

Ditto. Very nice solution, I didn't even know that server existed. looks like they may be getting a bit more traffic in the future.

ADD REPLYlink written 4.1 years ago by Ian Simpson860

I wanted to limit my results to items with a geneSymbol so I added an extra join:

select S.name as rsId, kgXref.geneSymbol from snp131 as S left join knownGene as K on (S.chrom=K.chrom and not(K.txEnd+60000

ADD REPLYlink written 3.1 years ago by Petervermont0

I found the UCSC server unusably slow. I used the UCSC Table browser to export tab delimited files. Bu picking 'selected fields from primary and related tables' I was able to just get the fields I wanted. I then used Toad for MySQL (Freeware on Windows) to import the flat file to a local db in a single step.

ADD REPLYlink written 3.0 years ago by Petervermont0

I found the UCSC server unusably slow. I used the UCSC Table browser to export tab delimited files for the snp131 and refFlat (per Yuri above) tables. By choosing 'selected fields from primary and related tables' from drop down menu I was able to get only the the fields I wanted, thereby saving time and disk space. I then used Toad for MySQL (Freeware on Windows) to import the flat files to a local db in a single step each.

ADD REPLYlink written 3.0 years ago by Petervermont0

quite old but still - doesn't "not(K.txEnd+60000<S.chromStart" also mean "all genes that end more than 60kb away from snp" to the right?

ADD REPLYlink written 2.6 years ago by Dima Lvov30

'not(K.txEnd+60000<S.chromStart or S.chromEnd+60000<K.txStart)' means if the gene end +60KB is before the SNP or the gene start - 60Kb if after the SNP the DO NOT accept the SNP.

ADD REPLYlink written 2.6 years ago by Pierre Lindenbaum58k
4
gravatar for Michael Dondrup
4.1 years ago by
Bergen
Michael Dondrup27k wrote:

No need to program this. This can be done as a BioMart query in MartView:

rs8 with 60000 bases up/donw-flanks

I just checked it works with that size flanks. The result is a FASTA file.

On the other hand, if you have a SNP position it could be easier to simply search for the genes directly in the genome annotation. Get a genome annotation as a tab separated file from e.g. BioMart and search for the genes with start/stop positons within this range.

ADD COMMENTlink written 4.1 years ago by Michael Dondrup27k

Thanks Michael.

ADD REPLYlink written 4.1 years ago by Khader Shameer14k

I am afraid this is not exactly I am looking for. I wanted to check if a particular SNP is not mapped to a gene, whether there will be any known genes in +/- 60KB of that SNP. This particular BioMart query is fetching the sequence not annotation. I have tried one with annotation and it works. : http://www.biomart.org/biomart/martview/c26efcafd362abccc4af83039c441762. But sometimes BioMart out put requires lot of post-processing. Also it will be nice if BioMart can have options to select fields multiple attributes.

ADD REPLYlink written 4.1 years ago by Khader Shameer14k

Yes, sorry, I misunderstood your question. What Pierre described is much better here.

ADD REPLYlink written 4.1 years ago by Michael Dondrup27k
4
gravatar for Aaronquinlan
3.8 years ago by
Aaronquinlan7.3k
Aaronquinlan7.3k wrote:

I imagine you have this sorted out by now, but I thought I'd add that assuming your SNPs are in BED/ROD or VCF format and your genes are in BED12 or GFF format, then you can easily do this with a single command in my BEDTools suite. The relevant command would be:

$ windowBed -a snps.bed -b genes.bed -w 60000 > snps.withGenesWithin60Kb.bed
ADD COMMENTlink written 3.8 years ago by Aaronquinlan7.3k

Thanks for adding your answer. Interesting to note the BEDTools. I will go through it.

ADD REPLYlink written 3.8 years ago by Khader Shameer14k
2
gravatar for Yuri
4.1 years ago by
Yuri1.1k
Bethesda, MD
Yuri1.1k wrote:

I downloaded refFlat table from UCSC Genome Browser, which contains gene symbols and their genomic locations. Then mapped SNPs to genes. I did it in MATLAB, but can be done with any language.

You can also try a Perl script from here.

ADD COMMENTlink modified 4.1 years ago • written 4.1 years ago by Yuri1.1k

Thanks for your points.

ADD REPLYlink written 4.1 years ago by Khader Shameer14k
Please log in to add an answer.

Help
Access
  • RSS
  • Stats
  • API

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.0.0
Traffic: 593 users visited in the last hour