Question: Is There An Easy Way Of Getting Gene Symbols From Genomic Coordinates?
merajazizmeraj (United States) wrote, 6.4 years ago:

I have genomic coordinates from the hg18 build and want to get the corresponding gene symbols. I have tried BioMart, but it only offers hg19. Is there another quick and easy way? I have ~8000 ranges across all chromosomes.

Tags: gene, coordinates
(question modified 20 months ago by Roman Valls Guimerà)

Duplicate of: Find out the genes that correspond to my coordinates

(comment by Pierre Lindenbaum, 6.4 years ago)

How can I get the gene symbols for the regions? Do you know the syntax for that?

(comment by merajazizmeraj, 6.4 years ago)

More of a partial duplicate: they also want to know how to access hg18 via BioMart. Judging by the archive page (the earliest is v54, May 2009), this is not possible.

(comment by Neilfws, 6.4 years ago)

Archive 54 is on the NCBI36 assembly (aka hg18), so it should work fine.

(comment by Emily_Ensembl, 6.0 years ago)
Alex Reynolds (Seattle, WA USA) wrote, 6.4 years ago:

Here's one way to do it, querying the UCSC Genome Browser's public MySQL server.

Assuming a bash shell, define some parameters:

$ CHR="chr1"
$ START=11000000
$ STOP=12000000

To get the first 10 distinct gene symbols in hg18 whose transcripts lie within this range, join knownGene to kgXref on the transcript ID (kg.name = x.kgID); without that join condition MySQL pairs every transcript with every symbol in kgXref:

$ mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -N -e \
    "SELECT kg.chrom, MIN(kg.txStart), MAX(kg.txEnd), x.geneSymbol \
        FROM knownGene kg JOIN kgXref x ON kg.name = x.kgID \
        WHERE kg.chrom = '${CHR}' AND kg.txStart >= ${START} AND kg.txEnd < ${STOP} \
        GROUP BY kg.chrom, x.geneSymbol \
        ORDER BY MIN(kg.txStart) \
        LIMIT 10;" hg18

Each output row gives the chromosome, the leftmost transcript start, the rightmost transcript end, and the gene symbol.
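With ~8000 ranges, one query per range against the public server is slow. A minimal sketch of doing the filtering locally instead, assuming you have downloaded the joined hg18 table once as rows of (chrom, txStart, txEnd, geneSymbol), i.e. the same columns selected above. `build_index` and `symbols_in_range` are hypothetical helper names, and the example rows are made up:

```python
from collections import defaultdict


def build_index(genes):
    """Group (chrom, txStart, txEnd, symbol) rows by chromosome, sorted by txStart."""
    index = defaultdict(list)
    for chrom, start, end, symbol in genes:
        index[chrom].append((start, end, symbol))
    for rows in index.values():
        rows.sort()
    return index


def symbols_in_range(index, chrom, start, stop, contained=True):
    """Distinct symbols for one query range.

    contained=True mirrors the SQL: txStart >= start AND txEnd < stop.
    contained=False keeps any transcript overlapping the half-open range [start, stop).
    """
    found = set()
    for tx_start, tx_end, symbol in index.get(chrom, []):
        if tx_start >= stop:
            break  # rows are sorted by start, so no later row can match
        if contained:
            ok = tx_start >= start and tx_end < stop
        else:
            ok = tx_end > start  # tx_start < stop already holds here
        if ok:
            found.add(symbol)
    return sorted(found)


# Made-up example rows, not real annotation
genes = [
    ("chr1", 100, 200, "GENEA"),
    ("chr1", 150, 400, "GENEB"),
    ("chr1", 500, 600, "GENEC"),
    ("chr2", 100, 200, "GENED"),
    ("chr1", 120, 180, "GENEA"),
]
index = build_index(genes)
print(symbols_in_range(index, "chr1", 100, 300))
print(symbols_in_range(index, "chr1", 100, 300, contained=False))
```

The linear scan per chromosome is simple rather than optimal; for genome-wide use an interval tree or a tool that intersects BED files would scale better.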
B. Arman Aksoy (New York, NY) wrote, 6.0 years ago:

If you use R, I suggest trying Bioconductor's CNTools:

http://www.bioconductor.org/packages/release/bioc/html/CNTools.html

It is convenient, and the how-to document linked from the page above is a good place to start.

User 1933 wrote, 6.0 years ago:

If you list your regions one per line as chromosome:start:end, like

1:9330001:9395000
1:149242001:149250000
1:171936001:171971000
1:174059001:174143000
1:219914001:227775000

you can use the following R (biomaRt) code to get the corresponding gene symbols:

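library("biomaRt")

# One region per line, e.g. "1:9330001:9395000" (chromosome:start:end)
your_region <- read.table("yourtable.csv", stringsAsFactors = FALSE)
chr.region <- as.character(your_region$V1)

# Ensembl 54 (may2009 archive) is the last release on NCBI36/hg18;
# the jan2013 archive is Ensembl 70, which is on GRCh37/hg19.
# If the mart name is rejected, check it with listMarts(host = "may2009.archive.ensembl.org");
# attribute names can differ in old releases, see listAttributes(ensembl54).
ensembl54 <- useMart("ENSEMBL_MART_ENSEMBL", host = "may2009.archive.ensembl.org",
                     dataset = "hsapiens_gene_ensembl")

all.results <- data.frame()

for (cnt in seq_along(chr.region))
{
    print(cnt)  # progress
    # Protein-coding genes located in this region
    filterlist <- list(chr.region[cnt], "protein_coding")
    results <- getBM(attributes = c("hgnc_symbol", "ensembl_gene_id", "entrezgene",
                                    "chromosome_name", "start_position", "end_position"),
                     filters = c("chromosomal_region", "biotype"),
                     values = filterlist, mart = ensembl54)
    if (nrow(results) > 0) {
        results$region <- chr.region[cnt]  # record which input region matched
        all.results <- rbind(all.results, results)
    }
}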
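The regions above use Ensembl conventions: bare chromosome names and 1-based coordinates with both ends inclusive. UCSC tables such as knownGene and refGene use "chr"-prefixed names and 0-based, half-open coordinates (txStart, txEnd). If you want to run the same list through the UCSC query or cruzdb, a minimal conversion sketch (the helper names are hypothetical; mapping "MT" to "chrM" assumes hg18 naming):

```python
def ensembl_region_to_ucsc(region):
    """Convert an Ensembl-style region such as '1:9330001:9395000' (1-based, both ends
    inclusive; an optional trailing strand field is ignored) into a UCSC-style tuple
    ('chr1', 9330000, 9395000) with a 0-based start and an exclusive end."""
    fields = region.strip().split(":")
    if len(fields) < 3:
        raise ValueError(f"expected chromosome:start:end, got {region!r}")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    if start < 1 or end < start:
        raise ValueError(f"invalid coordinates in {region!r}")
    # hg18 names chromosomes chr1..chr22, chrX, chrY and the mitochondrion chrM
    ucsc_chrom = "chrM" if chrom == "MT" else "chr" + chrom
    return ucsc_chrom, start - 1, end


def read_regions(lines):
    """Parse one region per non-blank line."""
    return [ensembl_region_to_ucsc(line) for line in lines if line.strip()]


print(read_regions(["1:9330001:9395000", "1:149242001:149250000"]))
```

Only the start shifts by one: a closed 1-based interval [s, e] covers the same bases as the half-open 0-based interval [s - 1, e).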
Giulietta - Ensembl Helpdesk (Cambridge, UK) wrote, 5.9 years ago:

BioMart does support hg18 (NCBI36). This functionality can be found by going to the e54 archive site, which Ensembl plans to maintain for at least another year and a half. Find it here:

http://may2009.archive.ensembl.org/biomart/martview

If you know Perl, you can also access the archive through the Perl API:

http://may2009.archive.ensembl.org/info/data/api.html

I hope this helps.

brentp (Salt Lake City, UT) wrote, 5.9 years ago:

In Python, with cruzdb (available from https://pypi.python.org/pypi/cruzdb):

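from cruzdb import Genome

hg18 = Genome('hg18')
# Start and end must be integers; bin_query returns a query over refGene rows
for gene in hg18.bin_query('refGene', 'chr1', 123456, 223456):
    print(gene.name2)  # name2 holds the gene symbol in refGene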
Roman Valls Guimerà (Melbourne) wrote, 20 months ago:

With today's (2018) cruzdb it takes a bit more work than that: bin_query returns an SQLAlchemy query, which you have to iterate over or fetchall() to get the rows. The interval bounds must also be given as integers rather than strings:

In [9]: from cruzdb import Genome
   ...: hg18 = Genome('hg18')
   ...: hg18.bin_query('refGene', 'chr1', 123456, 223456)
Out[9]: <sqlalchemy.orm.query.Query at 0x10b0297d0>

In [10]: from cruzdb import Genome
    ...: hg18 = Genome('hg18')
    ...: q = hg18.bin_query('refGene', 'chr1', 123456, 223456)

In [11]: q.statement.execute().fetchall()
Out[11]: [(585, 'NR_039983', 'chr1', '-', 124635L, 130429L, 130429L, 130429L, 3L, '124635,129652,129937,', '129559,129710,130429,', 0L, 'LOC729737', u'unk', u'unk', '-1,-1,-1,')]
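The leading 585 in the refGene row above is the UCSC `bin` column. UCSC tables are indexed with the standard binning scheme (five levels; the smallest bins span 2^17 bp = 128 kb and each level up is 8 times larger), so a range query only has to look at the bins that can overlap the range. This is presumably what bin_query does under the hood, though I haven't checked cruzdb's source. A minimal sketch of the scheme (a reimplementation for illustration; the function names are hypothetical):

```python
# Offsets of the five bin levels, from the smallest (128 kb) to the largest (512 Mb) bins
BIN_OFFSETS = [512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0]
FIRST_SHIFT = 17  # smallest bins span 2**17 bp
NEXT_SHIFT = 3    # each level up is 2**3 = 8 times larger


def bin_from_range(start, end):
    """Bin assigned to a feature covering [start, end) (0-based, half-open)."""
    start_bin, end_bin = start >> FIRST_SHIFT, (end - 1) >> FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        start_bin >>= NEXT_SHIFT
        end_bin >>= NEXT_SHIFT
    raise ValueError("range is out of bounds for the standard binning scheme")


def overlapping_bins(start, end):
    """Every bin that may hold a feature overlapping [start, end)."""
    bins = []
    start_bin, end_bin = start >> FIRST_SHIFT, (end - 1) >> FIRST_SHIFT
    for offset in BIN_OFFSETS:
        bins.extend(range(offset + start_bin, offset + end_bin + 1))
        start_bin >>= NEXT_SHIFT
        end_bin >>= NEXT_SHIFT
    return bins


print(bin_from_range(124635, 130429))
print(overlapping_bins(123456, 223456))
```

For the transcript above (txStart 124635, txEnd 130429) this gives bin 585, and the bins for the query range 123456-223456 are [585, 586, 73, 9, 1, 0]; a query can then combine `bin IN (...)` with the usual txStart < end AND txEnd > start test.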