Question: Is There An Easy Way Of Getting Gene Symbols From Genomic Coordinates?
merajazizmeraj (United States), 5.7 years ago, wrote:

I have genomic coordinates from the hg18 build and want to get the corresponding gene symbols. I tried BioMart, but it only offers hg19. Is there another quick and easy way? I have ~8,000 ranges across all chromosomes.


Duplicate of: Find out the genes that correspond to my coordinates

(comment by Pierre Lindenbaum, 5.7 years ago)

How can I get the gene symbols for these regions? Do you know the syntax for that?

(comment by merajazizmeraj, 5.7 years ago)

More of a partial duplicate. They also want to know how to access hg18 via BioMart. Judging by the archive page (the earliest is v54, May 2009), this is not possible.

(comment by Neilfws, 5.7 years ago)

Archive 54 is the NCBI36 build (aka hg18), so it should work fine.

(comment by Emily_Ensembl, 5.3 years ago)
Alex Reynolds (Seattle, WA USA), 5.7 years ago, wrote:

Here's one way to do it with the UCSC Genome Browser's public MySQL server, I think.

Assuming a bash shell, define some parameters:

$ CHR="chr1"
$ START=11000000
$ STOP=12000000

To get the first 10 gene symbols for hg18 whose transcripts lie entirely within this range:

$ mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -N -e \
    "SELECT kg.chrom, MIN(kg.txStart), MAX(kg.txEnd), x.geneSymbol \
        FROM knownGene kg \
        JOIN kgXref x ON kg.name = x.kgID \
        WHERE kg.chrom = '${CHR}' AND kg.txStart >= ${START} AND kg.txEnd < ${STOP} \
        GROUP BY kg.chrom, x.geneSymbol \
        LIMIT 10;" hg18

Each output row contains the chromosome, the start of the leftmost matching transcript, the end of the rightmost one, and the gene symbol. Joining knownGene to kgXref on kg.name = x.kgID is what ties each symbol to its own transcript coordinates.
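With ~8,000 ranges you will want to script the lookup. A minimal Python sketch, run here against an in-memory SQLite database whose knownGene and kgXref tables are toy stand-ins (only the columns the query touches; the rows and symbols such as GENE_A are hypothetical). It joins the two tables on the transcript ID (kg.name = x.kgID) and, as an assumption about what you want, keeps transcripts that overlap each range instead of only those lying entirely inside it. The function name symbols_for_ranges is also hypothetical.

```python
import sqlite3

# Toy stand-ins for UCSC's knownGene and kgXref tables (hypothetical rows;
# only the columns used by the query are created).
SCHEMA = """
CREATE TABLE knownGene (name TEXT, chrom TEXT, txStart INTEGER, txEnd INTEGER);
CREATE TABLE kgXref (kgID TEXT, geneSymbol TEXT);
"""
TOY_GENES = [
    ("uc001.1", "chr1", 1000, 5000),
    ("uc001.2", "chr1", 4000, 9000),
    ("uc002.1", "chr2", 100, 900),
]
TOY_XREF = [("uc001.1", "GENE_A"), ("uc001.2", "GENE_B"), ("uc002.1", "GENE_C")]

# Join on the transcript ID so each symbol keeps its own coordinates, and keep
# transcripts overlapping the half-open range [start, end).
QUERY = """
SELECT DISTINCT x.geneSymbol
FROM knownGene kg
JOIN kgXref x ON kg.name = x.kgID
WHERE kg.chrom = ? AND kg.txStart < ? AND kg.txEnd > ?
ORDER BY x.geneSymbol
"""


def symbols_for_ranges(conn, ranges):
    """Map each (chrom, start, end) range to a sorted list of overlapping gene symbols."""
    result = {}
    for chrom, start, end in ranges:
        rows = conn.execute(QUERY, (chrom, end, start)).fetchall()
        result[(chrom, start, end)] = [symbol for (symbol,) in rows]
    return result


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO knownGene VALUES (?, ?, ?, ?)", TOY_GENES)
conn.executemany("INSERT INTO kgXref VALUES (?, ?)", TOY_XREF)

hits = symbols_for_ranges(conn, [("chr1", 4500, 4600), ("chr2", 950, 2000)])
print(hits)  # {('chr1', 4500, 4600): ['GENE_A', 'GENE_B'], ('chr2', 950, 2000): []}
```

To run it against UCSC instead, you would open a MySQL connection to genome-mysql.cse.ucsc.edu (user genome, database hg18) with a Python DB-API driver such as PyMySQL, execute the query through a cursor (`conn.execute` is a sqlite3 shortcut), and change the `?` placeholders to `%s`. The ranges must be in UCSC's 0-based, half-open table coordinates.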
B. Arman Aksoy (New York, NY), 5.3 years ago, wrote:

If you are an R person, I suggest trying Bioconductor's CNTools:

http://www.bioconductor.org/packages/release/bioc/html/CNTools.html

It is pretty convenient, and the how-to document on the page above really helps with getting started.

User 1933, 5.3 years ago, wrote:

If you list your regions like this (chromosome:start:end, one per line):

1:9330001:9395000
1:149242001:149250000
1:171936001:171971000
1:174059001:174143000
1:219914001:227775000

you can use the following R code (biomaRt) to get the corresponding gene symbols:

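library("biomaRt")

# One region per line, in BioMart's "chromosome:start:end" format
your_region = read.table("yourtable.csv", stringsAsFactors = FALSE)
chr.region = your_region$V1

# Connect to an Ensembl archive mart. NOTE: jan2013.archive.ensembl.org is
# Ensembl release 70 (GRCh37/hg19); for NCBI36/hg18 coordinates you need
# the release 54 (May 2009) archive instead.
ensembl.archive = useMart("ENSEMBL_MART_ENSEMBL",
                          host = "jan2013.archive.ensembl.org/biomart/martservice/",
                          dataset = "hsapiens_gene_ensembl")

all.results = data.frame()

for (cnt in seq_along(chr.region))
{
    print(cnt)
    # Restrict each query to protein-coding genes in the current region
    filterlist = list(chr.region[cnt], "protein_coding")
    results = getBM(attributes = c("hgnc_symbol", "ensembl_gene_id", "entrezgene",
                                   "chromosome_name", "start_position", "end_position"),
                    filters = c("chromosomal_region", "biotype"),
                    values = filterlist, mart = ensembl.archive)
    # Record which input region each gene came from
    if (nrow(results) > 0) {
        results$region = chr.region[cnt]
        all.results = rbind(all.results, results)
    }
}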
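The region list above follows BioMart's chromosome:start:end form, which uses Ensembl's 1-based, inclusive coordinates, whereas UCSC tables such as knownGene and refGene store 0-based, half-open starts. To reuse the same file with the UCSC-based answers in this thread, the start needs shifting by one. A minimal Python sketch of that conversion; the function names are hypothetical, and it assumes chromosome names that only need a "chr" prefix (1-22, X, Y).

```python
def parse_region(text):
    """Parse a BioMart-style 'chromosome:start:end' string (1-based, inclusive)."""
    chrom, start, end = text.strip().split(":")
    start, end = int(start), int(end)
    if start > end:
        raise ValueError(f"start is after end in region {text!r}")
    return chrom, start, end


def to_ucsc_half_open(region):
    """Convert a 1-based inclusive region to UCSC table coordinates (0-based, half-open).

    Assumes the chromosome only needs a 'chr' prefix (true for 1-22, X, Y).
    """
    chrom, start, end = region
    return "chr" + chrom, start - 1, end


regions_file = """\
1:9330001:9395000
1:149242001:149250000
1:171936001:171971000
"""
regions = [parse_region(line) for line in regions_file.splitlines() if line.strip()]
ucsc_regions = [to_ucsc_half_open(r) for r in regions]
print(ucsc_regions[0])  # ('chr1', 9330000, 9395000)
```

The end coordinate does not change: a 1-based inclusive end and a 0-based exclusive end point at the same base.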
Giulietta - Ensembl Helpdesk (Cambridge, UK), 5.2 years ago, wrote:

BioMart does support hg18 (NCBI36). This functionality can be found by going to the e54 archive site, which Ensembl plans to maintain for at least another year and a half. Find it here:

http://may2009.archive.ensembl.org/biomart/martview

If you know Perl, you can also access the archive through the Perl API:

http://may2009.archive.ensembl.org/info/data/api.html

I hope this helps.

brentp (Salt Lake City, UT), 5.2 years ago, wrote:

In Python with cruzdb (available here: https://pypi.python.org/pypi/cruzdb):

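from cruzdb import Genome
hg18 = Genome('hg18')
# bin_query returns an SQLAlchemy query for refGene features in chr1:123456-223456;
# pass the coordinates as integers, then iterate over (or fetch from) the query to get rows
hg18.bin_query('refGene', 'chr1', 123456, 223456)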
Roman Valls Guimerà (Melbourne), 11 months ago, wrote:

In today's (2018) cruzdb it seems to require a bit more work than that, since bin_query returns an SQL query object that you have to iterate over or fetch rows from (also, pass the interval bounds as integers instead of strings):

In [9]: from cruzdb import Genome
   ...: hg18 = Genome('hg18')
   ...: hg18.bin_query('refGene', 'chr1', 123456, 223456)
Out[9]: <sqlalchemy.orm.query.Query at 0x10b0297d0>

In [10]: q = hg18.bin_query('refGene', 'chr1', 123456, 223456)

In [11]: q.statement.execute().fetchall()
Out[11]: [(585, 'NR_039983', 'chr1', '-', 124635L, 130429L, 130429L, 130429L, 3L, '124635,129652,129937,', '129559,129710,130429,', 0L, 'LOC729737', u'unk', u'unk', '-1,-1,-1,')]
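Each tuple returned by fetchall() is a raw refGene row, so the gene symbol is the name2 field (LOC729737 in the row above), not name, which holds the RefSeq accession (NR_039983). A minimal Python sketch that pulls the symbols out by column name, assuming the standard UCSC refGene column order (a leading bin column followed by the genePredExt fields); REFGENE_COLUMNS and refgene_symbols are names introduced here for illustration. The row is rewritten with Python 3 integers (no L suffix).

```python
# Column order of UCSC's refGene table: bin, then the genePredExt fields.
REFGENE_COLUMNS = (
    "bin", "name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd",
    "exonCount", "exonStarts", "exonEnds", "score", "name2",
    "cdsStartStat", "cdsEndStat", "exonFrames",
)


def refgene_symbols(rows):
    """Return the sorted, de-duplicated gene symbols (name2) from raw refGene tuples."""
    symbols = set()
    for row in rows:
        if len(row) != len(REFGENE_COLUMNS):
            raise ValueError(f"expected {len(REFGENE_COLUMNS)} fields, got {len(row)}")
        record = dict(zip(REFGENE_COLUMNS, row))
        symbols.add(record["name2"])
    return sorted(symbols)


rows = [(585, 'NR_039983', 'chr1', '-', 124635, 130429, 130429, 130429, 3,
         '124635,129652,129937,', '129559,129710,130429,', 0, 'LOC729737',
         'unk', 'unk', '-1,-1,-1,')]
print(refgene_symbols(rows))  # ['LOC729737']
```

The length check guards against silently mislabelled fields if the row layout differs from the assumed column order.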