Find Out The Genes That Correspond To My Coordinates
11.8 years ago
e.karasmani

Dear All,

I have the following coordinates

    chr1   9933699   9934385
    chr1  88255056  88257357

How can I find out which genes are located in, or next to, the coordinates above? I would like to get a RefSeq name, not Ensembl names such as ENSMUSG00000093178 or NM_00234.

Could you please give me a guideline for that?

Thank you in advance

Best regards,
Lena

chip-seq exon intron peak-calling

Thank you very much!

However, is there a way to do this in R (since everything I am doing is in R)?

I have my coordinates in an IRanges object or a data.frame, if that helps.

Thank you in advance.

Best regards,
Lena

11.8 years ago

Using the public MySQL server of UCSC:

$ mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 -e '
select distinct
    name,
    chrom,
    txStart,
    txEnd,
    -- 0 if the transcript overlaps the query, else the gap to the nearer end
    IF(NOT(txEnd < 9933699 OR txStart > 9934385), 0,
       IF(txStart < 9934385, 9933699 - txEnd, txStart - 9934385)) as distance
  from refGene where chrom="chr1" order by distance limit 20'
+--------------+-------+----------+----------+----------+
| name         | chrom | txStart  | txEnd    | distance |
+--------------+-------+----------+----------+----------+
| NM_001012329 | chr1  |  9908333 |  9970316 |        0 |
| NM_020248    | chr1  |  9908333 |  9970316 |        0 |
| NM_001009566 | chr1  |  9789078 |  9884550 |    49149 |
| NM_014944    | chr1  |  9789078 |  9884550 |    49149 |
| NM_032368    | chr1  |  9989775 | 10002826 |    55390 |
| NM_022787    | chr1  | 10003485 | 10045556 |    69100 |
| NM_052960    | chr1  | 10057254 | 10076078 |   122869 |
| NM_005026    | chr1  |  9711789 |  9789172 |   144527 |
| NM_001105562 | chr1  | 10093040 | 10241296 |   158655 |
| NM_006048    | chr1  | 10093040 | 10241296 |   158655 |
| NR_027045    | chr1  |  9712667 |  9714644 |   219055 |
| NM_001130924 | chr1  |  9648931 |  9674935 |   258764 |
| NM_001010866 | chr1  |  9648931 |  9665020 |   268679 |
| NM_032315    | chr1  |  9599527 |  9642831 |   290868 |
| NM_015074    | chr1  | 10270763 | 10441661 |   336378 |
| NM_183416    | chr1  | 10270763 | 10368655 |   336378 |
| NM_025106    | chr1  |  9352940 |  9429590 |   504109 |
| NM_002631    | chr1  | 10459084 | 10480201 |   524699 |
| NM_198544    | chr1  | 10490158 | 10512060 |   555773 |
| NM_199006    | chr1  | 10490158 | 10512060 |   555773 |
+--------------+-------+----------+----------+----------+
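The distance column follows a simple rule: 0 if the transcript overlaps the query interval, otherwise the gap between the nearer ends. A minimal Python sketch of the same rule, checked against a few rows copied from the output above (the function name `distance_to_region` is hypothetical, not part of any UCSC tool):

```python
def distance_to_region(tx_start, tx_end, q_start, q_end):
    """Gap between a transcript [tx_start, tx_end] and a query [q_start, q_end].

    Mirrors the SQL IF(...) expression: 0 on overlap, otherwise the
    distance between the closest ends. Coordinates are used exactly as
    stored, with no 0-/1-based adjustment.
    """
    if not (tx_end < q_start or tx_start > q_end):
        return 0  # the intervals overlap
    if tx_start < q_end:
        return q_start - tx_end  # transcript lies to the left of the query
    return tx_start - q_end  # transcript lies to the right of the query


QUERY = ("chr1", 9933699, 9934385)

# A few (name, txStart, txEnd) rows copied from the refGene output above.
rows = [
    ("NM_020248", 9908333, 9970316),
    ("NM_001009566", 9789078, 9884550),
    ("NM_032368", 9989775, 10002826),
]

# Rank transcripts by distance, like "order by distance" in the SQL.
ranked = sorted(
    (distance_to_region(start, end, QUERY[1], QUERY[2]), name)
    for name, start, end in rows
)
for dist, name in ranked:
    print(name, dist)
```

The ranking reproduces the first rows of the MySQL output: 0 for the overlapping transcript, then 49149 and 55390 for the nearest upstream and downstream ones.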

Would it be possible to print the gene name TMEM201 instead of NM_001130924?


Yes, select name2 instead of name.

11.8 years ago
Vikas Bansal

Use bedtools: download the RefSeq genes from UCSC, then have a look at closestBed and intersectBed.

EDIT: First, you have to convert your input (chromosome, coordinates) into a BED file.
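BED intervals are 0-based and half-open, whereas IRanges/GRanges ranges are 1-based and closed, so the start has to be decreased by one when writing the file. A minimal Python sketch, assuming the regions from the question are 1-based closed intervals (as the IRanges-style printout suggests); the file name `peaks.bed` and the function names are hypothetical:

```python
import csv


def to_bed_rows(regions):
    """Convert 1-based closed (chrom, start, end) tuples to BED rows.

    Assumes the IRanges/GRanges convention (both ends inclusive). BED is
    0-based and half-open, so only the start coordinate changes.
    """
    rows = [(chrom, start - 1, end) for chrom, start, end in regions]
    # bedtools closest expects input sorted by chromosome, then start.
    rows.sort(key=lambda row: (row[0], row[1]))
    return rows


def write_bed(rows, path):
    """Write rows as a tab-separated BED file (chrom, start, end)."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerows(rows)


regions = [("chr1", 88255056, 88257357), ("chr1", 9933699, 9934385)]
bed_rows = to_bed_rows(regions)
write_bed(bed_rows, "peaks.bed")
print(bed_rows)
```

The resulting file could then be passed to something like `bedtools closest -d -a peaks.bed -b refGene.bed`, where `refGene.bed` stands for the RefSeq genes downloaded from UCSC (sorted the same way) and `-d` adds the distance as an extra column.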

11.8 years ago
Treylathe

A simple Table Browser search of these regions will do the trick, unless you need something more robust for larger sets of data (NM_ is the RefSeq accession, as mentioned above):

  • choose the species and assembly
  • choose the group "Genes and Gene Predictions"
  • choose the track "RefSeq Genes" and the table "refGene"
  • define regions: paste the regions above
  • output format: selected fields (choose at minimum the name and the alternate name, name2)

This gives a tab-delimited text file of gene names. For example, the region chr1:9933699-9934385 above (assuming human, hg19) gives (cleaned for display purposes):

name            chrom    txStart    txEnd        name2
NM_020248       chr1     9908333    9970316      CTNNBIP1
NM_001012329    chr1     9908333    9970316      CTNNBIP1
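Because each RefSeq transcript is a separate row, several accessions can map to the same gene symbol (both rows above are CTNNBIP1). A minimal Python sketch that collapses the tab-delimited output to one entry per gene; it assumes the header line may start with `#` and strips it if present, and the sample text is simply the two rows shown above:

```python
import csv
from collections import defaultdict

# The two rows from the Table Browser output above (hg19, chr1:9933699-9934385).
TABLE = (
    "#name\tchrom\ttxStart\ttxEnd\tname2\n"
    "NM_020248\tchr1\t9908333\t9970316\tCTNNBIP1\n"
    "NM_001012329\tchr1\t9908333\t9970316\tCTNNBIP1\n"
)


def accessions_by_gene(text):
    """Group RefSeq accessions (name) by gene symbol (name2)."""
    lines = text.splitlines()
    # Strip a leading '#' from the header so the column keys are clean.
    lines[0] = lines[0].lstrip("#")
    genes = defaultdict(list)
    for row in csv.DictReader(lines, delimiter="\t"):
        genes[row["name2"]].append(row["name"])
    return dict(genes)


print(accessions_by_gene(TABLE))
```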

You could use related tables to pull out other IDs and GO terms, etc.

11.8 years ago

NM_002341 is a RefSeq accession number.

If you want the official gene name rather than an accession number (assuming these coordinates are from Homo sapiens), you could have a look at this.

11.8 years ago
Ian

An R-specific method is the Bioconductor package ChIPpeakAnno.
