Given An Organism, Chromosome And Nucleotide Start And End...
13.1 years ago

Probably an easy question (because I think I'm doing it the hard way)...

Given an organism, a chromosome number, and nucleotide start and end coordinates, how do I find out whether there is a gene name associated with that locus and, if there is, what it is?

What's the best way to do this from the command-line?

Thanks, -Rich


Your answer is here or here or ... etc.


I think his question is the other way around. He already has the position and wants to know what is there.


@Chris, I agree. But all the queries are there; Richard will just have to change one or two parameters.

13.1 years ago
Treylathe

A non-command-line way to do this is the UCSC Table Browser.

Choose your genome assembly and then the Known Genes table (or RefSeq, or another gene track).

Click the "define regions" option and load your locations.

Choose either selected fields or BED as the output format.

That will give you a list of the genes in your chosen regions, along with information about them.

BEDTools gives you more flexibility.

You could also export the above output to Galaxy (www.usegalaxy.org) and use its "join" function with your regions again. That gives a more informative output: you'll see which regions have genes associated with them and which don't.


Thanks. I've been able to determine the gene name through the UCSC website (the graphical display shows the gene name when you mouse over it). But I have to run this against thousands of results. I've considered parsing the HTML, but a number of people say that's a bad plan, so I've looked at the DAS routines instead. It's not clear to me how to get DAS to return the gene name, even in XML: I used type=RefGene, but there's no name in the output.

13.1 years ago

You can use bedtools for this if you have the annotation for your organism(s) in one of the supported formats, e.g. BED or GFF/GTF.

perl -e 'print "chr1\t1000000\t1200000\n"' | intersectBed -a stdin -b annotation.bed
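
For readers without bedtools installed, the overlap test that intersectBed performs can be approximated with plain awk. This is a minimal sketch, not a replacement for bedtools: it assumes tab-separated BED files with 0-based, half-open coordinates and the gene name in column 4 of the annotation file. The helper name genes_for_regions and the toy genes are hypothetical. Each region is printed with the overlapping gene names, or "." when nothing overlaps, so you can also see which regions have no gene.

```shell
#!/usr/bin/env bash
# Sketch only: a plain-awk approximation of the overlap test intersectBed does.
# Assumes tab-separated BED input (0-based, half-open coordinates) and that
# column 4 of the annotation file holds the gene name.
set -euo pipefail

# genes_for_regions REGIONS.bed ANNOTATION.bed  (hypothetical helper name)
# Prints chrom, start, end of each region plus a comma-separated list of
# overlapping gene names, or "." when nothing overlaps.
genes_for_regions() {
  awk -F'\t' -v OFS='\t' '
    # First file read (the annotation): keep every gene in memory.
    NR == FNR { chrom[FNR] = $1; start[FNR] = $2 + 0; stop[FNR] = $3 + 0; name[FNR] = $4; n = FNR; next }
    # Second file (the regions): compare each region with every gene.
    {
      hits = ""
      for (i = 1; i <= n; i++) {
        # Half-open intervals overlap when each one starts before the other ends.
        if (chrom[i] == $1 && start[i] < $3 + 0 && $2 + 0 < stop[i]) {
          hits = (hits == "") ? name[i] : hits "," name[i]
        }
      }
      print $1, $2, $3, (hits == "" ? "." : hits)
    }
  ' "$2" "$1"
}

# Toy data (hypothetical genes, not real hg19 coordinates).
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'chr1\t1000\t5000\tGENE_A\nchr1\t8000\t9000\tGENE_B\nchr2\t100\t400\tGENE_C\n' > "$tmp/genes.bed"
printf 'chr1\t4000\t8500\nchr1\t6000\t7000\nchr2\t300\t600\n' > "$tmp/regions.bed"

genes_for_regions "$tmp/regions.bed" "$tmp/genes.bed"
```

The nested loop compares every region with every gene, which is fine for a few thousand regions against a modest annotation, but bedtools will be much faster against a genome-wide gene set.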

I'm trying to understand how to create the annotation file. Could you point me to something? I've tried downloading various files from UCSC, but it's not clear how to turn them into BED files.
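
If you start from UCSC's own downloads, a refGene table dump can be turned into BED with a single awk call. This is a minimal sketch assuming the hg19 refGene column layout (bin, name, chrom, strand, txStart, txEnd, ..., with the gene symbol name2 in column 13); check the table's schema before relying on it. UCSC tables already store txStart 0-based, like BED, so no coordinate shift is needed. The helper name refgene_to_bed and the toy record are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: convert a UCSC refGene table dump into BED6 with gene symbols.
# Assumed column layout (hg19 refGene): 1 bin, 2 name, 3 chrom, 4 strand,
# 5 txStart, 6 txEnd, ..., 13 name2 (gene symbol).
set -euo pipefail

# refgene_to_bed: reads refGene rows on stdin, writes BED6 on stdout
# (hypothetical helper name).
refgene_to_bed() {
  # txStart is already 0-based in UCSC tables, so it is copied unchanged.
  awk -F'\t' -v OFS='\t' '{ print $3, $5, $6, $13, 0, $4 }'
}

# One toy refGene row (hypothetical accession and gene, 16 columns).
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '585\tNR_TOY001\tchr1\t+\t1000\t5000\t5000\t5000\t1\t1000,\t5000,\t0\tGENE_A\tunk\tunk\t-1,\n' > "$tmp/refGene.txt"

refgene_to_bed < "$tmp/refGene.txt"
```

On the real file, something like `gzip -dc refGene.txt.gz | refgene_to_bed | sort -k1,1 -k2,2n > annotation.bed` should work; expect several lines per gene, one per transcript.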


And just for the record, I'm using hg19 (Feb. 2009).


They do not have to be BED files, since bedtools supports several formats. You can obtain GTF-format annotation from Ensembl by FTP. See http://www.ensembl.org/info/data/ftp/index.html
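
One catch when mixing sources: Ensembl names chromosomes "1", "2", ... while UCSC uses "chr1", "chr2", ..., so a query for chr1 against an Ensembl GTF finds nothing. The sketch below keeps gene-level GTF lines, pulls out the gene_name attribute, shifts the 1-based GTF start to a 0-based BED start, and adds a "chr" prefix. It assumes your GTF has lines whose third column is `gene` and a `gene_name` attribute; if yours only has transcript or exon lines, change the filter. The blanket "chr" prefix is a simplification (Ensembl's MT becomes chrMT rather than UCSC's chrM, and unplaced contigs are named differently). The helper name gtf_to_bed and the toy records are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: turn Ensembl GTF "gene" lines into BED6 named by gene_name.
# Assumes the GTF has gene-level lines and a gene_name attribute, and that
# the BED should use UCSC-style "chr" chromosome names.
set -euo pipefail

# gtf_to_bed: GTF on stdin, BED6 on stdout (hypothetical helper name).
gtf_to_bed() {
  awk -F'\t' -v OFS='\t' '
    /^#/ { next }            # skip header/comment lines
    $3 != "gene" { next }    # keep gene-level records only
    {
      name = "."
      # Pull the value out of: gene_name "XYZ";
      if (match($9, /gene_name "[^"]*"/)) {
        name = substr($9, RSTART + 11, RLENGTH - 12)
      }
      # Ensembl writes "1", UCSC writes "chr1".
      chrom = ($1 ~ /^chr/) ? $1 : "chr" $1
      # GTF starts are 1-based and inclusive; BED starts are 0-based.
      print chrom, $4 - 1, $5, name, 0, $7
    }
  '
}

# Toy GTF (hypothetical IDs and gene name).
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
{
  printf '#!genome-build toy\n'
  printf '1\tensembl\tgene\t1001\t5000\t.\t+\t.\tgene_id "ENSG_TOY1"; gene_name "GENE_A";\n'
  printf '1\tensembl\texon\t1001\t2000\t.\t+\t.\tgene_id "ENSG_TOY1"; gene_name "GENE_A";\n'
} > "$tmp/toy.gtf"

gtf_to_bed < "$tmp/toy.gtf"
```

Alternatively, skip the conversion and drop the "chr" from your own query coordinates, since intersectBed can read the GTF directly.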
