Question

Annotating The Data With Chromosome Start Position To Get A Gene Name

13.7 years ago

Anno_Research ▴ 10

I have a chromosome and a chromosome start position for each record in my data, and I want to annotate the data with gene names. So far I have used the ccdsKgMap table to get the geneId and chromEnd, using the condition (where mytable.chromStart >= ccdsKgMap.chromStart and mytable.chromStart <= ccdsKgMap.chromEnd), and then used the bioCyclepathway table to get the original gene symbols. But it seems I am going wrong somewhere. Is there a way to get the transcription start from the chromosome start? Or can anyone suggest a better way to get gene names?

annotation ucsc • 4.9k views

updated 10.7 years ago by Biostar 20 • written 13.7 years ago by Anno_Research ▴ 10

score 3 · Answer 1 · 2011-10-14

I think I understand what you are asking. Let's assume that you are using the refGene table as an example in place of "mytable". You could do something like:

select x.geneSymbol,r.chrom,r.cdsStart,r.cdsEnd,r.name2 
from knownGene k 
join kgXref x on x.kgID=k.name 
join refGene r on r.cdsStart between k.cdsStart and k.cdsEnd
     and r.chrom=k.chrom;

The refGene table here plays the role of the OP's "mytable" in that it supplies a set of base coordinates. I used knownGene instead of ccds, but the same idea applies to ccds as well. I was simply trying to show how to do a select using "between" while also matching on the chromosome name (which the OP doesn't mention).
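To try the "between plus chromosome" join without a UCSC mirror, here is a minimal sketch using Python's built-in sqlite3 module. The table `genes`, the helper `annotate_positions`, and all coordinates and symbols (GENE_A and so on) are made-up toy data for illustration, not real annotations or UCSC table names.

```python
import sqlite3

def annotate_positions(positions, genes):
    """Return (chrom, chromStart, symbol) for each position inside a gene interval.

    positions: list of (chrom, chromStart) tuples (stand-in for "mytable")
    genes:     list of (chrom, txStart, txEnd, symbol) tuples (hypothetical gene table)
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE mytable (chrom TEXT, chromStart INTEGER)")
    con.execute(
        "CREATE TABLE genes (chrom TEXT, txStart INTEGER, txEnd INTEGER, symbol TEXT)"
    )
    con.executemany("INSERT INTO mytable VALUES (?, ?)", positions)
    con.executemany("INSERT INTO genes VALUES (?, ?, ?, ?)", genes)
    rows = con.execute(
        """
        SELECT m.chrom, m.chromStart, g.symbol
        FROM mytable m
        JOIN genes g
          ON g.chrom = m.chrom                               -- same chromosome
         AND m.chromStart BETWEEN g.txStart AND g.txEnd      -- position inside the gene
        ORDER BY m.chrom, m.chromStart, g.symbol
        """
    ).fetchall()
    con.close()
    return rows

# Toy data: two overlapping genes on chr1, and one on chr2 with the same coordinates as GENE_A.
toy_genes = [
    ("chr1", 100, 500, "GENE_A"),
    ("chr1", 400, 900, "GENE_B"),
    ("chr2", 100, 500, "GENE_C"),
]
toy_positions = [("chr1", 450), ("chr2", 450), ("chr1", 1000)]

result = annotate_positions(toy_positions, toy_genes)
print(result)
```

Without the chromosome condition, position 450 on chr2 would also match the chr1 genes. Note that BETWEEN is inclusive at both ends, while UCSC tables store start coordinates 0-based with an exclusive end, so the boundary comparison may need adjusting for your data.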

score 1 · Answer 2 · 2011-10-14

13.7 years ago

Chris Evelo 10k

Well, I am not sure I understand your problem correctly. But if you have both the chromosome sequence and the gene sequences, why not just BLAST the genes against the chromosome? That would avoid any counting errors you might otherwise make.

score 1 · Answer 3 · 2011-10-14

Hi, if I understand your issue correctly: given a base pair position and a chromosome name, you need to get the symbol of the gene that contains it.

Working with UCSC, I suggest something like:

SELECT hl.value FROM hg19 h
   left join hg19_known2locuslink hl on h.name = hl.name
   where h.chrom like 'mychr'
         and ( ((h.txStart < mybp) and (h.txEnd > mybp))                     -- mybp lies inside the gene
               or (h.txEnd > (mybp-mygenerange)  and (h.txEnd < mybp))       -- gene ends within mygenerange before mybp
               or (h.txStart < (mybp+mygenerange) and (h.txStart > mybp))    -- gene starts within mygenerange after mybp
             )
   group by hl.value

where

-mychr is your known chromosome name

-mybp is your known base position

-mygenerange is the size of the interval (in base pairs) around mybp in which to look for the nearest gene

the returned hl.value is the NCBI gene ID
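The same lookup can be sketched as a parameterized query, so that mychr, mybp and mygenerange are passed in rather than pasted into the SQL. This is only an illustration using Python's sqlite3 module. The tables `known_gene` and `known2locuslink`, the function `genes_near`, and every ID and coordinate below are hypothetical toy stand-ins, and DISTINCT is used here in place of the group by.

```python
import sqlite3

# Three cases, as in the query above: mybp inside the gene, gene ending
# within mygenerange before mybp, gene starting within mygenerange after mybp.
QUERY = """
SELECT DISTINCT hl.value
FROM known_gene h
LEFT JOIN known2locuslink hl ON h.name = hl.name
WHERE h.chrom = :mychr
  AND (
        (h.txStart < :mybp AND h.txEnd > :mybp)
     OR (h.txEnd > :mybp - :mygenerange AND h.txEnd < :mybp)
     OR (h.txStart < :mybp + :mygenerange AND h.txStart > :mybp)
  )
ORDER BY hl.value
"""

def genes_near(con, mychr, mybp, mygenerange):
    """Return gene IDs containing mybp or lying within mygenerange of it."""
    params = {"mychr": mychr, "mybp": mybp, "mygenerange": mygenerange}
    return [row[0] for row in con.execute(QUERY, params)]

# Toy database with made-up transcripts and made-up gene IDs.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE known_gene (name TEXT, chrom TEXT, txStart INTEGER, txEnd INTEGER)"
)
con.execute("CREATE TABLE known2locuslink (name TEXT, value TEXT)")
con.executemany(
    "INSERT INTO known_gene VALUES (?, ?, ?, ?)",
    [
        ("tx1", "chr1", 100, 500),
        ("tx2", "chr1", 700, 900),
        ("tx3", "chr1", 5000, 6000),
        ("tx4", "chr2", 100, 500),
    ],
)
con.executemany(
    "INSERT INTO known2locuslink VALUES (?, ?)",
    [("tx1", "1001"), ("tx2", "1002"), ("tx3", "1003"), ("tx4", "1004")],
)

inside = genes_near(con, "chr1", 300, 0)      # position inside tx1 only
nearby = genes_near(con, "chr1", 600, 150)    # position between tx1 and tx2
print(inside, nearby)
```

With mygenerange set to 0 only containing genes are returned. A larger value also picks up flanking genes, so more than one ID can come back, and choosing the single nearest gene would need an extra distance comparison.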