Annotating The Data With Chromosome Start Position To Get A Gene Name
12.5 years ago

I have the chromosome and chromosome start position for each record in my data, and I want to annotate the data with gene names. So far, I have used the ccdsKgMap table to get the geneId and chromEnd with the condition (where mytable.chromStart >= ccdsKgMap.chromStart and mytable.chromStart <= ccdsKgMap.chromEnd), and then I used the bioCyclepathway table to get the original gene symbols. But it seems that I am going wrong somewhere. Is there a way to get the transcription start site from the chromosome start position? Or can anyone suggest a better way to get gene names?
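On the transcription-start question: in UCSC gene tables such as knownGene and refGene, coordinates are 0-based and half-open, and txStart is always the lower coordinate regardless of strand. The transcription start site is therefore at txStart for plus-strand genes and at the high end (txEnd) for minus-strand genes. The following minimal Python sketch works under those assumptions. GeneRow, tss_zero_based and contains are hypothetical names used only for illustration, and the coordinates are made up.

```python
# Minimal sketch, assuming UCSC-style gene rows: 0-based, half-open
# txStart/txEnd, with txStart < txEnd on both strands.
from typing import NamedTuple


class GeneRow(NamedTuple):
    """Hypothetical stand-in for one row of knownGene/refGene."""
    name: str
    chrom: str
    strand: str    # '+' or '-'
    tx_start: int  # 0-based, inclusive
    tx_end: int    # 0-based, exclusive


def tss_zero_based(row: GeneRow) -> int:
    """Return the 0-based coordinate of the first transcribed base."""
    if row.strand == "+":
        return row.tx_start
    if row.strand == "-":
        # Minus-strand transcription starts at the high end; txEnd is
        # exclusive, so the last base of the interval is txEnd - 1.
        return row.tx_end - 1
    raise ValueError(f"unexpected strand: {row.strand!r}")


def contains(row: GeneRow, chrom: str, pos0: int) -> bool:
    """True if the 0-based position pos0 on chrom lies inside the transcript."""
    return row.chrom == chrom and row.tx_start <= pos0 < row.tx_end


plus = GeneRow("geneA", "chr1", "+", 1000, 5000)
minus = GeneRow("geneB", "chr1", "-", 1000, 5000)
print(tss_zero_based(plus), tss_zero_based(minus))  # 1000 4999
print(contains(minus, "chr1", 4999), contains(minus, "chr1", 5000))  # True False
```

The containment test also checks the chromosome name, because a bare coordinate comparison would match genes on every chromosome.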

annotation ucsc • 4.5k views
12.5 years ago

I think I understand what you are asking. Let's assume that you are using the refGene table as an example in place of "mytable". You could do something like:

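-- gene symbols (from kgXref) for knownGene entries whose CDS contains the cdsStart of a refGene entry on the same chromosome
select x.geneSymbol, r.chrom, r.cdsStart, r.cdsEnd, r.name2
from knownGene k
join kgXref x on x.kgID = k.name                               -- knownGene ID -> gene symbol
join refGene r on r.cdsStart between k.cdsStart and k.cdsEnd
     and r.chrom = k.chrom;                                    -- match the chromosome as well as the position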

The refGene table here plays the role of the OP's "mytable", in that it supplies a set of base coordinates. I used knownGene instead of ccdsKgMap, but the same idea applies to ccdsKgMap. I was simply trying to show how to do a select using "between" while also matching on the chromosome name (which the OP doesn't mention).
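To show why the chromosome condition matters, here is a small self-contained sketch that runs the same kind of "between" join in an in-memory SQLite database from Python. It assumes toy, made-up copies of knownGene and kgXref with only the columns used here, plus a hypothetical "mytable" of query positions standing in for refGene. Real UCSC tables have more columns and live in MySQL.

```python
# Minimal sketch, assuming toy in-memory copies of knownGene/kgXref and a
# hypothetical "mytable" of query positions; all rows are made up.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE knownGene (name TEXT, chrom TEXT, cdsStart INTEGER, cdsEnd INTEGER);
CREATE TABLE kgXref (kgID TEXT, geneSymbol TEXT);
CREATE TABLE mytable (chrom TEXT, chromStart INTEGER);

-- Two made-up genes with identical coordinates on different chromosomes.
INSERT INTO knownGene VALUES ('uc001aaa', 'chr1', 100, 500), ('uc002bbb', 'chr2', 100, 500);
INSERT INTO kgXref VALUES ('uc001aaa', 'GENE1'), ('uc002bbb', 'GENE2');
INSERT INTO mytable VALUES ('chr1', 250), ('chr2', 900);
""")

# BETWEEN on the coordinate AND a match on chrom. Without the chrom
# condition, chr1:250 would also be reported as falling inside GENE2.
rows = conn.execute("""
SELECT m.chrom, m.chromStart, x.geneSymbol
FROM mytable m
JOIN knownGene k ON m.chromStart BETWEEN k.cdsStart AND k.cdsEnd
                AND m.chrom = k.chrom
JOIN kgXref x ON x.kgID = k.name
ORDER BY m.chrom, m.chromStart
""").fetchall()

print(rows)  # [('chr1', 250, 'GENE1')]
conn.close()
```

BETWEEN is inclusive at both ends, while UCSC end coordinates are exclusive. If the query positions use the same 0-based convention, a pair of conditions of the form "pos >= start AND pos < end" avoids reporting a position that sits exactly at cdsEnd.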


The refGene table does not have chromStart or chromEnd columns. I think the OP is trying to figure out CCDS coordinates/names from chromStart and chromEnd. Is there a table that links chromosome coordinates to cdsStart and cdsEnd?

12.5 years ago

Well, I am not sure I understand your problem correctly, but if you have both the chromosome sequence and the gene sequences, why not just BLAST the genes against the chromosome? That will prevent any counting errors you might make when working out the positions some other way.

12.5 years ago
ff.cc.cc ★ 1.3k

Hi, if I understand your issue correctly: given a base-pair position and a chromosome name, you need to get the symbol of the gene that contains it.

Working with UCSC, I suggest something like:

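-- run inside the UCSC hg19 database: knownGene holds the transcript coordinates,
-- knownToLocusLink maps knownGene IDs to NCBI Gene IDs
SELECT hl.value FROM knownGene h
   LEFT JOIN knownToLocusLink hl ON h.name = hl.name
   WHERE h.chrom = 'mychr'
         AND ( ((h.txStart < mybp) AND (h.txEnd > mybp))                   -- gene contains mybp
               OR (h.txEnd > (mybp-mygenerange) AND (h.txEnd < mybp))      -- gene ends within mygenerange before mybp
               OR (h.txStart < (mybp+mygenerange) AND (h.txStart > mybp))  -- gene starts within mygenerange after mybp
             )
   GROUP BY hl.value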

Where:

- mychr is your known chromosome name (UCSC style, e.g. chr1)

- mybp is your known base position

- mygenerange is the width of the window (in bases) on either side of mybp in which to look for the nearest gene

The returned hl.value is the NCBI Gene (Entrez) ID.
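The following sketch runs the same windowed query against an in-memory SQLite database from Python, binding mychr, mybp and mygenerange as named parameters instead of pasting them into the SQL text. It assumes toy tables shaped like UCSC knownGene (name, chrom, txStart, txEnd) and knownToLocusLink (name, value). The helper names make_demo_db and genes_near are hypothetical, and all IDs and coordinates are made up.

```python
# Minimal sketch, assuming toy tables shaped like UCSC knownGene and
# knownToLocusLink; IDs and coordinates are made up for illustration.
import sqlite3

QUERY = """
SELECT hl.value
FROM knownGene h
LEFT JOIN knownToLocusLink hl ON h.name = hl.name
WHERE h.chrom = :chrom
  AND (   (h.txStart < :bp AND h.txEnd > :bp)              -- gene contains bp
       OR (h.txEnd > :bp - :win AND h.txEnd < :bp)         -- ends within win before bp
       OR (h.txStart < :bp + :win AND h.txStart > :bp))    -- starts within win after bp
GROUP BY hl.value
ORDER BY hl.value
"""


def make_demo_db() -> sqlite3.Connection:
    """Build a hypothetical in-memory database with four toy genes."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE knownGene (name TEXT, chrom TEXT, txStart INTEGER, txEnd INTEGER);
    CREATE TABLE knownToLocusLink (name TEXT, value TEXT);
    INSERT INTO knownGene VALUES
      ('ucA', 'chr1', 1000, 2000),   -- contains position 1500
      ('ucB', 'chr1', 2500, 3000),   -- starts 1000 bases after 1500
      ('ucC', 'chr1', 9000, 9500),   -- far away
      ('ucD', 'chr2', 1000, 2000);   -- same coordinates, other chromosome
    INSERT INTO knownToLocusLink VALUES
      ('ucA', '111'), ('ucB', '222'), ('ucC', '333'), ('ucD', '444');
    """)
    return conn


def genes_near(conn: sqlite3.Connection, chrom: str, bp: int, win: int) -> list:
    """Return gene IDs overlapping bp, or lying within win bases of it."""
    params = {"chrom": chrom, "bp": bp, "win": win}
    return [row[0] for row in conn.execute(QUERY, params)]


conn = make_demo_db()
print(genes_near(conn, "chr1", 1500, 0))     # ['111']
print(genes_near(conn, "chr1", 1500, 1500))  # ['111', '222']
```

Binding the three values as parameters keeps the SQL fixed and avoids building it by string concatenation. Widening the window is what pulls in the neighbouring gene.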


Can you please elaborate on the above command? hg19 is a database, so why is it used as a table? I tried this command, but it gives the error "table hg19.hg19 doesn't exist".
