Question

Annotation of STRs Using UCSC Tables


11.9 years ago

Raony Guimarães ★ 1.4k

Hello Folks!

I have a file with some STRs generated from exome sequencing:

chr1 900683 900726 AAAT 4 10.5 -4,-4 1 1 0 -4:1 1
chr1 926889 926923 AAAAG 5 6.6 0,0 1 1 0 0:1 1
chr1 1112424 1112491 AC 2 33.5 0,0 1 1 0 0:1 1
chr1 1202260 1202285 AAAAT 5 5 0,0 1 1 0 0:1 1
chr1 1437233 1437278 AAAAC 5 9.2 -3,0 2 2 0 -3:1/0:1 1
chr1 1585271 1585316 AC 2 22.5 -2,-2 1 1 0 -2:1 1
chr1 1684347 1684375 AGG 3 9.3 0,3 5 5 0 0:4/3:1 1
chr1 1701408 1701454 AAAAAC 6 7.7 -6,-6 2 2 0 -6:2 1
chr1 1948272 1948311 AAAG 4 10.2 0,0 1 1 0 0:1 1
chr1 2189157 2189192 AAAT 4 8.8 0,0 1 1 0 0:1 1
chr1 2302649 2302680 AAC 3 10.3 0,0 1 1 0 0:1 1
chr1 2380938 2380975 AACC 4 9.2 0,0 1 1 0 0:1 1

I was asked to annotate each line with the genes and exons it overlaps. The problem is that I don't know which UCSC track and table I should use.

My options are:

-> Track: RefSeq Genes, Table: refGene

-> Track: CCDS, Table: ccdsGene

-> Track: UCSC Genes, Table: knownGene

Which one should I use, and why? I wrote a simple Python script for this, but I'm wondering whether there is a better way to do it... :)
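For illustration only (this is not the script mentioned above), here is a minimal Python sketch of how one line of the STR file could be read before annotation. It assumes the fields are whitespace-separated and that the first four are chromosome, start, end and repeat motif; the names `StrCall` and `parse_str_line` are made up for this example, and the remaining columns are ignored because their meaning is not stated here.

```python
from typing import NamedTuple


class StrCall(NamedTuple):
    """The four leading fields of one STR line (assumed layout)."""
    chrom: str
    start: int
    end: int
    motif: str


def parse_str_line(line: str) -> StrCall:
    # Split on any whitespace (tabs or spaces) and keep the first four fields.
    fields = line.split()
    return StrCall(fields[0], int(fields[1]), int(fields[2]), fields[3])


rec = parse_str_line("chr1 900683 900726 AAAT 4 10.5 -4,-4 1 1 0 -4:1 1")
print(rec.chrom, rec.start, rec.end, rec.motif)  # chr1 900683 900726 AAAT
```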

annotation • 2.3k views

updated 11.9 years ago by Mary 11k • written 11.9 years ago by Raony Guimarães ★ 1.4k

score 1 · Answer 1 · 2012-06-06

If your data are human, I'd consider a GENCODE track. It may contain more annotations than the other sets, because the range of "biotypes" GENCODE annotates is broader: http://www.gencodegenes.org/gencode_biotypes.html

But if you have to pick one of the options you listed, I would use UCSC Genes (table knownGene), because it incorporates both RefSeq and CCDS. See the description page here: http://genome.ucsc.edu/cgi-bin/hgTrackUi?hgsid=276948273&c=chr5&g=knownGene
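To make the exon part of the annotation concrete, here is a minimal Python sketch of the overlap step against one knownGene-style transcript. It assumes the exonStarts and exonEnds columns are comma-separated lists with a trailing comma and 0-based, half-open coordinates, as in UCSC gene prediction tables. The function names and the example transcript are made up for illustration, and whether the STR file uses the same coordinate convention is something you would need to check.

```python
def parse_exons(exon_starts: str, exon_ends: str):
    """Turn 'a,b,c,' style columns into a list of (start, end) pairs."""
    starts = [int(x) for x in exon_starts.split(",") if x]
    ends = [int(x) for x in exon_ends.split(",") if x]
    return list(zip(starts, ends))


def overlapping_exons(start: int, end: int, exons):
    """Return 1-based exon indexes, in genomic order, overlapping [start, end)."""
    return [i for i, (s, e) in enumerate(exons, 1) if start < e and s < end]


# Hypothetical transcript with three exons (not real data).
exons = parse_exons("1000,2000,3000,", "1200,2300,3500,")
print(overlapping_exons(2250, 2400, exons))  # [2]
```

The indexes follow genomic order, so for a transcript on the minus strand the biological exon number would count from the other end.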

score 0 · Answer 2 · 2012-06-06

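You should use the knownGene and knownToLocusLink tables with an SQL query like this:

-- Genes containing position bp, or ending/starting within range bases of it
SELECT hl.value, MIN(h.txStart), MAX(h.txEnd)
FROM knownGene h
LEFT JOIN knownToLocusLink hl ON h.name = hl.name
WHERE h.chrom = 'chr_xyz' AND (
      (h.txStart < bp AND h.txEnd > bp)                 -- bp lies inside the transcript
   OR (h.txEnd > (bp - range) AND h.txEnd < bp)         -- transcript ends shortly before bp
   OR (h.txStart < (bp + range) AND h.txStart > bp) )   -- transcript starts shortly after bp
GROUP BY hl.value;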

Here, bp is a base position on chromosome chr_xyz, range is the distance (in bases) around bp within which you want to look for genes, and hl.value is the Entrez Gene ID.
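As a rough way to try the query logic without a UCSC database at hand, the sketch below runs it from Python against an in-memory SQLite database. The two tables keep only the columns the query needs, all rows are made up, and `genes_near` and the `:flank` parameter (standing in for range) are names chosen for this example. SQLite is swapped in for UCSC's MySQL here only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy tables with made-up rows; real UCSC tables have more columns.
conn.executescript("""
CREATE TABLE knownGene (name TEXT, chrom TEXT, txStart INTEGER, txEnd INTEGER);
CREATE TABLE knownToLocusLink (name TEXT, value TEXT);
INSERT INTO knownGene VALUES
  ('tx1', 'chr1', 1000, 5000),
  ('tx2', 'chr1', 1500, 6000),
  ('tx3', 'chr1', 9000, 9500),
  ('tx4', 'chr2', 1000, 5000);
INSERT INTO knownToLocusLink VALUES
  ('tx1', '111'), ('tx2', '111'), ('tx3', '222'), ('tx4', '333');
""")

QUERY = """
SELECT hl.value, MIN(h.txStart), MAX(h.txEnd)
FROM knownGene h
LEFT JOIN knownToLocusLink hl ON h.name = hl.name
WHERE h.chrom = :chrom AND (
      (h.txStart < :bp AND h.txEnd > :bp)
   OR (h.txEnd > :bp - :flank AND h.txEnd < :bp)
   OR (h.txStart < :bp + :flank AND h.txStart > :bp))
GROUP BY hl.value
ORDER BY hl.value
"""


def genes_near(conn, chrom, bp, flank):
    """Return (gene ID, min txStart, max txEnd) for genes at or near bp."""
    params = {"chrom": chrom, "bp": bp, "flank": flank}
    return conn.execute(QUERY, params).fetchall()


print(genes_near(conn, "chr1", 5500, 100))  # [('111', 1500, 6000)]
```

With a larger flank, transcripts that end before or start after the position are picked up as well, and the GROUP BY collapses transcripts of the same gene into one row.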