Question: Annotation of STRs Using UCSC Tables
8.1 years ago by
Dublin / Ireland
Raony Guimarães1.1k wrote:

Hello Folks!

I have a file with some STRs generated from exome sequencing:

chr1 900683 900726 AAAT 4 10.5 -4,-4 1 1 0 -4:1 1

chr1 926889 926923 AAAAG 5 6.6 0,0 1 1 0 0:1 1

chr1 1112424 1112491 AC 2 33.5 0,0 1 1 0 0:1 1

chr1 1202260 1202285 AAAAT 5 5 0,0 1 1 0 0:1 1

chr1 1437233 1437278 AAAAC 5 9.2 -3,0 2 2 0 -3:1/0:1 1

chr1 1585271 1585316 AC 2 22.5 -2,-2 1 1 0 -2:1 1

chr1 1684347 1684375 AGG 3 9.3 0,3 5 5 0 0:4/3:1 1

chr1 1701408 1701454 AAAAAC 6 7.7 -6,-6 2 2 0 -6:2 1

chr1 1948272 1948311 AAAG 4 10.2 0,0 1 1 0 0:1 1

chr1 2189157 2189192 AAAT 4 8.8 0,0 1 1 0 0:1 1

chr1 2302649 2302680 AAC 3 10.3 0,0 1 1 0 0:1 1

chr1 2380938 2380975 AACC 4 9.2 0,0 1 1 0 0:1 1

And I was asked to annotate this with genes and exons for each line. But the problem is that I don't know which track and table I should use from UCSC.

My options are

-> Track: RefSeq Genes, Table: refGene

-> Track: CCDS, Table: ccdsGene

-> Track: UCSC Genes, Table: knownGene

Which one should I use, and why? I wrote a simple Python script for this, but I'm wondering if there is a better way to do it... :)

annotation • 1.6k views
8.1 years ago by
Mary11k
Boston MA area
Mary11k wrote:

If it's human, I'd consider a GENCODE track. There might be more in there than in the other sets because the "biotypes" they are annotating may be broader. http://www.gencodegenes.org/gencode_biotypes.html

But if you have to pick one of the tracks you listed, I would use UCSC Genes (table knownGene), because it is built from several sources, including RefSeq and CCDS. See the description page here: http://genome.ucsc.edu/cgi-bin/hgTrackUi?hgsid=276948273&c=chr5&g=knownGene
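
Whichever gene table is chosen, refGene, ccdsGene and knownGene all store exons the same way: as comma-separated exonStarts and exonEnds columns with 0-based, half-open coordinates. The following is only a minimal sketch of the exon-overlap step, assuming the STR intervals use the same coordinate convention (the question does not say). The function names and the example transcript are hypothetical, not real UCSC records.

```python
def parse_exons(exon_starts, exon_ends):
    """Turn UCSC comma-separated exonStarts/exonEnds strings into (start, end) pairs."""
    starts = [int(x) for x in exon_starts.split(",") if x]
    ends = [int(x) for x in exon_ends.split(",") if x]
    return list(zip(starts, ends))


def overlapping_exons(str_start, str_end, exons):
    """Return 1-based indices (genomic order) of exons overlapping [str_start, str_end)."""
    return [
        i
        for i, (start, end) in enumerate(exons, start=1)
        if start < str_end and str_start < end
    ]


# Made-up transcript for illustration only.
exons = parse_exons("900000,900600,901000,", "900100,900700,901200,")

# First STR from the question (chr1 900683 900726), chromosome check omitted.
hits = overlapping_exons(900683, 900726, exons)
print(hits)
```

Note that the indices follow genomic order; for a transcript on the minus strand the biological exon numbering runs the other way, so it would need to be reversed.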

8.1 years ago by
ff.cc.cc1.3k
European Union
ff.cc.cc1.3k wrote:

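You should use the knownGene and knownToLocusLink tables with a query like this (SQL):

SELECT hl.value, MIN(h.txStart), MAX(h.txEnd)
FROM knownGene h
LEFT JOIN knownToLocusLink hl ON h.name = hl.name
WHERE h.chrom = 'chr_xyz' AND (
((h.txStart < bp) AND (h.txEnd > bp))
OR ((h.txEnd > (bp - range)) AND (h.txEnd < bp))
OR ((h.txStart < (bp + range)) AND (h.txStart > bp)) )
GROUP BY hl.value;

where bp is a base position on chromosome chr_xyz and range is how far on either side of bp you want to look for genes. Both are placeholders to be replaced with numbers. The first condition finds transcripts that contain bp, the second finds transcripts ending within range upstream of bp, and the third finds transcripts starting within range downstream of bp. hl.value is the gene ID (Entrez Gene), and the GROUP BY collapses all transcripts of a gene into one row.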
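
To try the query without connecting to a UCSC database, here is a minimal sketch that runs the same logic from Python against an in-memory SQLite database. The two tables are cut down to the columns the query needs, the rows are toy data rather than real UCSC records, and the function name genes_near and the parameter name flank (standing in for range) are assumptions made for this example.

```python
import sqlite3

# Same logic as the query above, with named parameters instead of placeholders.
QUERY = """
SELECT hl.value, MIN(h.txStart), MAX(h.txEnd)
FROM knownGene h
LEFT JOIN knownToLocusLink hl ON h.name = hl.name
WHERE h.chrom = :chrom AND (
      (h.txStart < :bp AND h.txEnd > :bp)
   OR (h.txEnd > :bp - :flank AND h.txEnd < :bp)
   OR (h.txStart < :bp + :flank AND h.txStart > :bp))
GROUP BY hl.value
ORDER BY MIN(h.txStart)
"""


def genes_near(conn, chrom, bp, flank):
    """Return (gene_id, min txStart, max txEnd) for genes at or within flank bases of bp."""
    params = {"chrom": chrom, "bp": bp, "flank": flank}
    return conn.execute(QUERY, params).fetchall()


# Toy tables holding only the columns used by the query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE knownGene (name TEXT, chrom TEXT, txStart INTEGER, txEnd INTEGER);
CREATE TABLE knownToLocusLink (name TEXT, value TEXT);
INSERT INTO knownGene VALUES
    ('tx1', 'chr1', 1000, 2000),
    ('tx2', 'chr1', 1200, 2500),
    ('tx3', 'chr1', 5000, 6000),
    ('tx4', 'chr2', 1000, 2000);
INSERT INTO knownToLocusLink VALUES
    ('tx1', '111'), ('tx2', '111'), ('tx3', '222'), ('tx4', '333');
""")

rows = genes_near(conn, "chr1", 1500, 100)
for gene_id, start, end in rows:
    print(gene_id, start, end)
```

For an STR interval rather than a single position, one option is to call the function once per end of the interval, or to replace the single-position tests with an interval-overlap test (txStart < strEnd AND txEnd > strStart).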
