Question: How to get known canonical transcript information from UCSC for a specific gencode version
Asked 7 days ago by komal.rathi (Children's Hospital of Philadelphia, Philadelphia, PA):

Hi,

I am using the UCSC Genome Browser (Table Browser) to get known canonical transcripts using this link. With the default Gencode version, V29, I am able to set the table to knownCanonical. However, when I change the track to ALL Gencode V23, the table options change and I can no longer access any tables corresponding to knownCanonical.

Does anybody know how I can get the canonical transcript info for gencode v23?

Thanks!


You should ask this over at the UCSC Genome Browser help desk. Someone from UCSC swings by here, but they may not do so right away.

(Comment by genomax, written 7 days ago.)

Thanks, I will do that. I will keep this open and post any responses I get from the help desk.

(Reply by komal.rathi, written 7 days ago.)

Answer by komal.rathi (Children's Hospital of Philadelphia, Philadelphia, PA), 3 days ago:

I got a response from the UCSC Genome Browser help desk which resolved my question:

The knownCanonical gene set is created from the longest transcript of the basic Gencode gene set. This convention was not around for the V23 gene set, so that file does not exist. If you would like to use a similar dataset without filtering for only the longest transcripts, you can use the Basic annotation set from Gencode V23 (http://genome.ucsc.edu/cgi-bin/hgTables?db=hg38&hgta_track=wgEncodeGencodeV23).

Alternatively, you can filter out the shorter transcripts, leaving only the longest isoform of each gene, by running a short script from the command line.

mysql -h genome-mysql.soe.ucsc.edu -u genome -Ne "select g.name, a.geneId, g.txEnd-g.txStart from wgEncodeGencodeBasicV23 g, wgEncodeGencodeAttrsV23 a where g.name = a.transcriptId" hg38 \
  | sort -rnk 3 \
  | awk '{if (!found[$2]) print ; found[$2] = 1}' \
  | awk '{print $2}' > knownCanonicalV23.txt
  

The output of this script (knownCanonicalV23.txt) can be uploaded as identifier input in Table Browser. Using that file as Table Browser identifiers should allow output as if you were querying a knownCanonical data set from Gencode V23.
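To see what the sort and awk steps in that command do without querying the UCSC server, here is a small sketch that runs the same two steps on made-up rows. The transcript and gene IDs, the file names, and the helper name pick_longest are all invented for illustration; only the three-column layout (transcript name, gene ID, txEnd - txStart) is taken from the query above.

```shell
# Keep, for each gene ID (column 2), the row with the largest span (column 3).
# Same sort and awk steps as the help desk command; the function name is hypothetical.
pick_longest() {
  sort -rnk 3 | awk '{if (!found[$2]) print ; found[$2] = 1}'
}

# Toy tab-separated input: transcript name, gene ID, genomic span. All IDs are made up.
printf 'txA.1\tgeneX.1\t500\ntxB.1\tgeneX.1\t1200\ntxC.1\tgeneY.1\t300\n' > toy_spans.txt

pick_longest < toy_spans.txt > toy_longest.txt
cat toy_longest.txt
```

On this toy input, geneX.1 keeps txB.1 (span 1200) and geneY.1 keeps txC.1, its only transcript. Note that the span here is txEnd - txStart, so "longest" means the largest genomic extent rather than the longest spliced sequence.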

If you want to download a genePred file equivalent of knownCanonical for Gencode V23, you can run the following script on the command line.

mysql -h genome-mysql.soe.ucsc.edu -u genome -Ne "select g.txEnd-g.txStart, a.geneId, g.* from wgEncodeGencodeBasicV23 g, wgEncodeGencodeAttrsV23 a where g.name = a.transcriptId" hg38 \
  | sort -rn \
  | awk '{if (!found[$2]) print ; found[$2] = 1}' \
  | cut -f 4- > knownCanonicalV23.gp
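Before loading the resulting file anywhere, a quick sanity check may be worthwhile. The sketch below is not part of the help desk reply: summarize_gp is a hypothetical helper that counts rows, repeated transcript names in column 1, and how many tab-separated fields each row has. It is demonstrated on a made-up toy file so it can be tried offline.

```shell
# Summarise a tab-separated genePred-style file (hypothetical helper, not a UCSC tool):
# total rows, rows whose column-1 name was already seen, and rows per field count.
summarize_gp() {
  awk -F'\t' '
    { rows++; if (seen[$1]++) dups++; widths[NF]++ }
    END {
      printf "rows=%d duplicate_names=%d\n", rows, dups
      for (w in widths) printf "fields=%d rows_with_that_width=%d\n", w, widths[w]
    }' "$1"
}

# Toy three-column file with one repeated name; the contents are invented.
printf 'tx1\tchr1\t+\ntx2\tchr1\t-\ntx1\tchr2\t+\n' > toy.gp
summarize_gp toy.gp
```

For the real output you would run summarize_gp knownCanonicalV23.gp and expect no duplicate names and a single field count across all rows; more than one field count would suggest the cut step did not line up with the table's columns.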
  