What do UCSC identifiers physically represent?
9.6 years ago
pwg46 ▴ 540

I am unsure what UCSC identifiers (e.g. uc001ppf.4) physically represent. It is very clear with Ensembl identifiers: ENST... represents a transcript, ENSG... a gene, and ENSP... a protein. It is also clear with RefSeq identifiers: NP_... represents a protein, and so on. Can someone please clarify what UCSC identifiers refer to?

identifier ucsc uc ensembl • 3.0k views
9.6 years ago
Ram 43k

A quick Google search tells me that UCSC has unique identifiers for each gene on its Genes track. I guess the IDs are globally unique identifiers following a specific format, not necessarily dictated by molecule type or similar detail (as opposed to NP_ or NM_ naming). The one part that might carry its own meaning is the number after the period. I checked, and it doesn't stand for the transcript variant number. It might be versioning, as in NCBI GenBank.
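To make the contrast concrete, here is a minimal sketch of prefix-based lookup, covering only the prefixes mentioned in this thread. The `PREFIX_TYPES` table and the `molecule_type` helper are made up for illustration and are not part of any Ensembl, RefSeq, or UCSC tool.

```python
# Prefix -> molecule type, limited to the prefixes mentioned in this thread.
# This table is an illustrative assumption, not an exhaustive registry.
PREFIX_TYPES = [
    ("ENST", "transcript"),
    ("ENSG", "gene"),
    ("ENSP", "protein"),
    ("NP_", "protein"),
    ("NM_", "mRNA"),
]


def molecule_type(identifier):
    """Return the molecule type implied by a known prefix, or None."""
    for prefix, kind in PREFIX_TYPES:
        if identifier.startswith(prefix):
            return kind
    # No known prefix: the ID format alone does not say what it refers to.
    return None
```

An ID such as uc001ppf.4 matches none of these prefixes, so the helper returns `None` for it.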


It should be noted that UCSC IDs are of questionable uniqueness. For example, a given ID can appear on multiple chromosomes and even on different strands of the same chromosome. This is true even for gene IDs, which can be problematic (e.g., don't try to use a UCSC GTF file with DEXSeq). This is one of the reasons many of us stick to Ensembl: it's more coherent.
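If you want to check a particular annotation file for this, a rough sketch along these lines would list the IDs that occur on more than one chromosome/strand combination. It assumes standard tab-separated GTF lines with nine columns and attributes written as `key "value";`. The function name `ambiguous_ids` is made up for this example.

```python
import re
from collections import defaultdict


def ambiguous_ids(gtf_lines, attribute="transcript_id"):
    """Map each ID seen on more than one (chromosome, strand) pair
    to the sorted list of pairs it was seen on."""
    pattern = re.compile(r"\b" + re.escape(attribute) + r' "([^"]+)"')
    locations = defaultdict(set)
    for line in gtf_lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        match = pattern.search(fields[8])
        if match:
            # Column 1 is the chromosome, column 7 is the strand.
            locations[match.group(1)].add((fields[0], fields[6]))
    return {
        ident: sorted(places)
        for ident, places in locations.items()
        if len(places) > 1
    }
```

Pass it an open file handle (or any iterable of lines), and use `attribute="gene_id"` to run the same check on gene IDs.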


I never knew UCSC exposed IDs. This answer was just the result of a quick Google search. Like you say, I'd much rather use a better understood and well-linked out ID than the UCSC one.

9.6 years ago

They represent a gene's isoforms.

9.6 years ago

UCSC IDs represent transcripts, with the version number after the dot. They are grouped into clusters, which are somewhat similar to Ensembl genes.
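If you need to compare IDs across releases, you can split off the part after the dot. This is a minimal sketch that takes the description above at face value (accession, dot, integer version). The helper name `split_version` is made up for this example.

```python
def split_version(identifier):
    """Split 'uc001ppf.4' into ('uc001ppf', 4).

    Returns (identifier, None) when there is no numeric suffix
    after the last dot.
    """
    accession, dot, version = identifier.rpartition(".")
    if dot and version.isdigit():
        return accession, int(version)
    return identifier, None
```

For example, `split_version("uc001ppf.4")` gives `("uc001ppf", 4)`, and an ID without a dot comes back unchanged with `None` as the version.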

IMHO Ensembl identifiers are not unique either: if the transcript sequence is identical, so is the transcript ID, right?

For genes, the most general identifier for linkouts is the official HGNC ID. For transcripts, a very stable set of identifiers is RefSeq.
