A simple question about UCSC genome browser
14 months ago
dcheng1 • 0

This may be a very naive question, but I'm trying to get the genomic coordinates of c-myc exon 1. I found that the UCSC genome browser shows many different displays of the c-myc gene, and they all have a different location for exon 1. I'm confused about which one to choose. Could anyone explain what the difference is? Thank you so much!

genome • 774 views

Hi,

The difference is related to the different transcripts of the same gene. If you hover over an exon, the name of the transcript to which it belongs will appear, probably starting with ENST.


Could anyone explain what's difference?

LIFE

14 months ago

One isoform can be selected as "canonical", based on experimental evidence collected into a database called APPRIS, or based on length or other criteria.

When you look at GFF3 files on the Gencode site, for instance, you may see some entries tagged with appris_* prefixes to denote the grade of evidence for "canonical-ity".

Other isoforms exist because genes can be transcribed in different ways. It can be useful to pick one isoform from all that are available for a gene, for the purposes of doing analyses.

In the UCSC browser, the canonical isoform may be, say, the one experimentally determined to be expressed the most among all alternative transcripts. It gets an inverted text label as a visual cue that it is canonical; the other labels are unadorned.

You might want to work with the canonical gene annotation, but whether that is the right choice can depend on your experiment.

Internally, UCSC keeps a table called knownCanonical that is used to label such isoforms. This table is available for direct inspection via Goldenpath for various assemblies, e.g. for hg38.

In hg38, as an example, the XIST gene has an isoform called ENST00000429829.6, which is labeled as canonical and sits at chrX:73820655-73852723 (the start is zero-indexed in the table; the UCSC browser view shows it one-indexed, i.e. as 73820656).

You can grab the knownCanonical table and verify that this transcript is there:

% wget -qO- "http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/knownCanonical.txt.gz" | gunzip -c | grep ENST00000429829.6
chrX    73820655    73852723    28961   ENST00000429829.6   ENSG00000229807.12
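To compare a table row with what the browser shows, the zero-indexed start can be shifted by one. This is only a minimal sketch, assuming the first three columns are chromosome, start, and end as in the output above; `to_browser_pos` is a made-up helper name, not a UCSC tool:

```shell
# Hypothetical helper: read knownCanonical-style rows (chrom, 0-based start, end)
# from stdin and print browser-style positions with a 1-based start.
to_browser_pos() {
  awk '{ printf "%s:%d-%d\n", $1, $2 + 1, $3 }'
}

# Row copied from the XIST output above
printf 'chrX\t73820655\t73852723\t28961\tENST00000429829.6\tENSG00000229807.12\n' | to_browser_pos
```

The end coordinate is left untouched because only the start differs between the two conventions.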


If you do the same for the canonical-labeled MYC transcript in hg38 or another assembly, you should see a similar result:

% wget -qO- "http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/knownCanonical.txt.gz" | gunzip -c | grep ENST00000621592.8
chr8    127736230   127742951   7390    ENST00000621592.8   ENSG00000136997.21
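Since the original question was about exon 1, one possible next step is to look the canonical transcript up in a gene prediction table, which stores exon starts and ends as comma-separated lists. This is a rough sketch, not a definitive recipe. It assumes that knownGene.txt.gz sits in the same Goldenpath directory and has the columns name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds (check the accompanying knownGene.sql to confirm). `first_exon` is a made-up helper name:

```shell
# Hypothetical helper: print exon 1 of each genePred-style row read from stdin.
# Assumed columns: 1=name 2=chrom 3=strand 9=exonStarts 10=exonEnds (0-based starts).
# On the minus strand, exon 1 in transcript order is the last entry in the lists.
first_exon() {
  awk '{
    n = split($9, starts, ",")
    split($10, ends, ",")
    if (starts[n] == "") n--          # lists usually end with a trailing comma
    i = ($3 == "-") ? n : 1
    printf "%s\t%s\t%s\t%s\n", $2, starts[i], ends[i], $1
  }'
}

# Usage (needs network access; the transcript version in the current table may differ):
# wget -qO- "http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/knownGene.txt.gz" | gunzip -c | grep ENST00000621592.8 | first_exon
```

The output is chromosome, start, end, and transcript name, with the start still zero-indexed as in the table.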

14 months ago

A more generic answer here is that the word "gene" is an abstract and theoretical concept - not a single, real thing.

What you see in the image are some of the physically existing things (in this case, transcripts) that people refer to when they call something a "gene".