Question: Confused by Ensembl exon IDs. How should I understand them?
Yu (China) wrote, 4.1 years ago:

Hi all,

I have recently been working with Ensembl GTF annotation files, trying to extract the exons I need.

I am confused by Ensembl exon IDs. For example, the three exons below all belong to gene ENSG00000000003 and have the same start and end coordinates:

chrX    protein_coding  exon    99890555        99890743        .       -       .       gene_id "ENSG00000000003"; transcript_id "ENST00000373020"; exon_number "2"; gene_name "TSPAN6"; gene_biotype "protein_coding"; transcript_name "TSPAN6-001"; exon_id "ENSE00003662440";
chrX    processed_transcript    exon    99890555        99890743        .       -       .       gene_id "ENSG00000000003"; transcript_id "ENST00000496771"; exon_number "2"; gene_name "TSPAN6"; gene_biotype "protein_coding"; transcript_name "TSPAN6-003"; exon_id "ENSE00003512331";
chrX    processed_transcript    exon    99890555        99890743        .       -       .       gene_id "ENSG00000000003"; transcript_id "ENST00000494424"; exon_number "3"; gene_name "TSPAN6"; gene_biotype "protein_coding"; transcript_name "TSPAN6-002"; exon_id "ENSE00003512331";

My questions:

1. Why is the first exon (ENSE00003662440) annotated with a different exon ID from the last two (ENSE00003512331)?

2. Could anybody explain how Ensembl assigns exon IDs? (I couldn't find any documentation about exon annotation on the Ensembl site.)

 

Thanks.

 

Neilfws (Sydney, Australia) wrote, 4.1 years ago:

I'd guess part of the explanation is that the transcripts come from different sources and, despite what your GTF file states, have different biotypes.

Take a look at the region here. Transcripts TSPAN6-002 and TSPAN6-003 are Havana transcripts of type "processed transcript". TSPAN6-001 is an Ensembl/Havana merge transcript of type "known protein coding". So the exons in the first two transcripts are considered "the same exon", while the exon in TSPAN6-001 has the same coordinates but a different source, so it is considered a different exon.

There is some (not very detailed) information about the annotation process here. Also note that your data appear to come from genome build GRCh37. Things are somewhat different in the latest build, which now has 5 transcripts of 3 types and, consequently, 3 exon IDs.


Thanks a lot! I think it is better to ignore the exon ID when trying to find the same exon in multiple transcripts.

Reply by Yu, 4.1 years ago.
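Following that conclusion, one way to find "the same exon" across transcripts is to key exons on their location instead of on exon_id. Below is a minimal Python sketch, assuming a tab-delimited GTF (Ensembl distributes GTF files tab-separated; the lines in the question were pasted with spaces) and assuming two exons count as the same when chromosome, start, end and strand all match. The function and variable names are hypothetical and not part of any library.

```python
import re
from collections import defaultdict

# Matches key "value" pairs in the GTF attribute column (column 9).
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(field):
    """Turn the attribute column into a dict (a repeated key keeps its last value)."""
    return dict(ATTR_RE.findall(field))


def group_exons_by_coordinates(gtf_lines):
    """Group exon records by (chrom, start, end, strand), ignoring exon_id.

    Returns a dict mapping each coordinate key to the list of
    (transcript_id, exon_id) pairs seen at that location.
    """
    groups = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue  # keep only exon features
        attrs = parse_attributes(cols[8])
        key = (cols[0], int(cols[3]), int(cols[4]), cols[6])
        groups[key].append((attrs.get("transcript_id"), attrs.get("exon_id")))
    return dict(groups)


# The three TSPAN6 exons from the question, with attributes shortened.
EXAMPLE_LINES = [
    'chrX\tprotein_coding\texon\t99890555\t99890743\t.\t-\t.\t'
    'transcript_id "ENST00000373020"; exon_id "ENSE00003662440";',
    'chrX\tprocessed_transcript\texon\t99890555\t99890743\t.\t-\t.\t'
    'transcript_id "ENST00000496771"; exon_id "ENSE00003512331";',
    'chrX\tprocessed_transcript\texon\t99890555\t99890743\t.\t-\t.\t'
    'transcript_id "ENST00000494424"; exon_id "ENSE00003512331";',
]

if __name__ == "__main__":
    for key, members in group_exons_by_coordinates(EXAMPLE_LINES).items():
        exon_ids = sorted({exon_id for _, exon_id in members})
        print(key, len(members), "transcripts, exon IDs:", exon_ids)
```

On the three example lines this should yield a single coordinate group holding all three transcripts and both exon IDs. Matching on coordinates alone deliberately ignores whatever distinguishes the two IDs (source, coding status), so add more fields to the key if that distinction matters for your analysis.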
gandrescabrera wrote, 2.1 years ago:

I'm also a bit confused by these GTF files. What does "exon_version" mean? And what is the "1" at the start of every line?

1   havana  exon    11869   12227   .   +   .   gene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000456328"; transcript_version "2"; exon_number "1"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; havana_gene "OTTHUMG00000000961"; havana_gene_version "2"; transcript_name "DDX11L1-002"; transcript_source "havana"; transcript_biotype "processed_transcript"; havana_transcript "OTTHUMT00000362751"; havana_transcript_version "1"; exon_id "ENSE00002234944"; exon_version "1"; tag "basic"; transcript_support_level "1";
1   havana  exon    12613   12721   .   +   .   gene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000456328"; transcript_version "2"; exon_number "2"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; havana_gene "OTTHUMG00000000961"; havana_gene_version "2"; transcript_name "DDX11L1-002"; transcript_source "havana"; transcript_biotype "processed_transcript"; havana_transcript "OTTHUMT00000362751"; havana_transcript_version "1"; exon_id "ENSE00003582793"; exon_version "1"; tag "basic"; transcript_support_level "1";
1   havana  exon    13221   14409   .   +   .   gene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000456328"; transcript_version "2"; exon_number "3"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; havana_gene "OTTHUMG00000000961"; havana_gene_version "2"; transcript_name "DDX11L1-002"; transcript_source "havana"; transcript_biotype "processed_transcript"; havana_transcript "OTTHUMT00000362751"; havana_transcript_version "1"; exon_id "ENSE00002312635"; exon_version "1"; tag "basic"; transcript_support_level "1";
1   havana  exon    12010   12057   .   +   .   gene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000450305"; transcript_version "2"; exon_number "1"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; havana_gene "OTTHUMG00000000961"; havana_gene_version "2"; transcript_name "DDX11L1-001"; transcript_source "havana"; transcript_biotype "transcribed_unprocessed_pseudogene"; havana_transcript "OTTHUMT00000002844"; havana_transcript_version "2"; exon_id "ENSE00001948541"; exon_version "1"; tag "basic"; transcript_support_level "NA";
1   havana  exon    12179   12227   .   +   .   gene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000450305"; transcript_version "2"; exon_number "2"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; havana_gene "OTTHUMG00000000961"; havana_gene_version "2"; transcript_name "DDX11L1-001"; transcript_source "havana"; transcript_biotype "transcribed_unprocessed_pseudogene"; havana_transcript "OTTHUMT00000002844"; havana_transcript_version "2"; exon_id "ENSE00001671638"; exon_version "2"; tag "basic"; transcript_support_level "NA";
1   havana  exon    12613   12697   .   +   .   gene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000450305"; transcript_version "2"; exon_number "3"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; havana_gene "OTTHUMG00000000961"; havana_gene_version "2"; transcript_name "DDX11L1-001"; transcript_source "havana"; transcript_biotype "transcribed_unprocessed_pseudogene"; havana_transcript "OTTHUMT00000002844"; havana_transcript_version "2"; exon_id "ENSE00001758273"; exon_version "2"; tag "basic"; transcript_support_level "NA";

Thanks in advance!


The "1" in the first column is the chromosome name (chromosome 1). exon_version is the version of the exon's stable identifier.

You can find details of the GTF format here: ftp://ftp.ensembl.org/pub/release-81/gtf/homo_sapiens/README

Reply by Yu, 2.1 years ago.
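To illustrate that reply, here is a minimal Python sketch. It reads one of the DDX11L1 records from the question, reports column 1 (the chromosome) and joins exon_id with exon_version into the versioned form Ensembl uses when writing stable IDs (ID.version). It assumes tab-separated input and uses the conventional names for the nine GTF columns. The helper names are hypothetical.

```python
import re

# Conventional names for the nine tab-separated GTF columns.
GTF_COLUMNS = ("seqname", "source", "feature", "start", "end",
               "score", "strand", "frame", "attribute")

# Matches key "value" pairs in the attribute column.
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_gtf_line(line):
    """Split one GTF record into named columns plus a dict of attributes."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    record = dict(zip(GTF_COLUMNS, fields))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    # Replace the raw attribute string with its parsed key/value pairs.
    record["attributes"] = dict(ATTR_RE.findall(record.pop("attribute")))
    return record


def versioned_exon_id(record):
    """Join exon_id and exon_version as 'ID.version' (plain ID if no version)."""
    attrs = record["attributes"]
    version = attrs.get("exon_version")
    return f'{attrs["exon_id"]}.{version}' if version else attrs["exon_id"]


# One of the DDX11L1 records from the question, with attributes shortened.
EXAMPLE = ('1\thavana\texon\t12179\t12227\t.\t+\t.\t'
           'gene_id "ENSG00000223972"; transcript_id "ENST00000450305"; '
           'exon_number "2"; exon_id "ENSE00001671638"; exon_version "2";')

if __name__ == "__main__":
    rec = parse_gtf_line(EXAMPLE)
    print("chromosome:", rec["seqname"])
    print("exon:", versioned_exon_id(rec))
```

For this record the chromosome is "1" and the versioned exon ID is ENSE00001671638.2. The unversioned exon_id stays stable across releases, while exon_version is incremented when that exon changes.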
Powered by Biostar version 2.3.0