Question: Obtain Ensembl transcript ids that contain retained introns
2.4 years ago by
mmitra30
Los Angeles, United States
mmitra30 wrote:

Hello everyone, I have coordinates for a set of retained introns and I would like to obtain the Ensembl transcript IDs that contain these retained introns. I would also like to extract the "type" of each transcript (for example, known protein coding or known nonsense-mediated decay).

I can do this manually by going to the UCSC browser, checking which Ensembl transcripts contain these retained introns, and then looking up the "type" of each transcript by its Ensembl ID on the Ensembl website. But I am wondering if there is a quicker way to do this, which would be very useful for a long list of retained introns.

I would appreciate any help. Thanks so much.

ADD COMMENTlink modified 2.4 years ago by Chun-Jie Liu260 • written 2.4 years ago by mmitra30

The retained intron category is itself the transcript type for transcripts that retain intronic regions. So you will not get other transcript types (such as protein coding or NMD) for transcripts classified as retained intron.

ADD REPLYlink written 2.3 years ago by Denise - Open Targets4.9k

Thanks so much for your comment. I have looked into several Ensembl-annotated transcripts that contain retained introns (they showed up in our splicing analysis), and most of them are classified as "known retained introns" by Ensembl. But a couple of them are instead grouped as "known protein coding", "known nonsense-mediated decay" or "known processed transcript".

Do you know if the transcripts that are classified as "known retained introns" are predicted to not undergo NMD and also not have the potential to be translated? Thanks for your help.

ADD REPLYlink written 2.3 years ago by mmitra30

Can you give me some examples please?

ADD REPLYlink modified 2.3 years ago • written 2.3 years ago by Denise - Open Targets4.9k

Sure. Here is one example from each subtype:

Example 1: Gene, ORMDL1; ENST00000458355; Chr2; coordinates of exon with retained intron (hg19), 190647147-190647849; subtype, protein coding

Example 2: Gene, SLC17A9; ENST00000488738; Chr20; coordinates of exon with retained intron (hg19), 61593975-61594721; subtype, known processed transcript

Example 3: Gene, PNISR; ENST00000478777; Chr6; coordinates of exon with retained intron (hg19), 99851704-99852578; subtype, known NMD

I took the Ensembl transcript that contains the exon with retained intron. It would be great to have your input. Thanks.

ADD REPLYlink modified 2.3 years ago • written 2.3 years ago by mmitra30

The genes will be protein coding (gene biotype), but each gene has several transcripts with their own biotypes, including non-coding transcript biotypes. Usually (if not always) the retained intron category is manually annotated by HAVANA based on their guidelines, and a few exceptions to the rule can cause the "discrepancies" that your splicing analysis has turned up.

For example 1, ENST00000458355 is not a retained intron transcript, although it may seem like one at first glance (check more examples like that on page 34 of the HAVANA guidelines). The entire retained intron seems to be open and in-frame with its flanking coding exons, therefore it was annotated as coding. Moreover, because the new exon is at the 5' end of the transcript, the annotators have added the flag "alternative 5' UTR", which can be seen on the VEGA browser. From the Ensembl browser, you can seamlessly jump to the VEGA counterpart. You may want to contact HAVANA, as it seems an additional remark (flag) is missing, i.e. "retained intron first" (page 39 of the guidelines).

I'd have thought that ENST00000488738 is a retained intron transcript, so you had better check this directly with the HAVANA team (try using the Gencode help email) so that they can explain why it has been annotated as a processed transcript.

Finally, example 3: the retained intron creates a premature stop codon that is more than 50 nt upstream of a downstream splice junction, which is why the transcript is annotated as NMD. Check page 40 of the HAVANA guidelines for manual annotation.
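
To make the 50 nt rule concrete, here is a small Python sketch of the check. It is only an illustration, not HAVANA's actual procedure: positions are 1-based transcript coordinates, the distance is measured from the last base of the stop codon, and the function names and exon lengths are made up for the example.

```python
from itertools import accumulate


def downstream_junction_distances(exon_lengths, stop_end):
    """Distances (nt) from the stop codon's last base to each downstream junction.

    exon_lengths: exon lengths in 5'->3' transcript order.
    stop_end: 1-based transcript coordinate of the last base of the stop codon.
    """
    # A junction sits after the last base of every exon except the final one.
    junctions = list(accumulate(exon_lengths))[:-1]
    return [j - stop_end for j in junctions if j >= stop_end]


def looks_like_nmd(exon_lengths, stop_end, min_distance=50):
    """True if the stop codon is more than min_distance nt upstream of a junction."""
    distances = downstream_junction_distances(exon_lengths, stop_end)
    return any(d > min_distance for d in distances)


# Hypothetical transcript with three exons of 200, 150 and 300 nt.
toy_exons = [200, 150, 300]
nmd_candidate = looks_like_nmd(toy_exons, stop_end=260)      # 90 nt before the last junction
not_nmd_close = looks_like_nmd(toy_exons, stop_end=320)      # only 30 nt before it
not_nmd_last_exon = looks_like_nmd(toy_exons, stop_end=500)  # stop in the final exon
```

A stop codon in the final exon has no downstream junction at all, so it never satisfies the rule in this sketch.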

ADD REPLYlink written 2.3 years ago by Denise - Open Targets4.9k

Thanks for all your suggestions and links. They are very helpful. I gave you three examples, but I have other transcripts in those three categories as well, so I am going to check them again. I am very interested in retained introns, so all the information you provided would help a lot.

ADD REPLYlink written 2.3 years ago by mmitra30
2.4 years ago by
Chun-Jie Liu260
US, Houston
Chun-Jie Liu260 wrote:

If you are familiar with R, I recommend the 'biomaRt' package: you can use the intron position to find the nearest transcript and get its Ensembl transcript ID.

Hope it helps!
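
For anyone who would rather script this outside R, here is a minimal Python sketch of the same kind of lookup, done against a local GTF annotation instead of biomaRt. It rests on a few assumptions: the annotation carries a transcript_biotype attribute (as in recent Ensembl GTFs) or transcript_type (as in GENCODE), an intron counts as retained when a single annotated exon spans it completely, and the records in TOY_GTF plus the names parse_exons and transcripts_retaining are invented for the example.

```python
import re

# Toy GTF records (hypothetical IDs and coordinates, for illustration only).
TOY_GTF = """\
2\thavana\texon\t1000\t1200\t.\t+\t.\tgene_id "G1"; transcript_id "T_CODING"; transcript_biotype "protein_coding";
2\thavana\texon\t1500\t2300\t.\t+\t.\tgene_id "G1"; transcript_id "T_RI"; transcript_biotype "retained_intron";
2\thavana\texon\t1500\t1700\t.\t+\t.\tgene_id "G1"; transcript_id "T_CODING"; transcript_biotype "protein_coding";
2\thavana\texon\t2100\t2300\t.\t+\t.\tgene_id "G1"; transcript_id "T_CODING"; transcript_biotype "protein_coding";
"""

# Matches key "value" pairs in the GTF attribute column.
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_exons(gtf_text):
    """Return (chrom, start, end, transcript_id, biotype) for every exon line."""
    exons = []
    for line in gtf_text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        # Assumed attribute names: Ensembl style first, then GENCODE style.
        biotype = attrs.get("transcript_biotype") or attrs.get("transcript_type", "unknown")
        exons.append((fields[0], int(fields[3]), int(fields[4]), attrs["transcript_id"], biotype))
    return exons


def transcripts_retaining(exons, chrom, start, end):
    """Transcripts with one exon fully spanning the intron (1-based, inclusive)."""
    hits = {}
    for ex_chrom, ex_start, ex_end, tx_id, biotype in exons:
        if ex_chrom == chrom and ex_start <= start and end <= ex_end:
            hits[tx_id] = biotype
    return sorted(hits.items())


exons = parse_exons(TOY_GTF)
# The intron between the 1500-1700 and 2100-2300 exons of T_CODING.
retained_hits = transcripts_retaining(exons, "2", 1701, 2099)
```

With a real annotation you would read the GTF file that matches your coordinate system (hg19/GRCh37 in this thread) and loop over the list of intron coordinates, collecting the transcript ID and biotype for each hit.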

ADD COMMENTlink written 2.4 years ago by Chun-Jie Liu260

Thanks so much for the suggestion. I will try that.

ADD REPLYlink written 2.4 years ago by mmitra30