Question: Making a custom Transcriptome
15 days ago, rdmorris95 wrote:

Dear All,

I am trying to quantify the expression of splice variants of a gene I am interested in. I would typically do this using Kallisto (via the Galaxy server) with a reference transcriptome. There is one splice variant I am particularly interested in: it is well documented in the literature, having been reported since the early 1990s! However, the variant is very poorly annotated and is not identified as a splice variant on Ensembl. Is there a way I could modify an existing Ensembl transcriptome (or make a completely new transcriptome) so that it includes this variant?

Many Thanks



From what I remember when using kallisto last time, the reference transcriptome is a simple FASTA file with multiple sequences. If so, that would be a text file to which you can add your own sequence(s) using any editor.

written 15 days ago by Mensur Dlakic

Just add it as a sequence, with a distinctive string in the header that you will recognise later? (E.g. >mysequence.)
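
A minimal Python sketch of that suggestion, assuming the reference transcriptome has already been downloaded and decompressed to a plain-text FASTA file. The function name `append_fasta_record` and the example header are hypothetical, chosen only for illustration. It appends one record and refuses to add an ID that is already present, so the custom entry stays unambiguous in the quantification output.

```python
from pathlib import Path


def append_fasta_record(fasta_path, header, sequence, width=60):
    """Append one record to a FASTA file, wrapping the sequence at `width`."""
    path = Path(fasta_path)
    # Drop any whitespace/newlines pasted in with the sequence.
    seq = "".join(sequence.split()).upper()
    if not header.strip() or not seq:
        raise ValueError("header and sequence must both be non-empty")

    new_id = header.split()[0]  # the ID is the first word of the header
    ends_with_newline = True
    if path.exists():
        with path.open() as fh:
            for line in fh:
                ends_with_newline = line.endswith("\n")
                if line.startswith(">"):
                    fields = line[1:].split()
                    if fields and fields[0] == new_id:
                        raise ValueError(f"ID {new_id!r} is already in {path}")

    wrapped = [seq[i:i + width] for i in range(0, len(seq), width)]
    with path.open("a") as fh:
        # Make sure the new header starts on its own line.
        if not ends_with_newline:
            fh.write("\n")
        fh.write(f">{header}\n" + "\n".join(wrapped) + "\n")
```

A call such as `append_fasta_record("transcriptome.fa", "mysequence custom splice variant", my_cdna)` would then add the record at the end of the file; the file name and `my_cdna` are placeholders for your own data.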

written 15 days ago by Dunois

As others have pointed out, yes, it is possible to add that to your custom transcriptome (don't forget to update indexes and such once you're done).
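
For a local kallisto install, rebuilding the index after editing the FASTA could look like the sketch below. It wraps the `kallisto index -i <index> <fasta>` command; the helper names and file names are hypothetical. On Galaxy the tool may build the index itself when given a FASTA from the history, which is an assumption worth checking in the tool form.

```python
import shutil
import subprocess


def kallisto_index_command(fasta_path, index_path):
    """Build the argument list for `kallisto index -i <index> <fasta>`."""
    return ["kallisto", "index", "-i", str(index_path), str(fasta_path)]


def rebuild_index(fasta_path, index_path):
    """Run kallisto to (re)build the index from the edited transcriptome."""
    if shutil.which("kallisto") is None:
        raise RuntimeError("kallisto was not found on PATH")
    # check=True raises CalledProcessError if kallisto exits with an error.
    subprocess.run(kallisto_index_command(fasta_path, index_path), check=True)
```

An index built before the custom sequence was added will not contain it, so quantification has to point at the rebuilt index file.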

What species are we talking about here? If it is in Ensembl and the splice variant is, as you indicate, well known and described, it seems a bit weird to me that it is not included in the datasets that Ensembl offers.

modified 15 days ago • written 15 days ago by lieven.sterck

Hi Ryan,

As lieven.sterck suggests, Ensembl would be very interested to revisit this annotation. Are you working with human data, or another species? In any case, please send further information about the missing transcript and the evidence to the Ensembl Helpdesk as we'd love to improve the annotation.

Best wishes


written 15 days ago by Ben_Ensembl

Hi Ben,

Alongside this post I have also emailed the Ensembl Helpdesk about this subject. I am working with human data and I am planning on quantifying the levels of neuronal Src variants in brain tumour cell lines and patient data.

In terms of the variants, there are two neuronal variants of Src (N1 and N2), which are 542 aa and 553 aa long. These variants arise from the insertion of microexons between exons 3 and 4. Because of this insertion, N1-Src contains a six-amino-acid insert in the n-src loop of its Src homology 3 (SH3) domain, while in N2-Src, the N1 and N2 mini-exons insert a total of 17 amino acids.
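
Both lengths imply the same base protein (542 - 6 = 553 - 17 = 536 aa), so an N2 transcript could in principle be assembled by inserting the microexon sequences at the exon 3/exon 4 junction of the non-neuronal transcript. The sketch below shows only the mechanics: the sequences are toy placeholders, not real SRC sequence, the junction offset is made up, and the 18 nt and 33 nt microexon lengths are an assumption derived from 6 and 11 in-frame amino acids.

```python
def insert_at_junction(transcript, junction, inserts):
    """Return `transcript` with the `inserts` spliced in, in order, at the
    0-based offset `junction` (the first base of the downstream exon)."""
    if not 0 <= junction <= len(transcript):
        raise ValueError("junction lies outside the transcript")
    return transcript[:junction] + "".join(inserts) + transcript[junction:]


# Toy placeholders only; replace with the real cDNA and microexon sequences.
toy_transcript = "ATGGCC" + "GGATAA"  # "...exon 3" + "exon 4..."
toy_junction = 6                      # offset of the exon 3/exon 4 boundary
n1_microexon = "N" * 18               # 6 aa x 3 nt (assumed in-frame)
n2_microexon = "N" * 33               # 11 further aa x 3 nt (assumed in-frame)

n1_variant = insert_at_junction(toy_transcript, toy_junction, [n1_microexon])
n2_variant = insert_at_junction(
    toy_transcript, toy_junction, [n1_microexon, n2_microexon]
)
```

With real sequences, translating the result and checking that the protein comes out at 553 aa is a quick sanity test before adding the transcript to the reference.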

For the Src entry (;g=ENSG00000197122;r=20:37344685-37406050), the 542-amino-acid species (N1) is annotated (SRC-202), but N2-Src has no annotation.

Best Wishes


written 15 days ago by rdmorris95

Hi Ryan,

Thank you for your message to the helpdesk. I have forwarded your query to the GENCODE manual annotation team to investigate in more detail. They will respond to you directly.

Best wishes


written 14 days ago by Ben_Ensembl