Question: Making a custom Transcriptome
15 days ago
rdmorris95

Dear All,

I am trying to quantify the expression of splice variants of a gene I am interested in. I would typically do this using Kallisto (via the Galaxy server), for which I use a reference transcriptome. There is a splice variant which I am particularly interested in. It is widely reported in the literature, having been recorded since the early 90s! However, the variant is very poorly annotated and is not identified as a splice variant on Ensembl. Is there a way I could modify an existing Ensembl transcriptome (or make a completely new transcriptome) so that it includes this variant?

Many Thanks


From what I remember when using kallisto last time, the reference transcriptome is a simple FASTA file with multiple sequences. If so, that would be a text file to which you can add your own sequence(s) using any editor.

Just add it as a sequence, with a distinctive string in the header that you will recognise in the results? (E.g. >mysequence)
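If you would rather script it than open a large file in an editor, here is a minimal Python sketch of the same idea. It assumes the reference is an uncompressed, plain-text FASTA (Ensembl downloads are usually gzipped, so decompress first). The function names and any file names or IDs you pass in are hypothetical, and you have to supply the actual nucleotide sequence of your variant yourself.

```python
def existing_ids(fasta_path):
    """Return the set of record IDs (first word of each '>' header line)."""
    ids = set()
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">") and line[1:].strip():
                ids.add(line[1:].split()[0])
    return ids


def append_transcript(reference_fasta, output_fasta, transcript_id, sequence, width=60):
    """Copy reference_fasta to output_fasta and append one extra record.

    transcript_id is the distinctive header string (without the '>').
    sequence is the transcript sequence; whitespace is removed and the
    sequence is re-wrapped to `width` characters per line.
    """
    cleaned = "".join(sequence.split()).upper()
    if not cleaned:
        raise ValueError("sequence is empty")
    # Refuse to add a second record with an ID that is already present.
    if transcript_id in existing_ids(reference_fasta):
        raise ValueError(f"ID already present in reference: {transcript_id}")

    last_line = "\n"
    with open(reference_fasta) as src, open(output_fasta, "w") as dst:
        for line in src:
            dst.write(line)
            last_line = line
        # Make sure the new header starts on its own line.
        if not last_line.endswith("\n"):
            dst.write("\n")
        dst.write(f">{transcript_id}\n")
        for start in range(0, len(cleaned), width):
            dst.write(cleaned[start:start + width] + "\n")
```

You would call it as something like append_transcript("reference.fa", "custom.fa", "my_variant", my_sequence), where all four arguments are placeholders. Writing to a new file rather than editing in place keeps the original Ensembl FASTA untouched.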

As others have pointed out, yes, it is possible to add it to your custom transcriptome (don't forget to rebuild the index and such once you're done).
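For the index step, a rough sketch assuming kallisto is installed locally and on your PATH (on Galaxy you would presumably just supply the edited FASTA as the reference instead). The function names and file names are hypothetical; the command itself is the standard `kallisto index -i <index> <fasta>` call.

```python
import shutil
import subprocess


def kallisto_index_command(fasta_path, index_path):
    """Build the argument list for `kallisto index -i <index> <fasta>`."""
    return ["kallisto", "index", "-i", str(index_path), str(fasta_path)]


def rebuild_index(fasta_path, index_path):
    """Rebuild the kallisto index from the (edited) transcriptome FASTA."""
    if shutil.which("kallisto") is None:
        raise RuntimeError("kallisto executable not found on PATH")
    command = kallisto_index_command(fasta_path, index_path)
    # check=True raises CalledProcessError if kallisto exits with an error.
    subprocess.run(command, check=True)
    return command
```

An index built from the old FASTA knows nothing about the added sequence, so quantification has to be rerun against the new index.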

What species are we talking about here? If it is in Ensembl and the splice variant is, as you indicate, well known and described, it seems a bit odd to me that it is not included in the datasets that Ensembl offers.

Hi Ryan,

As lieven.sterck suggests, Ensembl would be very interested to revisit this annotation. Are you working with human data, or another species? In any case, please send further information about the missing transcript and the evidence to the Ensembl Helpdesk as we'd love to improve the annotation.

Best wishes


Hi Ben,

Alongside this post I have also emailed the Ensembl Helpdesk about this subject. I am working with human data and I am planning on quantifying the levels of neuronal Src variants in brain tumour cell lines and patient data.

In terms of the variants, there are two neuronal variants of Src (N1 and N2), which are 542 aa and 553 aa long. These variants arise from the insertion of a microexon between exons 3 and 4. Because of this insert, N1-Src contains a six amino acid insert in the n-src loop of its Src homology 3 (SH3) domain, while in N2-Src, the N1 and N2 mini-exons insert a total of 17 amino acids.

For the Src entry (;g=ENSG00000197122;r=20:37344685-37406050), the 542 amino acid species (N1) is annotated (SRC-202), but N2-Src has no annotation.

Best Wishes


Hi Ryan,

Thank you for your message to the helpdesk. I have forwarded your query to the GENCODE manual annotation team to investigate in more detail. They will respond to you directly.

Best wishes


