Question: Novel protein isoform - how to investigate species and tissue expression using databases?
Dear Community, I am a cell biologist and not very familiar with bioinformatics. I have cloned a novel isoform/transcript variant of a protein, and my advisor (who is not familiar with bioinformatics either) asked me to investigate the species and tissue expression pattern of this isoform using database searches. I cloned this isoform from mouse kidney and obtained the sequence, but I have no idea how to proceed or how to generate "typical" graphs to show such information.

Regarding the existence/evolution of this isoform in other species: my initial thought was to download all the transcript variants of this gene in different species (human, mouse, Drosophila, C. elegans) from NCBI's RefSeq DB and to align all those sequences to my sequence. I don't know if this is a good way. It may produce some useful results, but what I would really like to do is present the evolutionary development of this sequence across the different species in some kind of phylogenetic tree. Maybe someone could give me some hints or recommend a paper? Or do you think that my idea is somehow "bad" or "unprofessional"?

Regarding the species-specific tissue expression: how would I do such a thing? Should I search UniGene and GEO and manually compile Excel data like this: "isoform X (BLAST similarity to my isoform: 50%) is predominantly expressed in Drosophila melanogaster tissue X"? To me, this sounds very "nooby" or "unprofessional". Maybe someone could teach me how to handle such things in a more professional way?

I am really thankful for any hints and help! This is really important to me and I urgently need professional help. Thanks a lot!

If it is novel, then there will not be any explicit information on it in the public datasets. You already have the experimental data that says that it exists, which is great. I would aim to obtain public raw data (FASTQ) for your tissue and/or condition of interest and then see if you can find evidence of your novel transcript in such data, for example by aligning the reads with HISAT2 and assembling transcripts with StringTie, which can detect novel transcripts.
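
Before setting up a full alignment and assembly run, a quick sanity check can be to count raw reads that contain a short sequence unique to your isoform, such as the novel exon-exon junction. The following is only a minimal sketch of that idea in plain Python, not a substitute for HISAT2 / StringTie. The junction sequence and the in-memory FASTQ are made-up placeholders, the function names are hypothetical, and only exact matches are counted (no mismatches or sequencing errors are tolerated).

```python
import io

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]

def read_fastq_sequences(handle):
    """Yield the sequence line (2nd line) of every 4-line FASTQ record."""
    for i, line in enumerate(handle):
        if i % 4 == 1:
            yield line.strip().upper()

def count_junction_reads(handle, junction):
    """Count reads containing the junction on either strand (exact match only)."""
    junction = junction.upper()
    junction_rc = reverse_complement(junction)
    hits = total = 0
    for seq in read_fastq_sequences(handle):
        total += 1
        if junction in seq or junction_rc in seq:
            hits += 1
    return hits, total

# Placeholder data: a hypothetical 20-nt junction and three toy reads.
demo_junction = "ACGTTGCAAGGCTTAGCCTA"
demo_reads = [
    "GG" + demo_junction + "TT",                      # forward-strand hit
    "CC" + reverse_complement(demo_junction) + "AA",  # reverse-strand hit
    "ACGTACGTACGTACGTACGTACGT",                       # unrelated read
]
demo_fastq = "".join(
    f"@read{i}\n{s}\n+\n{'I' * len(s)}\n" for i, s in enumerate(demo_reads)
)

hits, total = count_junction_reads(io.StringIO(demo_fastq), demo_junction)
print(f"{hits} of {total} reads span the junction")  # 2 of 3 reads span the junction
```

For a real (usually gzipped) FASTQ file, the handle could be opened with `gzip.open(path, "rt")` from the standard library. Reads that hit the junction only suggest the isoform is present in that sample; they say little about how strongly it is expressed.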

I cannot comment on the validity of your own approach / idea; however, I hope that somebody else can.

I fully agree with Kevin: if it truly is novel, it will not be in any of the databases and you will have to reanalyze FASTQ files from scratch. But to determine whether it truly is novel, I would:

  1. Start by looking for it in humans, since the human transcriptome is the best annotated one we have. Look for high sequence similarity in GENCODE and CHESS. (Please note that 50% sounds very low for an isoform; typically, differences between isoforms are much smaller than that, so at that threshold you will get MANY hits.)
  2. I think your BLAST approach is a good way to check whether it exists in some database. As an extension, if it is a protein-coding isoform, you could try blastx, which translates your isoform into protein sequence and searches for similar proteins. Please note that BLAST will not tell you anything about expression, just whether something similar has been annotated in other databases.
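
To make the blastx idea a bit more concrete: blastx translates the nucleotide query in all six reading frames and compares each translation against a protein database. The sketch below only illustrates that translation step in plain Python using the standard genetic code; it does not perform any database search. The input sequence is a toy placeholder and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]

def translate(seq):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )

def six_frame_translations(seq):
    """Return the translations of frames +1..+3 and -1..-3 as a dict."""
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq.upper()))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames

# Toy placeholder sequence, not a real transcript.
frames = six_frame_translations("ATGGCCAAGTAA")
for name, protein in frames.items():
    print(name, protein)
```

Here frame +1 gives `MAK*` (the `*` marks a stop codon). For a real isoform, the frame containing a long open reading frame is the one worth comparing against known protein isoforms.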

Once you know whether it is annotated elsewhere, you can choose (and get guidance on) how to proceed.

