3.2 years ago
vinishavvenugopal
Hi,
I'm a beginner in RNA-seq analysis. I have been trying to find a particular isoform of a gene. I used STAR for mapping and Cufflinks for assembly.
The blue ones in the picture are the isoforms present in the reference genome (hg38), but the gray one is the isoform that I'm interested in analyzing, and it's not present in the reference genome. How do I detect this new isoform?
I hope I can get some help from the experts. Thank you!
Best, Vinisha
By "detect", do you mean that you want to quantify reads associated with that transcript? If so, I would recommend using a program such as Salmon to quantify transcripts. It builds an index from a FASTA file of transcript sequences, so all you would need to do is append the sequence of your new transcript to a FASTA file containing the transcripts of all genes in hg38.
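The concatenation step can be sketched in a few lines of Python. This is only an illustration: the function names and the file names in the usage note are hypothetical, and it assumes both inputs are plain, uncompressed FASTA files. It also refuses to write the combined file if the new transcript's ID already exists in the reference, since duplicate names would make the quantification output ambiguous.

```python
def fasta_ids(path):
    """Return the record IDs (first word after '>') found in a FASTA file."""
    ids = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                ids.append(fields[0] if fields else "")
    return ids


def append_transcripts(reference_fasta, novel_fasta, combined_fasta):
    """Write the reference records followed by the novel records.

    Raises ValueError if a novel ID is already present in the reference.
    Returns the list of IDs in the combined file.
    """
    clash = set(fasta_ids(reference_fasta)) & set(fasta_ids(novel_fasta))
    if clash:
        raise ValueError(f"duplicate transcript IDs: {sorted(clash)}")
    with open(combined_fasta, "w") as out:
        for path in (reference_fasta, novel_fasta):
            with open(path) as handle:
                for line in handle:
                    # Guard against a missing newline at the end of a file,
                    # which would glue the next header onto a sequence line.
                    out.write(line if line.endswith("\n") else line + "\n")
    return fasta_ids(combined_fasta)
```

For example, `append_transcripts("hg38_transcripts.fa", "novel_isoform.fa", "combined.fa")` (hypothetical file names) would produce a file that could then be indexed with something like `salmon index -t combined.fa -i combined_index`.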
My first objective is to detect whether the particular isoform is present or not; I didn't want to quantify it.
But I tried to detect and quantify this isoform using Cufflinks. At first, I provided the sequences of transcript1 and the isoform of interest (exons only, with the intronic regions excluded) as a FASTA file (just two transcripts as a trial; I didn't include the whole hg38 FASTA file because I was only interested in one specific gene). Please find the attached result.
Output for the first transcript:
Output for the transcript/isoform of interest:
The second one didn't show any reads on the last exon. (Does that mean the isoform is not present?)
The first one, however, did show many reads for the same sequence (the last exon of the isoform of interest). Why didn't these show up in the second?
I'm sure it sounds a bit confusing. But I tried my best to make it sound clear. Thanks.
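For reference, the exon-only FASTA described above (exons joined, introns left out) could be built roughly as follows. This is a minimal sketch with hypothetical function names; it assumes the exon coordinates are 0-based, half-open offsets into a genomic sequence string that has already been loaded, and that transcripts on the minus strand should be reverse-complemented.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA string (A/C/G/T/N, either case)."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def splice_exons(genomic_seq, exons, strand="+"):
    """Join exon sequences in genomic order, skipping the introns.

    exons: list of (start, end) pairs, 0-based and half-open (an assumption
    of this sketch), relative to genomic_seq.
    """
    ordered = sorted(exons)
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start < prev_end:
            raise ValueError("exons overlap")
    spliced = "".join(genomic_seq[start:end] for start, end in ordered)
    return reverse_complement(spliced) if strand == "-" else spliced


def fasta_record(name, seq, width=60):
    """Format one FASTA record with sequence lines wrapped at `width`."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return f">{name}\n" + "\n".join(lines) + "\n"
```

Writing one `fasta_record(...)` per transcript to the same file gives the kind of two-transcript FASTA used in the trial.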
I am not sure if I can address the query. But from the images attached, it seems you are using an outdated workflow (the TopHat-Cufflinks workflow) within Galaxy. I would suggest using the HISAT-StringTie-Ballgown or HISAT-featureCounts/HTSeq-limma-voom workflow for detection of isoforms. There are several manuscripts comparing the efficiency of workflows. One such is here: https://pubmed.ncbi.nlm.nih.gov/29040385/. You can also use Salmon/kallisto workflows.
I have a follow-up question.
What can I infer from a splice junction depth of 1? If the goal is to find a novel transcript, can I take a splice junction with a depth of 1 into consideration?