We have paired-end Illumina RNA-Seq reads from a non-model organism with no reference genome. We have a working composite sequence for one protein that includes every exon we have found via cDNA. We have six muscle types, some in triplicate, and want to see how often four specific exons that appear to be alternatively spliced are present in each muscle type.
For example, muscle type A might include a given exon in 46% of its transcripts, while muscle type B includes it in only 12%.
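To be concrete about the kind of number I am after, here is a minimal sketch in Python. It assumes the percentage is defined as reads supporting inclusion of the exon divided by reads supporting inclusion plus reads supporting skipping; that definition, the function name, and the read counts are my own placeholders, not real data from our samples.

```python
def percent_inclusion(inclusion_reads, exclusion_reads):
    """Percentage of informative reads that support including the exon.

    Assumed definition: 100 * inclusion / (inclusion + exclusion).
    Returns None when there are no informative reads at all.
    """
    total = inclusion_reads + exclusion_reads
    if total == 0:
        return None
    return 100.0 * inclusion_reads / total


# Made-up counts, chosen only to reproduce the percentages in the example above.
hypothetical_counts = {
    "muscle_A": (46, 54),  # (reads including the exon, reads skipping it)
    "muscle_B": (12, 88),
}

for muscle, (inc, exc) in hypothetical_counts.items():
    print(muscle, percent_inclusion(inc, exc))
```

With triplicates, I assume this would be computed per replicate and then summarized per muscle type.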
I'm not looking for differential expression, only a count of how many times each exon is found within each muscle type's transcript file.
I've tried feeding HISAT2 BAM files into StringTie, and also taking the GTF files from StringTie and putting them into htseq-count, but neither worked.
I was already able to align the raw reads to the composite and visualize the alignment in IGV. However, there are thousands of raw reads aligning to the 4 exons of interest, so I was hoping there would be a better way of quantifying the frequency than manually counting.
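What I picture instead of counting by eye is something like the sketch below, which tallies aligned reads whose matched blocks overlap each exon. It is only an illustration under assumptions: the exon names and coordinates on the composite are hypothetical, the input is plain SAM text (for example from `samtools view`), and each mate of a pair is counted separately. Reads that splice over an exon (an `N` gap in the CIGAR string) are not counted for that exon.

```python
import re

# Hypothetical exon coordinates on the composite (1-based, inclusive).
EXONS = {"exon_A": (101, 150), "exon_B": (301, 360)}

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_blocks(pos, cigar):
    """Return the reference intervals (1-based, inclusive) a read actually matches."""
    blocks = []
    ref = pos
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op in "M=X":      # aligned bases: record a block and advance
            blocks.append((ref, ref + length - 1))
            ref += length
        elif op in "DN":     # deletion or splice gap: advance without a block
            ref += length
        # I, S, H, P do not consume reference positions
    return blocks


def count_overlaps(sam_lines, exons):
    """Count mapped reads with at least one aligned block overlapping each exon."""
    counts = {name: 0 for name in exons}
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        flag, pos, cigar = int(fields[1]), int(fields[3]), fields[5]
        if flag & 0x4 or cigar == "*":
            continue  # unmapped read
        blocks = aligned_blocks(pos, cigar)
        for name, (ex_start, ex_end) in exons.items():
            if any(s <= ex_end and e >= ex_start for s, e in blocks):
                counts[name] += 1
    return counts
```

Run once per sample, this would give a per-exon read count for each muscle type, though I do not know whether raw overlap counts like this are the right thing to compare across samples.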
Do I have to annotate the composite so that it is easier to select what I am looking for, and if so, how do I do that?