I'm working with a non-model organism, so there is no reference genome. We're trying to identify differentially expressed genes contributing to muscle function (force over length extended). We performed RNA-seq, assembled the transcriptome de novo with Trinity, and also constructed a supertranscript to use as a reference for alignments.
We identified some rRNA contamination and removed those sequences, but we are now seeing some rather disappointing alignment rates with HISAT2: our lowest samples are in the 60s (%), and the majority are in the high 70s.
We are using StringTie and then Cuffdiff for differential expression. I know, I know, it's deprecated and everyone says not to use it. When I brought this up to my advisor, he said, "We aren't looking for minute differences, we're looking for large structural differences so the program we use to find those shouldn't matter too much since almost every program worth its salt should be able to pick up on those differences."
What are your thoughts? Is he right, or should I push to use a salmon/kallisto/sleuth pipeline? Given these HISAT2 alignment rates, I'm feeling more apprehensive about the current pipeline.
Yes, we hope to publish.