I'm working on a de novo transcriptome project for a plant whose genome has not been sequenced. In general, Trinity is working pretty well, but we are seeing some spurious fusion genes (the N-terminus of gene A fused to the C-terminus of gene B). For the questions we are asking, having the correct full-length protein sequence is important.
Could we get around this problem by first mapping the short reads to homologous protein sequences from Arabidopsis and then feeding each resulting subcollection of reads into a contig builder? Is there any software that acts like blastx but works well on short Illumina reads?
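To make the idea concrete, here is a rough sketch of the binning step I have in mind. It assumes the translated search has already been run and produced 12-column tab-delimited hits in the BLAST tabular layout (read ID in column 1, protein ID in column 2, bit score in column 12). The function name and the read/protein IDs are made up for illustration, and assigning each read to its single best-scoring protein is just one possible rule.

```python
import csv
from collections import defaultdict


def bin_reads_by_best_hit(tabular_lines):
    """Group read IDs by the protein they hit with the highest bit score.

    tabular_lines: iterable of tab-delimited strings in the 12-column
    BLAST tabular layout (assumed input format).
    Returns a dict mapping protein ID -> list of read IDs.
    """
    best = {}  # read_id -> (bitscore, protein_id)
    for row in csv.reader(tabular_lines, delimiter="\t"):
        # Skip blank lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        read_id, protein_id, bitscore = row[0], row[1], float(row[11])
        # Keep only the highest-scoring hit seen so far for each read.
        if read_id not in best or bitscore > best[read_id][0]:
            best[read_id] = (bitscore, protein_id)

    bins = defaultdict(list)
    for read_id, (_, protein_id) in best.items():
        bins[protein_id].append(read_id)
    return dict(bins)


# Toy hits with hypothetical read and protein IDs.
example_hits = [
    "read1\tprotA\t90.0\t33\t3\t0\t1\t99\t10\t42\t1e-5\t50.0",
    "read1\tprotB\t95.0\t33\t2\t0\t1\t99\t20\t52\t1e-9\t80.0",
    "read2\tprotA\t97.0\t33\t1\t0\t1\t99\t40\t72\t1e-10\t85.0",
]
bins = bin_reads_by_best_hit(example_hits)
```

Each list of read IDs would then be used to pull the matching reads out of the FASTQ files and hand them to the assembler as a separate job.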