I'm working on a de novo transcriptome project, and the genome of our plant has not been sequenced. In general, Trinity is working pretty well, but we are seeing some spurious fusion genes (the N terminus of gene A fused to the C terminus of gene B). For the questions we are asking, having the correct full-length protein sequence is important.
I was wondering if we could get around this problem by first mapping short reads to homologous protein sequences from Arabidopsis and then feeding each of those subcollections into a contig builder? Is there any software that acts like blastx but works well on short Illumina reads?
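To make the idea concrete, here is a rough sketch of the binning step only. It assumes a translated search has already been run by some tool and has produced a 12-column tab-separated hit table in the BLAST tabular layout (read ID in column 1, protein ID in column 2, e-value in column 11, bit score in column 12). The function name, the e-value cutoff, and the example read and protein IDs are made up for illustration.

```python
from collections import defaultdict


def bin_reads_by_best_hit(hit_lines, max_evalue=1e-3):
    """Group read IDs by the reference protein they hit best.

    hit_lines: iterable of tab-separated lines in BLAST tabular layout.
    max_evalue: hypothetical cutoff; hits with a larger e-value are ignored.
    """
    best = {}  # read ID -> (bit score, protein ID) of the best hit seen so far
    for line in hit_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        read_id, protein_id = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue
        if read_id not in best or bitscore > best[read_id][0]:
            best[read_id] = (bitscore, protein_id)

    bins = defaultdict(list)
    for read_id, (_, protein_id) in best.items():
        bins[protein_id].append(read_id)
    # Sort read IDs so the result is deterministic
    return {protein: sorted(reads) for protein, reads in bins.items()}


# Made-up hits: read1 matches two proteins, read3 fails the e-value cutoff
example_hits = [
    "read1\tAT1G01010\t85.0\t30\t4\t0\t1\t90\t10\t39\t1e-10\t60.0",
    "read1\tAT1G01020\t70.0\t30\t9\t0\t1\t90\t50\t79\t1e-5\t45.0",
    "read2\tAT1G01020\t90.0\t30\t3\t0\t1\t90\t80\t109\t1e-12\t65.0",
    "read3\tAT1G01010\t40.0\t30\t18\t0\t1\t90\t5\t34\t5.0\t20.0",
]
bins = bin_reads_by_best_hit(example_hits)
```

Each bin would then be written out as its own read set and passed to the assembler separately. Reads with no acceptable protein hit end up in no bin at all, so they would need to be handled some other way.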
Thanks
Hi, it would be useful to know the read length and the divergence between your species and Arabidopsis first.
Hi, my reads are about 90 bp. Unfortunately, the divergence is pretty large. Our strategy may be to fully sequence one representative species from our clade and then use that as a reference for the other species. Thanks!