We are very interested in assembling a good reference transcriptome for a non-model organism and building a local database. We have sequenced the same individual with Illumina (150 million paired-end reads) and PacBio IsoSeq v3 (2 SMRT cells: one for shorter transcripts, below 5 kb, and the other for longer transcripts, above 5 kb).
To process the long reads, I followed the PacBio IsoSeq pipeline described in their GitHub repo (https://github.com/PacificBiosciences/IsoSeq). By the end of the pipeline, 70% of the long reads had been discarded. Is that normal?
Using this data, I assembled the transcriptome twice: once using only the short reads, and once combining long and short reads. In the end, I did not find any difference: approximately the same N50, the same number of assembled transcripts, and the same rate of misassemblies. Does anyone know whether PacBio data is actually worth it for de novo transcriptome assembly?
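
For reference, N50 is meant here in its usual sense: the length L such that transcripts of length L or longer contain at least half of all assembled bases. Below is a minimal Python sketch of that definition. The function name `n50` and the example lengths are made up for illustration; they are not the output of any particular assembler or QC tool, and not the numbers from my assemblies.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total number of bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")

    # Walk from the longest sequence down, accumulating bases.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical transcript lengths (illustrative only).
short_only = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(short_only))
```

Since N50 depends only on the length distribution, two assemblies can share nearly the same N50 and still differ in which isoforms they recover, so it may be worth comparing the transcript sets directly as well.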