I ran blastn with a Trinity-assembled transcriptome as the query against the annotated, assembly-level genome of the bacterium I work with.
Of the 11007 matches in the blastn results, 2474 are shorter than their respective CDSs in the annotated genome.
Would it be correct to say that these 2474 sequences come from defective mRNAs?
Specifically, some of these sequences are shorter than their respective CDSs in the annotated genome, and others are under 100 bases.
Is there a sound way to filter these sequences, so that the coding ones are kept and the rest are discarded?
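
To make the question concrete, here is a minimal sketch of the kind of filter I have in mind. It rests on assumptions that are not part of my actual run: that blastn is rerun against a FASTA of the annotated CDSs with `-outfmt "6 std slen"`, so that the last column is the CDS length, and that hits are kept when the aligned stretch is at least 100 bases and covers at least 90% of the CDS. The function name `filter_hits` and both thresholds are hypothetical placeholders, not recommended values.

```python
import csv
import io

# Assumed column layout: blastn -outfmt "6 std slen" against a CDS FASTA,
# so "slen" is the length of the matched CDS (an assumption, see above).
COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore", "slen",
]


def filter_hits(handle, min_len=100, min_cds_cov=0.9):
    """Keep hits whose aligned span on the CDS is long enough and covers
    most of the CDS. Both thresholds are arbitrary placeholders."""
    kept = []
    reader = csv.DictReader(handle, fieldnames=COLUMNS, delimiter="\t")
    for row in reader:
        sstart, send, slen = int(row["sstart"]), int(row["send"]), int(row["slen"])
        # sstart can exceed send for minus-strand hits, hence abs().
        covered = abs(send - sstart) + 1
        coverage = covered / slen
        if covered >= min_len and coverage >= min_cds_cov:
            kept.append((row["qseqid"], row["sseqid"], round(coverage, 3)))
    return kept


# Made-up rows for illustration only (not my real results).
example = "\n".join([
    "contig1\tcds1\t99.0\t300\t3\t0\t1\t300\t1\t300\t1e-100\t500\t300",
    "contig2\tcds2\t98.0\t80\t1\t0\t1\t80\t10\t89\t1e-20\t100\t900",
    "contig3\tcds3\t97.0\t450\t5\t0\t1\t450\t900\t451\t1e-150\t700\t900",
])
hits = filter_hits(io.StringIO(example))
print(hits)
```

With these made-up rows only the first hit passes; the second is under 100 bases and the third covers only half of its CDS. A filter like this only separates full-length from partial matches. It does not by itself show whether a short contig is a defective mRNA or just an incompletely assembled transcript, which is the part I am unsure about.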