I have a question on which I have read conflicting answers. Most of them are several years old, so I am asking here in the hope of getting an up-to-date one.
When we work with non-model organisms and perform our assemblies, we end up with a .fasta file containing all our transcripts, which are a mix of true transcripts and artifacts/nonsense/bad transcripts. I know there are several options to prioritize "good >> bad", such as the TransRate "good transcripts" subset or the EvidentialGene ".okay" (or even ".okay + .alt") subset.
Let's suppose we have run one of these filters, checked the mapping rate, the BUSCO score and the TransRate score, and now have a high-confidence .fasta file of good transcripts. What is the correct order in which to proceed from this point?
Should we run a "counting" tool such as kallisto on the whole transcript file? Or should we first attempt some kind of annotation and then run kallisto only on the transcripts that got a hit?
I ask because I usually get an annotation rate of roughly 50% (being generous), so I don't know whether the other 50% are false transcripts or simply unknown transcripts.
For example, if my next step is to feed DESeq2 with these counts, I need to use tximport, and I need to give it the tx2gene file, which comes from my annotation step. But if I have run kallisto on the whole transcriptome, unannotated transcripts included, then a lot of information ends up "lost". Would it be better to remove the unannotated transcripts and "force" kallisto to find a place for those reads in my "annotated" assembly?
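To make the tx2gene concern concrete, here is a minimal sketch (in Python, only for illustration) of one alternative I have wondered about: keeping every transcript in the quantification and giving each unannotated transcript its own ID as a placeholder gene, so that it still has a row in the tx2gene table. The transcript IDs, the `annotation` dictionary and both function names are hypothetical, and I am assuming the usual two-column layout with the transcript ID first and the gene ID second. I do not know whether this is the recommended approach, which is part of what I am asking.

```python
import csv
import io


def build_tx2gene(transcript_ids, annotation):
    """Return one (transcript, gene) row per transcript.

    Annotated transcripts get their annotated gene ID; unannotated ones
    keep their own transcript ID as a placeholder gene (an assumption,
    not an established convention).
    """
    return [(tx, annotation.get(tx, tx)) for tx in transcript_ids]


def write_tx2gene(rows, handle):
    """Write the rows as a tab-separated table, transcript column first."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["TXNAME", "GENEID"])
    writer.writerows(rows)


# Hypothetical transcript IDs and annotation hits, for illustration only.
transcripts = ["tx_0001", "tx_0002", "tx_0003"]
annotation = {"tx_0001": "geneA", "tx_0002": "geneA"}

rows = build_tx2gene(transcripts, annotation)

buffer = io.StringIO()
write_tx2gene(rows, buffer)
print(buffer.getvalue())
```

With a table like this, the reads assigned to unannotated transcripts would stay in the count matrix under their placeholder IDs instead of being dropped or pushed onto the annotated transcripts.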
Thank you for your time.