Hi (I'm not a native English speaker, so please be ready for some possible language flaws),
It seems that in most cases a considerable percentage of the transcripts obtained from de novo transcriptome assembly have no BLAST hit (against the NCBI nr database, for example).
Maybe some of them result from sequencing or assembly errors, but most of them must be novel genes or perhaps valuable, newly discovered protein-coding mRNAs (I guess)!
What is the best strategy to find out what these hit-less transcripts are?
I have heard that searching for an ORF or CDS (e.g. using the TransDecoder program or the ExPASy website) is one way, but I am looking for new or better approaches for such huge data sets! How should these transcripts be classified, and which of their characteristics are most important from a biological point of view?
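To make clear what I mean by "searching for an ORF", here is a minimal Python sketch of my understanding. It is not how TransDecoder or ExPASy actually work internally; it only scans the six reading frames of one transcript for the longest ATG-to-stop stretch. The function names (`reverse_complement`, `longest_orf`) are my own, and I am assuming the standard genetic code stop codons.

```python
# Minimal sketch (my own assumption, not TransDecoder's algorithm):
# find the longest ATG..stop ORF over the six reading frames of a transcript.
STOPS = {"TAA", "TAG", "TGA"}              # standard genetic code assumed
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf(seq):
    """Return (strand, start, end, orf) for the longest complete ORF.

    Coordinates are 0-based, end-exclusive, and refer to the strand that
    was scanned. If no complete ORF exists, orf is the empty string.
    """
    seq = seq.upper()
    best = ("+", 0, 0, "")
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i                      # first start codon in this frame
                elif start is not None and codon in STOPS:
                    if i + 3 - start > len(best[3]):
                        best = (strand, start, i + 3, s[start:i + 3])
                    start = None                   # look for the next ORF
    return best


if __name__ == "__main__":
    print(longest_orf("CCATGAAATTTGGGTAACC"))
```

A transcript with no hit and only a very short (or no) ORF might then be a candidate non-coding RNA or an assembly artifact, while one with a long ORF might be a candidate novel coding gene, but that is exactly the kind of classification I am asking about.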
Thank you for sharing your valuable experience ;)