I use Trinity/Oases for de novo transcriptome assembly. In my pipeline, I remove exact duplicate reads (on both the forward and reverse strands) because I believe that duplicates add no information to the assembly, and removing them reduces the input size and so speeds up the downstream analysis. Is this assumption correct?
I am also unsure about this because, in the Oases paper, the authors say "assemblies with longer k values perform best on high expression genes, but poorly on low expression genes" (http://bioinformatics.oxfordjournals.org/content/early/2012/02/24/bioinformatics.bts094.short). If we remove duplicates and keep only the unique set of reads, don't we lose the expression information?
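To make the question concrete, here is a minimal sketch of the kind of duplicate removal I mean. It is not my actual pipeline code: the function names (`reverse_complement`, `canonical`, `collapse_duplicates`) and the toy reads are made up for illustration, and it assumes reads are plain sequence strings. It treats a read and its reverse complement as the same sequence, and it keeps a count per unique sequence so you can see the abundance information that is discarded when only the unique reads are passed on to the assembler.

```python
from collections import Counter

# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(seq):
    """Pick one representative for a read and its reverse complement."""
    seq = seq.upper()
    return min(seq, reverse_complement(seq))


def collapse_duplicates(reads):
    """Map each unique (strand-independent) sequence to its copy number."""
    counts = Counter()
    for read in reads:
        counts[canonical(read)] += 1
    return counts


# Hypothetical toy reads: "AACGT" is the reverse complement of "ACGTT".
reads = ["ACGTT", "AACGT", "ACGTT", "GGGCC", "TTTAA"]
collapsed = collapse_duplicates(reads)

unique_reads = list(collapsed)  # what the assembler would see after deduplication
for seq, n in collapsed.items():
    print(seq, n)  # n is the per-sequence abundance lost by keeping only unique reads
```

In this sketch, `unique_reads` is what I currently feed to the assembler, while the counts in `collapsed` are the part I am worried about throwing away.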