I am running STARsolo on SMART-seq2 data. The STARsolo manual (https://github.com/alexdobin/STAR/blob/master/docs/STARsolo.md) recommends the Exact mode, which deduplicates reads, for plate-based sequencing. However, I am still unsure whether to deduplicate the reads, since many discussions say deduplication is unnecessary for RNA-seq data because it cannot distinguish PCR duplicates from biological duplicates.
I have compared the two modes and found that the resulting count matrices are quite different. Would you recommend running the downstream analysis on both the Exact and NoDedup outputs to see whether the results differ?
Try both and see if there's any difference in your downstream analysis and what's the correlation between the two approaches.
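One rough way to put a number on "how different" is a per-cell correlation of log-transformed counts, plus the fraction of counts that survive deduplication. The sketch below is only an illustration: it assumes you have already loaded the Exact and NoDedup matrices as genes x cells lists with matching gene and cell order. The function names and toy values are made up, not part of STARsolo.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (NaN if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)

def per_cell_correlation(exact, nodedup):
    """Correlate log1p counts between the two modes, one value per cell (column)."""
    n_cells = len(exact[0])
    correlations = []
    for j in range(n_cells):
        a = [math.log1p(row[j]) for row in exact]
        b = [math.log1p(row[j]) for row in nodedup]
        correlations.append(pearson(a, b))
    return correlations

def retained_fraction(exact, nodedup):
    """Per-cell fraction of NoDedup counts that remain after Exact deduplication."""
    n_cells = len(exact[0])
    fractions = []
    for j in range(n_cells):
        total_exact = sum(row[j] for row in exact)
        total_nodedup = sum(row[j] for row in nodedup)
        fractions.append(total_exact / total_nodedup if total_nodedup else float("nan"))
    return fractions

# Hypothetical toy matrices: 4 genes (rows) x 3 cells (columns).
exact_counts = [[5, 0, 2], [10, 3, 0], [0, 1, 7], [2, 2, 2]]
nodedup_counts = [[9, 0, 3], [25, 4, 0], [0, 1, 15], [3, 2, 4]]

for cell, (r, f) in enumerate(zip(per_cell_correlation(exact_counts, nodedup_counts),
                                  retained_fraction(exact_counts, nodedup_counts))):
    print(f"cell {cell}: log1p Pearson r = {r:.3f}, retained fraction = {f:.2f}")
```

Cells with a low correlation or a very low retained fraction are the ones worth inspecting first. It is also worth checking whether clustering and marker genes change between the two modes.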
It's impossible to do anything about the PCR duplicates from the pre-amplification step, but the location-based deduplication might help (or, alternatively, be overly aggressive) for the post-tagmentation PCR.
My personal view: I think people should always try different approaches if computationally feasible (and if they have the time), and report their findings in the supplementary materials of the papers they write (e.g. "we tried X, Y, and Z and observed a very slight difference (Supp. Data 1), and we ultimately decided to proceed with X for the remainder of our analysis"). It might not be relevant to the conclusions of the paper (especially if it's a biology paper and RNAseq is only one method you're using as evidence of your findings) but it would help the bioinformatics field.