I am curious whether trimming reads in an RNA-Seq dataset affects the final gene expression results. If so, what is the reason behind it? Say, for example, low quality read trimming results in fewer genes being detected. How is this justified?
low quality read trimming results in fewer number of genes.
That is a specific kind of trimming. How are you deciding on the Q score trim threshold? While it is true that stretches of adapter sequence generally tend to get low base qualities, a read trimming program should handle them easily.
If you have data that is not of good quality (e.g. an overloaded flowcell, bad libraries), then trimming it using quality as the criterion can indeed remove genuine sample sequence and can affect the differential expression (DE) result.
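
To get a feel for how much the Q score threshold matters, here is a minimal sketch in Python. It assumes Phred+33 encoded qualities and uses a deliberately simple rule: cut bases from the 3' end while they are below the threshold, then drop reads that end up shorter than a minimum length. The function names, the toy reads and the `min_length` value are made up for illustration; real trimming programs use more elaborate algorithms, so treat this only as a way to see the effect, not as what any particular tool does.

```python
def phred33(qual):
    """Decode a Phred+33 quality string into a list of integer Q scores."""
    return [ord(c) - 33 for c in qual]


def trim_3prime(seq, qual, q_threshold):
    """Cut bases from the 3' end while their Q score is below q_threshold."""
    scores = phred33(qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < q_threshold:
        end -= 1
    return seq[:end], qual[:end]


def surviving_reads(reads, q_threshold, min_length):
    """Trim every read and keep only those still at least min_length long."""
    kept = []
    for seq, qual in reads:
        trimmed_seq, trimmed_qual = trim_3prime(seq, qual, q_threshold)
        if len(trimmed_seq) >= min_length:
            kept.append((trimmed_seq, trimmed_qual))
    return kept


# Toy reads (invented for this example): 'I' = Q40, '5' = Q20, '+' = Q10.
reads = [
    ("ACGTACGTAC", "IIIIIIIIII"),  # high quality throughout
    ("ACGTACGTAC", "IIIIII5555"),  # last four bases are Q20
    ("ACGTACGTAC", "II++++++++"),  # mostly Q10
]

for q in (5, 15, 30):
    n = len(surviving_reads(reads, q, min_length=8))
    print(f"Q threshold {q}: {n} of {len(reads)} reads survive")
```

With these toy reads, raising the threshold from 5 to 15 to 30 leaves 3, 2 and then 1 read. On poor-quality data the same thing happens at scale: an aggressive threshold throws away reads that would otherwise have mapped, so low-expressed genes can drop below detection.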