DESeq2 already uses independent filtering to remove low-count genes from the results. Nevertheless, it is customary to pre-filter ultra-low-count genes (a common rule is keeping only genes whose row sum is at least 10 reads). This reduces the load on DESeq2, and it also gives fewer false positives, because genes with very low counts have a much higher error rate at a given p-value, especially when only one group has any reads. I'm sure you are all familiar with this.
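For reference, this is the standard pre-filter idiom from the DESeq2 vignette (a minimal sketch, assuming an existing DESeqDataSet called `dds`):

```r
library(DESeq2)

# keep genes with at least 10 reads summed across all samples
keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep, ]
```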
I wasn't sure whether to apply this to Salmon output, but I did, and it gives me better results. I am now filtering out genes with fewer than 6 reads on average across samples from the Salmon counts before putting them into DESeq2 for DGE.
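Roughly what I'm doing is the following (a sketch, assuming the Salmon quants are imported with tximport; `files`, `tx2gene`, `sample_info`, and `condition` are placeholders for my actual inputs):

```r
library(tximport)
library(DESeq2)

# files: named vector of paths to Salmon quant.sf files
# tx2gene: data.frame mapping transcript IDs to gene IDs
txi <- tximport(files, type = "salmon", tx2gene = tx2gene)

dds <- DESeqDataSetFromTximport(txi, colData = sample_info, design = ~ condition)

# drop genes averaging fewer than 6 reads across samples
keep <- rowMeans(counts(dds)) >= 6
dds <- dds[keep, ]
```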
But when I checked the probability density of the counts, both the STAR and the Salmon count tables show a massive amount of noise at low counts, all the way up to about 20 reads per gene on average.
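For context, this is the kind of plot I mean (a sketch, not my exact plotting code; `cts` stands in for the raw count matrix):

```r
# density of per-gene mean counts, zoomed into the low-count region
mean_counts <- rowMeans(cts)
plot(density(mean_counts[mean_counts <= 50]),
     main = "Per-gene mean counts (low end)",
     xlab = "mean reads per gene")
```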
What is causing the spikes in density at very low counts (below 20) for both Salmon and STAR? See the attached images.
Note that both count tables were generated from the same raw FASTQ data.
My personal guess is that it's a sampling-frequency problem: since counts are quantized, the relative "bit size" of a single read grows as you approach 0, so the density estimate starts to oscillate, something like hitting a Nyquist limit. I haven't tried to verify this, but I have found there is always a spike of probability at 0, 1, or 2 reads per gene.
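A toy check of that guess (simulated negative binomial counts with arbitrary parameters, nothing to do with my real data): a kernel density estimate over integer counts shows exactly this kind of spiking near zero, where the data can only take a few distinct values, while at higher counts the integers blur together.

```r
# simulate integer counts and look at the density estimate near zero
set.seed(1)
sim <- rnbinom(20000, mu = 3, size = 0.3)  # arbitrary NB parameters
plot(density(sim[sim <= 20]),
     main = "Density of simulated integer counts",
     xlab = "count")
# spikes appear at 0, 1, 2, ... purely from the discreteness of counts
```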