Hi all,
Deciding which genes count as expressed in an RNA-seq experiment is always somewhat arbitrary, but one method I've seen is to look at the distribution of average logCPM across all genes in the data set (described here: https://darwinawardwinner.github.io/resume/examples/Salomon/CD4/reports/RNA-seq/salmon_hg38.analysisSet_ensembl.85-exploration.html). When the average logCPM distribution is bimodal (the typical case), one chooses a threshold in the "trough" between the two peaks.
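To make the method concrete, here is a minimal Python sketch of the trough search as I understand it. It is not the code from the linked report. The function names (`avg_log_cpm`, `trough_threshold`) and the parameters (`prior_count`, `max_ratio`, `min_peak`) are my own placeholders, and the plain mean of log2-CPM values is a simplification that will not exactly match what edgeR-style tools report as average logCPM.

```python
import numpy as np

def avg_log_cpm(counts, prior_count=2.0):
    """Per-gene mean log2-CPM from a genes x samples count matrix.

    Simplified assumption: a constant prior count is added to every cell
    to avoid log(0); this is not an exact reimplementation of any package.
    """
    counts = np.asarray(counts, dtype=float)
    lib_sizes = counts.sum(axis=0)
    cpm = (counts + prior_count) / (lib_sizes + 2 * prior_count) * 1e6
    return np.log2(cpm).mean(axis=1)

def trough_threshold(values, grid_size=512, max_ratio=0.8, min_peak=0.1):
    """Return the location of the deepest valley between two modes,
    or None if the smoothed density looks unimodal.

    max_ratio and min_peak are arbitrary illustrative cutoffs.
    """
    values = np.asarray(values, dtype=float)
    # Silverman's rule-of-thumb bandwidth for a Gaussian kernel.
    sd = values.std(ddof=1)
    q75, q25 = np.percentile(values, [75, 25])
    bw = 0.9 * min(sd, (q75 - q25) / 1.34) * values.size ** (-0.2)
    if bw <= 0:
        raise ValueError("bandwidth is zero; values have no spread")

    # Fine histogram smoothed with a Gaussian kernel (a cheap binned KDE).
    hist, edges = np.histogram(values, bins=grid_size, density=True)
    centers = (edges[:-1] + edges[1:]) / 2
    step = centers[1] - centers[0]
    half = int(np.ceil(4 * bw / step))
    offsets = np.arange(-half, half + 1) * step
    kernel = np.exp(-0.5 * (offsets / bw) ** 2)
    kernel /= kernel.sum()
    dens = np.convolve(hist, kernel, mode="same")

    # For each grid point, the lower of the tallest peak on its left and
    # the tallest peak on its right; a valley sits well below both.
    left_max = np.maximum.accumulate(dens)
    right_max = np.maximum.accumulate(dens[::-1])[::-1]
    flank = np.minimum(left_max, right_max)

    # Ignore valleys flanked only by tiny bumps in the tails.
    eligible = flank >= min_peak * dens.max()
    ratio = np.ones_like(dens)
    ratio[eligible] = dens[eligible] / flank[eligible]

    i = int(np.argmin(ratio))
    return float(centers[i]) if ratio[i] < max_ratio else None
```

With a genes x samples matrix `counts`, `trough_threshold(avg_log_cpm(counts))` would give a candidate cutoff when two modes are present. It returns `None` when no clear valley is found.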
I've tried this method on my data and found that the distribution is unimodal (image at the end of the post, with an arbitrary cutoff line drawn at -1), so there is no trough to pick a threshold from. Does anyone have a suggestion for what to do in this case?
Thanks.