I'm working on the ADNI Alzheimer's disease gene expression dataset and am trying to reduce the background noise in the data. I am following the procedure from a related paper, which suggests excluding probes with intensity values less than or equal to the median of all gene expression values in 100 or more samples. The dataset has more than 700 samples (patients) and more than 49,000 probes.
My first question: what exactly is the intensity value of a probe, and how should I calculate it? (Should I take the average of a probe's intensity level across all 700 samples?)
My second question: what does "all gene expression values" mean?
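To make the question concrete, here is a minimal sketch of one possible reading of the rule; it may not be what the paper intended. It assumes that the "intensity value" is simply a probe's value in each individual sample (no averaging), that the median is taken over every entry of the probe-by-sample matrix, and that the data is a NumPy array with probes as rows and samples as columns. The function name `filter_low_probes` and the parameter `min_samples` are hypothetical, and the random matrix only stands in for the real data.

```python
import numpy as np


def filter_low_probes(expr, min_samples=100):
    """Drop probes that are at or below the global median in >= min_samples samples.

    expr: 2-D array, shape (n_probes, n_samples). ASSUMPTION: probes are rows.
    Returns the filtered matrix and the boolean mask of probes that were kept.
    """
    expr = np.asarray(expr, dtype=float)

    # ASSUMPTION: "median of all gene expression values" is a single number
    # computed over the entire matrix, not one median per sample.
    global_median = np.median(expr)

    # For each probe, count the samples whose value is <= that median.
    low_counts = (expr <= global_median).sum(axis=1)

    # Keep a probe only if it is "low" in fewer than min_samples samples.
    keep = low_counts < min_samples
    return expr[keep], keep


# Toy stand-in for the real data: 1000 probes x 700 samples.
rng = np.random.default_rng(0)
toy = rng.normal(loc=8.0, scale=2.0, size=(1000, 700))
filtered, keep = filter_low_probes(toy, min_samples=100)
print(filtered.shape, int(keep.sum()))
```

Under this reading, no per-probe average is computed at all; the alternative reading (one median per sample instead of one global median) would replace `np.median(expr)` with `np.median(expr, axis=0)`. I am not sure which of the two the paper means.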
Thank you in advance for your precious time.