Is it a good idea to filter out samples that are not part of a given comparison before normalising gene expression data from several groups? For example, I have A (control), B and C, where B and C are subtypes of a disease, and I am interested in B vs A, C vs A, B vs C and B+C vs A. Without filtering, this becomes a problem, especially for the B vs C comparison, which yields no differentially expressed genes. Thank you.
In general, it is better to keep all of the samples/groups together during normalization. Most current methods use some form of empirical Bayes (for example, variance moderation in limma or dispersion shrinkage in edgeR and DESeq2), and with all samples included you tend to get more accurate estimates of the background distributions. Of course, if the method you use does not do this, the answer might be different.
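To make the empirical Bayes argument concrete, here is a minimal Python sketch, assuming log-scale expression values and a one-way design with groups A, B and C. It fits all groups in one model, shrinks each gene's residual variance toward a common prior in the style of limma's moderated t-statistic, and evaluates all four comparisons from that single fit. The function names (`fit_one_way`, `moderate`, `contrast_t`), the simulated data and the hand-fixed prior (`s2_prior`, `df_prior`) are hypothetical, and B+C vs A is treated here as the average of B and C against A (pooling B and C into one group would be another reading). limma estimates the prior from all genes, while edgeR and DESeq2 work on counts with dispersion shrinkage, so this only illustrates the idea and is not any package's implementation.

```python
import numpy as np


def fit_one_way(expr, groups):
    """Fit a one-way (group means) model to every gene at once.

    expr: genes x samples array of log-scale expression values.
    groups: one label per sample (column of expr).
    Returns group means, per-gene residual variance, residual df and group sizes.
    """
    groups = np.asarray(groups)
    levels = sorted(set(groups.tolist()))
    means = {g: expr[:, groups == g].mean(axis=1) for g in levels}
    counts = {g: int((groups == g).sum()) for g in levels}
    # Fitted value for each sample is its group mean; residuals pool all groups.
    fitted = np.column_stack([means[g] for g in groups])
    df = expr.shape[1] - len(levels)
    s2 = ((expr - fitted) ** 2).sum(axis=1) / df
    return means, s2, df, counts


def moderate(s2, df, s2_prior, df_prior):
    """Posterior (moderated) variance: df-weighted average of prior and gene variance."""
    return (df_prior * s2_prior + df * s2) / (df_prior + df)


def contrast_t(means, counts, s2_post, weights):
    """Moderated t-statistic for a contrast given as {group: weight}."""
    estimate = sum(w * means[g] for g, w in weights.items())
    var_factor = sum(w ** 2 / counts[g] for g, w in weights.items())
    return estimate / np.sqrt(s2_post * var_factor)


# Simulated data (hypothetical): 1000 genes, 3 replicates per group.
rng = np.random.default_rng(0)
groups = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
expr = rng.normal(loc=8.0, scale=0.5, size=(1000, len(groups)))
expr[:50, 3:6] += 2.0  # first 50 genes are up in B only

means, s2, df, counts = fit_one_way(expr, groups)
# Hypothetical prior: limma would estimate s2_prior and df_prior from all genes.
s2_prior, df_prior = float(np.median(s2)), 4.0
s2_post = moderate(s2, df, s2_prior, df_prior)

# All four comparisons from the question share one variance estimate.
contrasts = {
    "B-A": {"B": 1.0, "A": -1.0},
    "C-A": {"C": 1.0, "A": -1.0},
    "B-C": {"B": 1.0, "C": -1.0},
    "(B+C)/2-A": {"B": 0.5, "C": 0.5, "A": -1.0},
}
t = {name: contrast_t(means, counts, s2_post, w) for name, w in contrasts.items()}

# Refitting on B and C alone leaves fewer residual degrees of freedom.
keep = [i for i, g in enumerate(groups) if g in ("B", "C")]
_, s2_bc, df_bc, _ = fit_one_way(expr[:, keep], [groups[i] for i in keep])
print(df, df_bc)  # 6 4
```

With nine samples in three groups, every comparison, including B vs C, uses a variance estimate with 9 - 3 = 6 residual degrees of freedom. Refitting on B and C alone leaves only 4, so the per-gene variances are noisier and lean more heavily on the prior.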