I've analysed bulk RNA-seq data and performed a standard differential expression analysis, from which I defined the DEGs (DEG_allgenes). I then split this list according to what the genes are predicted to encode (receptome, secretome and so on). This gave me the following lists: DEG_allgenes, DEG_secretome and DEG_receptome. DEG_secretome and DEG_receptome are, of course, subsets of DEG_allgenes.

I ran pathway enrichment analysis (in R, gprofiler2, with the g_SCS (analytical) p-value correction) on these 3 lists. Going by the p-values, not all of the significant pathways for DEG_secretome and DEG_receptome are also significant for DEG_allgenes.

I tried to understand why, and my explanation is the following. The p-values depend on the number of genes in the DEG list. With a longer list (DEG_allgenes), a larger overlap between the pathway genes and the DEGs is expected by chance alone, so the same overlap that was significant for a short list may no longer be significant, and some pathways disappear.
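To make my reasoning concrete, here is a minimal sketch in Python of a one-sided hypergeometric (over-representation) test. It is only an illustration of the idea: it is not what gprofiler2 does internally (there is no g_SCS correction here), and the background size, pathway size, list sizes and overlaps are all invented numbers, not my data.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k): chance of seeing at least k pathway genes in a query list.

    N = background genes, K = genes in the pathway,
    n = genes in the query (DEG) list, k = observed overlap.
    """
    total = comb(N, n)
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# Hypothetical numbers, chosen only for illustration.
N = 20000  # assumed background size
K = 100    # assumed pathway size

# Short list (think DEG_secretome): 200 genes, 10 of them in the pathway.
# Expected overlap by chance is 200 * 100 / 20000 = 1 gene.
p_small = hypergeom_upper_tail(k=10, N=N, K=K, n=200)

# Long list (think DEG_allgenes): 2000 genes, containing the same 10
# pathway genes plus 3 more. Expected overlap by chance is 10 genes.
p_large = hypergeom_upper_tail(k=13, N=N, K=K, n=2000)

print(f"short list: p = {p_small:.3g}")
print(f"long list:  p = {p_large:.3g}")
```

With these made-up numbers the short list gives a much smaller p-value than the long one, even though the long list contains every pathway gene found in the short list. That is the dilution effect I am describing.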
Does it make sense?