Hi! I'm trying to do a differential expression analysis using breast cancer TCGA data.

First, I split the breast cancer cohort into two groups based on the expression z-score of a particular gene I'm interested in: the samples with z-score < -2 and all the others, which gives two groups (n = 35 and n = 1000). Then I used the RSEM data to run a differential expression analysis with DESeq2. However, when I do a PCA to compare the two groups using the 500 most differentially expressed genes, the two groups don't separate clearly.
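
To make the grouping step concrete, here is a minimal Python sketch of the kind of split I mean. The function name `split_by_zscore`, the toy expression values, and the use of the sample standard deviation are assumptions for illustration only; they are not my actual pipeline or data.

```python
from statistics import mean, stdev


def split_by_zscore(expression, threshold=-2.0):
    """Split sample IDs into (low, rest) using the z-score of one gene.

    `expression` maps sample ID -> expression value of the gene of interest
    (hypothetical input; assumed to be on a scale where z-scores make sense).
    """
    values = list(expression.values())
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation (assumption)

    low, rest = [], []
    for sample_id, value in expression.items():
        z = (value - mu) / sigma
        if z < threshold:
            low.append(sample_id)
        else:
            rest.append(sample_id)
    return low, rest


# Toy data: ten samples with similar expression and one clear low outlier.
toy_expression = {
    "S01": 10.0, "S02": 11.0, "S03": 9.0, "S04": 10.0, "S05": 11.0,
    "S06": 9.0, "S07": 10.0, "S08": 11.0, "S09": 9.0, "S10": 10.0,
    "S11": 0.0,
}

low, rest = split_by_zscore(toy_expression)
print("low group:", low)
print("size of the other group:", len(rest))
```

With this kind of split the "all the others" group includes everything from moderately low to very high expression, which is part of why I'm asking whether it is the right comparison.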

Then I ran gprofiler2, and one of the significant results was labeled "TF". Its term name is "Factor: LUMAN; motif: CYCAGCYYCY; match class: 1".

So, my questions are:

- Would it be better to compare a group with z-score < -2 against another group with z-score > 2?
- What does the TF-related result from gprofiler2 mean? I have tried to look for information, but I don't think I understand it properly. My understanding is that it says genes regulated by this transcription factor are enriched, but I would like some confirmation.

Thanks in advance!

