I have RNA-seq data for 20 samples covering two conditions (control, treatment) and two sexes (male, female). I am very new to RNA-seq analysis and am trying to find the DEGs using DESeq2. Since I want the normalization to be calculated from all samples, I compute the rld on all samples together and then plan to use contrasts to find the DEGs for 1. treatment vs. control in males and 2. treatment vs. control in females.
For QC, the PCA plot separates the male and female samples, but control and treatment are not well separated on PC2, notably for male sample C1 (Figure 1).
Then I decided to plot a PCA for the male samples only. Again, PC1 does not separate the male control and treatment samples, but PC2 roughly separates them (Figure 2).
My questions are:
1. What numbers on the PCA plot (x and y axes) determine the separation? I read on a website that samples with PC1 > 0 are outliers. Is that true?
2. Can I just look at Figure 1, remove male C1, and continue the DEG analysis with 9 samples, or should I definitely plot Figure 2?
3. If I need to consider Figure 2, can I rely on PC2 alone, which separates control and treatment, and continue the DEG analysis, or should I remove samples C1, tr4, and tr5 and run the DEG analysis on the remaining 7 samples?
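
To make the first question concrete, here is a small sketch of how I currently understand the axis values. It is plain Python on made-up numbers (6 toy samples, 3 toy genes), not my data and not DESeq2 code, and the function names are my own for illustration. If this understanding is right, the coordinates are projections of mean-centered data, so 0 would simply be the average sample and the sign of a PC would be arbitrary. Please correct me if this is wrong.

```python
import math

def center_columns(rows):
    """Subtract each column (gene) mean so the sample cloud is centered at 0."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def first_pc(centered, iterations=500):
    """Estimate the leading principal axis by power iteration on X^T X."""
    p = len(centered[0])
    v = [1.0 / math.sqrt(p)] * p  # arbitrary unit-length starting vector
    for _ in range(iterations):
        # Project samples onto v, then map back into gene space (X^T X v).
        proj = [sum(x * vi for x, vi in zip(row, v)) for row in centered]
        w = [sum(s * row[j] for s, row in zip(proj, centered)) for j in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def pc_scores(centered, axis):
    """Coordinate of each sample along the given axis (the value on the plot)."""
    return [sum(x * a for x, a in zip(row, axis)) for row in centered]

# Toy expression matrix (hypothetical values): rows are samples, columns are genes.
# The first three rows mimic one group, the last three another.
toy = [
    [5.0, 2.0, 1.0],
    [5.5, 2.1, 0.9],
    [4.8, 1.9, 1.2],
    [8.0, 3.5, 1.0],
    [8.3, 3.6, 1.1],
    [7.9, 3.4, 0.8],
]

centered = center_columns(toy)
axis = first_pc(centered)
scores = pc_scores(centered, axis)

# Flipping the axis gives an equally valid PC1 with all signs reversed.
flipped_scores = [-s for s in scores]

def variance(values):
    return sum(v * v for v in values) / (len(values) - 1)

print("PC1 scores:        ", [round(s, 3) for s in scores])
print("flipped PC1 scores:", [round(s, 3) for s in flipped_scores])
print("sum of scores (should be ~0):", round(sum(scores), 9))
print("same variance after flip:",
      math.isclose(variance(scores), variance(flipped_scores)))
```

In this toy case the two groups land on opposite sides of 0 on PC1, and roughly half the samples have a positive score purely because the data were centered, which is why I doubt the "PC1 > 0 means outlier" rule.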