Hi!

I performed a PCA on 6 chick samples (bulk RNA-seq, 3 treated vs 3 untreated) and found that 2 samples (one per condition) cluster far apart from the others, so I thought that they were outliers. I then removed them and re-did the PCA on the remaining 4 samples, and one of the controls does not cluster together with the other control. So what am I supposed to do at this stage? Is there any additional test that I can do to confirm or check for outliers?

(I am not sure how I can justify the fact that the untreated samples do not cluster together, even though the animals were kept under the same conditions and the tissues were processed in the same way by the same person, whereas the treated samples do cluster together.)

Thank you for any suggestions.

Camilla

Can you include a picture of your PCA plot?

`I performed a PCA...`

- May I ask what steps you took to get to that stage? You did not provide many details.

Apologies! Here is the code, and below it the image of the first PCA plot (not sure if you can see it):

Can you also include the results of `summary(PCA)`? This will give information on the proportion of variance explained by each PC.

PC1 explains 54% of the variance in your data and separates your conditions, which is a good sign. However, PC2 explains 34% of the variance and separates one sample of each type from the other two samples of the same type, which may cause problems. When these samples were being collected, was there anything different about those two samples compared to the other ones, such as being collected on a different day?

What I do not understand is this: if I remove those 2 samples and re-compute the PCA (same code as above, but removing the columns for those samples), I get this plot (not sure if you can see it), with one control in the top left and the other one in the bottom left (variance explained: PC1 87.78%, PC2 7.66%). How should I interpret that? Did I miss something?
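The re-computation described above (same code, minus the two sample columns) can be sketched roughly as follows. This is only a minimal sketch on simulated data: the original code is not shown in the thread, so it assumes the PCA was run with `prcomp()` on a genes-by-samples matrix of normalized, log-scale values. The matrix `expr`, the sample names and the choice of samples to drop are hypothetical placeholders.

```r
# Simulated stand-in for a normalized, log-scale expression matrix:
# rows are genes, columns are samples (all names are hypothetical).
set.seed(1)
sample_names <- c("ctrl1", "ctrl2", "ctrl3", "trt1", "trt2", "trt3")
expr <- matrix(rnorm(1000 * 6), nrow = 1000,
               dimnames = list(paste0("gene", 1:1000), sample_names))

# Hypothetical choice of the two samples to remove.
drop_samples <- c("ctrl3", "trt3")
expr_sub <- expr[, !(colnames(expr) %in% drop_samples)]

# prcomp() expects samples in rows, so the matrix is transposed.
pca_sub <- prcomp(t(expr_sub), center = TRUE, scale. = FALSE)

# Proportion of variance explained by each PC, as a percentage.
var_explained <- pca_sub$sdev^2 / sum(pca_sub$sdev^2)
print(round(100 * var_explained, 2))

# Sample coordinates on the first two PCs.
print(pca_sub$x[, 1:2])
```

Note that the PCA is re-centred on the four remaining samples, so the percentages are proportions of the variance among those samples only. They are not directly comparable to the 54% and 34% from the six-sample plot.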

It's not even clear yet whether you should be removing those two samples, but in the above plot the conditions are separated on the x-axis (PC1), which explains almost 88% of the variance in your data, more than ten times as much as the y-axis (PC2), on which the two controls are separated. This is roughly what you would expect, since you want the between-condition variance to represent most of the variance in the data.
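To make the comparison of PC1 and PC2 concrete, here is a hedged sketch of reading the proportion of variance from `summary()`. It assumes the `PCA` object came from `prcomp()`, which the thread does not confirm. The simulated matrix, the sample names and the `condition` vector are made up for illustration and are not the chick data.

```r
# Simulated log-scale expression for 4 samples (hypothetical names),
# with a treatment shift added to the first 200 genes.
set.seed(42)
n_genes <- 500
condition <- rep(c("control", "treated"), each = 2)
expr <- matrix(rnorm(n_genes * 4), nrow = n_genes,
               dimnames = list(NULL, c("ctrl1", "ctrl2", "trt1", "trt2")))
expr[1:200, condition == "treated"] <- expr[1:200, condition == "treated"] + 3

# Samples go in rows for prcomp().
pca <- prcomp(t(expr))

# summary() on a prcomp object reports, per PC, the standard deviation,
# the proportion of variance and the cumulative proportion.
imp <- summary(pca)$importance
print(imp)

# How many times more variance PC1 explains than PC2.
prop <- imp["Proportion of Variance", ]
ratio <- unname(prop["PC1"] / prop["PC2"])
print(ratio)

# Mean PC1 score per condition: clearly different means suggest that
# PC1 reflects the between-condition difference.
pc1_by_condition <- tapply(pca$x[, "PC1"], condition, mean)
print(pc1_by_condition)
```

A large ratio together with clearly separated PC1 means per condition is the pattern described above: most of the variance lies between conditions, and the spread on PC2 is small by comparison.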