Question: Statistical approach to identifying outliers in RNAseq data?
Hi All,

I am looking at a large RNAseq dataset (144 samples) and did some exploratory data analysis to see how the samples cluster. In the PCA plot, one sample clusters with a different group than expected (the blue dot sitting among the red dots in the figure below). Based on this figure, I'd probably flag it as an outlier. I have four replicates for this condition, so losing one is probably fine if it improves the overall results.
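The by-eye PCA check above can be made a bit more explicit. Below is a minimal sketch, not an established RNAseq method, assuming a genes x samples matrix of raw counts and one condition label per sample. It uses a plain log2(x + 1) transform as a stand-in for a proper variance-stabilizing transform, projects samples onto the top principal components, measures each sample's distance to the median position of its own group, and flags distances with a modified z-score above 3.5 (a common rule of thumb). The names `pca_scores` and `flag_outliers` and the default parameters are hypothetical.

```python
import numpy as np


def pca_scores(log_counts, n_components=2):
    """Return sample coordinates on the top principal components.

    log_counts: genes x samples array of log-scale expression values.
    """
    # Center each gene across samples so PCA captures between-sample variation.
    centered = log_counts - log_counts.mean(axis=1, keepdims=True)
    # SVD of the samples x genes matrix; U * S gives the PC scores per sample.
    u, s, _ = np.linalg.svd(centered.T, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def flag_outliers(counts, groups, n_components=2, threshold=3.5):
    """Hypothetical helper: flag samples far from their own group in PC space.

    counts: genes x samples raw counts; groups: one condition label per sample.
    Returns a boolean array, True where the modified z-score of a sample's
    distance to its group median exceeds `threshold` (3.5 is a rule of thumb).
    """
    # Assumption: a simple log2(x + 1) transform stands in for VST/rlog.
    log_counts = np.log2(np.asarray(counts, dtype=float) + 1.0)
    scores = pca_scores(log_counts, n_components)
    groups = np.asarray(groups)

    # Distance of each sample to the coordinate-wise median of its group.
    dist = np.empty(len(groups))
    for g in np.unique(groups):
        idx = groups == g
        center = np.median(scores[idx], axis=0)
        dist[idx] = np.linalg.norm(scores[idx] - center, axis=1)

    # Modified z-score of the distances, based on the median absolute deviation.
    med = np.median(dist)
    mad = np.median(np.abs(dist - med))
    if mad == 0:
        # All samples equally close to their groups: nothing stands out.
        return np.zeros(len(dist), dtype=bool)
    z = 0.6745 * (dist - med) / mad
    return z > threshold
```

Using the group median rather than the mean keeps a single stray replicate from pulling its group's center toward itself, which matters when each group has only four replicates. A flagged sample is a prompt to look closer, not an automatic reason to drop it.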

What is the appropriate statistical approach for detecting outliers more rigorously?

Thanks for any insight!

[Figure: PCA plot of the samples; the blue dot is the suspected outlier among the red samples]

