How to remove outliers using PCA in R?
4.6 years ago

Hi,

I detected several outliers among my samples by plotting a PCA, but I don't know how to remove these samples. [Image: PCA plot] The outlier samples are marked by the red circle.

Thanks

PCA R • 8.8k views

You should explain how you generated your PCA plot (from which type of data?). Please post your code and a minimal reproducible example.


The data is a data frame of RNA-seq FPKM expression values; rows correspond to genes and columns to samples.

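library("FactoMineR")
library("factoextra")

# Transpose so that rows are samples and columns are genes
pca_data <- as.data.frame(t(RNAseq_data))
# Label the first 100 samples as GBM and the next 100 as rGBM
pca_data$group <- c(rep("GBM", 100), rep("rGBM", 100))
# Run the PCA on the expression columns only (drop the trailing group column)
pca <- PCA(pca_data[, 1:(ncol(pca_data) - 1)], graph = FALSE)
# Plot the samples on PC1/PC2, colored by group
fviz_pca_ind(pca,
             geom.ind = "point",
             col.ind = pca_data$group,
             palette = c("#00AFBB", "#E7B800"),
             addEllipses = TRUE,
             legend.title = "Groups"
)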

My first question with such a plot is, what are these outlier samples? Is there a biological or technical explanation for this?


I downloaded this RNA-seq data and am just exploring it. Given the large number of samples, I think removing these 'outlier' samples is not risky.


Are all samples from the same dataset? Do you have metadata on these samples (sequencing kit, sample type, cell type, sequencing platform, etc.)? IMO you are seeing a clear (non-biological) batch effect here.


Yes, all tumor samples are from the same dataset. The clinical data doesn't contain batch information, so I want to remove these samples directly.


I guess the pca object should contain the PC1 and PC2 coordinates (the information used to draw the plot). Use these to filter out the samples, e.g. PC1 < -100.
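For what it's worth, here is a minimal sketch of that filtering with FactoMineR, where the sample coordinates are stored in `pca$ind$coord` (column `Dim.1` is PC1). The simulated `RNAseq_data`, the orientation step, and the cutoff of -15 are assumptions made only so the example runs on its own; with the real data you would read the cutoff off your own plot (e.g. -100).

```r
library(FactoMineR)

# Simulated stand-in for the real data (hypothetical): 200 genes x 20 samples,
# with the last two samples shifted upward so they behave as outliers.
set.seed(1)
RNAseq_data <- as.data.frame(
  matrix(rnorm(200 * 20), nrow = 200,
         dimnames = list(paste0("gene", 1:200), paste0("sample", 1:20)))
)
RNAseq_data[, 19:20] <- RNAseq_data[, 19:20] + 5

# Same layout as in the question: rows are samples, columns are genes
pca_data <- as.data.frame(t(RNAseq_data))
pca <- PCA(pca_data, graph = FALSE)

# PC1 coordinates of the samples, named by sample
pc1 <- pca$ind$coord[, "Dim.1"]

# The sign of a principal component is arbitrary. For this demo only, flip PC1
# so that the most extreme sample lies on the negative side, as in the plot.
if (abs(max(pc1)) > abs(min(pc1))) pc1 <- -pc1

# Assumed cutoff for the simulated data; chosen by eye from the plot in practice
cutoff <- -15
outliers <- names(pc1)[pc1 < cutoff]

# Drop the outlier samples (columns) from the original expression table
RNAseq_clean <- RNAseq_data[, !(colnames(RNAseq_data) %in% outliers)]
```

After filtering, it is usually worth re-running the PCA on `RNAseq_clean`, since the components are recomputed once the extreme samples are gone.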


Thanks. I saved the plot as a large PDF file and then zoomed in to identify the outlier samples. It sounds silly, but it really works :-)


Hi, I am facing the same issue. Using your suggested method, I find that the samples with PC1 < -100 are indeed the outliers. Could you please explain the basis for choosing -100 as the threshold? That would be very helpful. Thank you.
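The thread does not state a formal basis for -100; it looks like a value read off the PC1 axis of that particular plot. If a data-driven cutoff is preferred, one common option (not something proposed in this thread) is to flag samples lying more than k scaled median absolute deviations (MAD) from the median PC1 score. A rough sketch in base R, where the PC1 scores and k = 3 are hypothetical:

```r
# Hypothetical PC1 scores for eight samples; with FactoMineR these would
# come from pca$ind$coord[, "Dim.1"].
pc1 <- c(s1 = 12, s2 = -8, s3 = 5, s4 = -3, s5 = 9, s6 = -11,
         s7 = -150, s8 = -140)

k <- 3                  # assumed multiplier; larger values flag fewer samples
center <- median(pc1)   # robust center of the PC1 scores
spread <- mad(pc1)      # scaled MAD, a robust analogue of the standard deviation

lower <- center - k * spread
upper <- center + k * spread

# Samples falling outside the robust interval are flagged as outliers
outliers <- names(pc1)[pc1 < lower | pc1 > upper]
```

The median and MAD are used instead of the mean and standard deviation because the outliers themselves would inflate the latter. Any such rule is still a judgment call, so the flagged samples should be checked against the plot.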


Use fviz_pca_ind(pca_all, geom.ind = "text") to show sample names on the plot.
