I have an RNA-seq dataset normalized as RPKM. The dataset has one gene per row and four experimental conditions. I need to detect the outlier values in this dataset.

I used the Weka InterquartileRange filter: a filter for detecting outliers and extreme values based on interquartile ranges. The filter skips the class attribute.

Outliers: Q3 + OF * IQR < x <= Q3 + EVF * IQR or Q1 - EVF * IQR <= x < Q1 - OF * IQR

Extreme values: x > Q3 + EVF * IQR or x < Q1 - EVF * IQR

(Here IQR = Q3 - Q1, OF is the outlier factor, and EVF is the extreme-value factor.)
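For reference, the rule above can be sketched in R roughly as follows. This is not Weka's implementation: the function name `flag_iqr`, the toy values, and the defaults `of = 3` and `evf = 6` are assumptions for illustration, and R's `quantile()` may compute quartiles slightly differently from Weka.

```r
# Flag outliers and extreme values in one numeric vector using IQR fences.
# `of` and `evf` are the outlier and extreme-value factors (assumed defaults).
flag_iqr <- function(x, of = 3, evf = 6) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  # Extreme values lie beyond the outer (EVF) fences.
  extreme <- x > q[2] + evf * iqr | x < q[1] - evf * iqr
  # Outliers lie beyond the inner (OF) fences but not beyond the outer ones.
  outlier <- !extreme & (x > q[2] + of * iqr | x < q[1] - of * iqr)
  data.frame(value = x, outlier = outlier, extreme = extreme)
}

# Toy RPKM values for a single condition (made up for illustration).
rpkm <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30, 100)
flags <- flag_iqr(rpkm)
print(flags[flags$outlier | flags$extreme, ])
```

To apply it to each condition separately, something like `lapply(as.data.frame(expr), flag_iqr)` should work, where `expr` is a hypothetical genes-by-conditions table.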

My questions are:

- Are there other methods for outlier detection in this type of data?

- Can I continue to use this method for my data?

- Any suggestions?

Is there a reason why you are using the outlier approach rather than doing standard differential gene expression?

I am going to cluster the dataset. The graphical analysis shows some very high values, and those values affect clustering algorithms like k-means.

You usually want to see exactly that...

With the original data, I get some clusters with low significance. I want to detect and remove the outliers from my dataset to improve the results of the clustering algorithms.

That's not improving the results, it's fudging them.

Why? I think it matters for the significance of my clusters.

If you start removing points willy-nilly, you can get whatever significance you want.

One suggestion: instead of removing outliers, you could try a distance metric that is robust to outlier values. What comes to mind is

`dist = 1 - cor(x, y, method = "spearman")`

but I must say that I have never tested such a metric, and I'm not 100% sure it is a good idea.
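A minimal sketch of how this could be tried in R, under some assumptions: the matrix `expr` and its values are made up, and hierarchical clustering (`hclust`) is swapped in for k-means because R's `kmeans()` works on Euclidean distances and does not accept a custom distance matrix.

```r
# Toy expression matrix: genes in rows, 4 conditions in columns (made-up values).
expr <- rbind(
  geneA = c(1, 2, 3, 4),
  geneB = c(10, 20, 30, 4000),  # same ranking as geneA, one very high value
  geneC = c(4, 3, 2, 1),
  geneD = c(50, 30, 20, 10)
)

# cor() correlates columns, so transpose to compare genes with each other.
spearman_dist <- as.dist(1 - cor(t(expr), method = "spearman"))

# Hierarchical clustering accepts an arbitrary distance object.
hc <- hclust(spearman_dist, method = "average")
clusters <- cutree(hc, k = 2)
print(clusters)
```

Because only ranks are used, the very high value in `geneB` does not pull it away from `geneA`. One caveat: with only four conditions, the Spearman correlation can take only a small number of distinct values, so many genes may end up at identical distances.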