Question

Outlier detection methods for RNA-Seq data

Asked 7.7 years ago by edianfranklin

I have an RNA-seq dataset normalized as RPKM. The dataset has one gene per row and four different experimental conditions. I need to detect the outlier values in this dataset.

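I used the Weka InterquartileRange filter, described as "a filter for detecting outliers and extreme values based on interquartile ranges; the filter skips the class attribute." It applies the following rules, where OF is the outlier factor and EVF the extreme-value factor: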

Outliers: Q3 + OF * IQR < x <= Q3 + EVF * IQR or Q1 - EVF * IQR <= x < Q1 - OF * IQR

Extreme values: x > Q3 + EVF * IQR or x < Q1 - EVF * IQR
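The two rules above can be sketched in Python as follows. This is a minimal illustration, not Weka's own implementation: numpy's default percentile interpolation may compute the quartiles slightly differently from Weka, the factors 3 and 6 are assumed values chosen for illustration, and the function name iqr_flags and the example values are hypothetical.

```python
import numpy as np


def iqr_flags(values, outlier_factor=3.0, extreme_factor=6.0):
    """Classify values with the IQR rules (OF = outlier_factor, EVF = extreme_factor).

    Returns two boolean masks: (outlier, extreme). Factors are illustrative assumptions.
    """
    v = np.asarray(values, dtype=float)
    # Linear-interpolation quartiles; Weka's quartile method may differ slightly.
    q1, q3 = np.percentile(v, [25, 75])
    iqr = q3 - q1
    upper_out, upper_ext = q3 + outlier_factor * iqr, q3 + extreme_factor * iqr
    lower_out, lower_ext = q1 - outlier_factor * iqr, q1 - extreme_factor * iqr
    # Extreme values: x > Q3 + EVF*IQR or x < Q1 - EVF*IQR
    extreme = (v > upper_ext) | (v < lower_ext)
    # Outliers: between the OF and EVF fences, on either side
    outlier = ((v > upper_out) & (v <= upper_ext)) | ((v >= lower_ext) & (v < lower_out))
    return outlier, extreme


# Hypothetical RPKM values for a single condition column.
rpkm = np.array([1, 2, 3, 4, 5, 6, 7, 8, 30, 100], dtype=float)
outlier, extreme = iqr_flags(rpkm)
print(np.flatnonzero(outlier).tolist(), np.flatnonzero(extreme).tolist())  # → [8] [9]
```

The quartiles are computed per attribute, i.e. per condition column here, so the function would be called once per condition. Because raw RPKM is strongly right-skewed, these rules are likely to flag many genuinely highly expressed genes unless the values are log-transformed first.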

My questions are:

- Are there other methods for detecting outliers in this type of data?

- Can I keep using this method for my data?

Any suggestions?

Tags: RNA-Seq, outliers, rpkm

Written 7.7 years ago by edianfranklin; last activity 7.6 years ago by Whoknows

Is there a reason why you are using the outlier approach rather than doing standard differential gene expression?

Reply by igor, 7.7 years ago

I will cluster the dataset. The plots show some very high values, and those values affect clustering algorithms such as k-means.

Reply by edianfranklin, 7.7 years ago

You usually want to see exactly that...

Reply by Devon Ryan, 7.7 years ago

With the original data, I get some clusters with low significance. I want to detect and remove the outliers from my dataset to improve the results of the clustering algorithms.

Reply by edianfranklin, 7.7 years ago

That's not improving the results, it's fudging them.

Reply by Devon Ryan, 7.7 years ago

Why?? I think it is relevant to the significance of my clusters.

Reply by edianfranklin, 7.7 years ago

If you start removing points willy-nilly, you can get whatever significance you want.

Reply by Devon Ryan, 7.7 years ago

One suggestion: instead of removing outliers, you could try a distance metric that is robust to outlier values. What comes to mind is dist = 1 - cor(x, y, method = "spearman"), but I must say that I have never tested such a metric and I'm not 100% sure it is a good idea.

Reply by Carlo Yague, 7.6 years ago
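The Spearman-based distance suggested above can be sketched in Python. Spearman correlation is the Pearson correlation of the ranks, so ranking each gene's values first means a single huge value counts no more than any other top-ranked value. Tied values get the average of their ranks, which matches how R's cor(..., method = "spearman") ranks data. The helper names average_ranks and spearman_distance_matrix and the example matrix are hypothetical.

```python
import numpy as np


def average_ranks(x):
    """Rank a 1-D array (1-based), giving tied values the mean of their ranks."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x), dtype=float)
    i = 0
    while i < len(x):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman_distance_matrix(expr):
    """1 - Spearman correlation between every pair of rows (genes) of expr."""
    ranked = np.apply_along_axis(average_ranks, 1, np.asarray(expr, dtype=float))
    return 1.0 - np.corrcoef(ranked)


# Hypothetical genes x conditions matrix; gene 1 has one extreme value.
expr = np.array([[1, 2, 3, 4],
                 [2, 4, 6, 1000],
                 [4, 3, 2, 1]], dtype=float)
D = spearman_distance_matrix(expr)
pearson_d01 = 1.0 - np.corrcoef(expr)[0, 1]
```

Gene 1 has an extreme last value but the same ordering as gene 0, so its Spearman distance to gene 0 is 0, while the Pearson-based distance is about 0.22. A precomputed distance matrix like this suits algorithms that accept one directly (e.g. hierarchical clustering) rather than standard k-means, which works on Euclidean coordinates. With only four conditions per gene, rank-based distances can take only a handful of distinct values, and a gene with identical values in every condition has no defined correlation (the result is nan).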

Answer 1 (2016-09-19)

Hi

You can get information about outlier values from a scatter plot in R.

Try plotting the RPKM values of two samples against each other in a ggplot2 scatter plot; the outliers in your data will stand out. The advantage of a scatter plot is that it shows both the correlation between your samples and the range of the values. You could simply remove the outliers, but consider a few issues: your RPKM threshold is very important, since both 0.0029 and 220 are valid RPKM values. My code below keeps only values between -8 and 8 in the scatter plot (a negative limit only makes sense for log-transformed values, since RPKM itself cannot be negative).

library(ggplot2)  # dat: data frame with one column per sample (S1, S2)
ggplot(dat, aes(S1, S2)) + geom_point() + xlim(-8, 8) + ylim(-8, 8) + geom_smooth(method = "lm")
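Because the threshold decides what gets dropped, it can help to check which genes fall outside the limits before plotting. In ggplot2, xlim()/ylim() remove out-of-range points before geom_smooth() fits its line; coord_cartesian(xlim = ..., ylim = ...) zooms without dropping data. A minimal Python sketch, assuming the ±8 limits are meant on a log2 scale (the answer does not say which transform was used); the function name inside_plot_limits and the example values are hypothetical.

```python
import numpy as np


def inside_plot_limits(s1, s2, lo=-8.0, hi=8.0):
    """Mask of genes whose log2 RPKM lies within [lo, hi] in both samples.

    Assumption: the limits refer to log2-scaled values. Zero RPKM maps to -inf
    and is therefore excluded.
    """
    with np.errstate(divide="ignore"):
        l1 = np.log2(np.asarray(s1, dtype=float))
        l2 = np.log2(np.asarray(s2, dtype=float))
    return (l1 >= lo) & (l1 <= hi) & (l2 >= lo) & (l2 <= hi)


# Hypothetical RPKM values for two samples, including the answer's 0.0029 and 220.
s1 = np.array([0.0029, 220.0, 1.0, 0.0])
s2 = np.array([1.0, 2.0, 4.0, 1.0])
print(inside_plot_limits(s1, s2).tolist())  # → [False, True, True, False]
```

On this scale log2(0.0029) is about -8.4 and log2(220) about 7.8, so the two values cited in the answer land on opposite sides of the cutoff, which illustrates how sensitive the result is to the chosen threshold.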