Outlier detection methods for RNA-Seq data
7.7 years ago

I have an RNA-seq dataset normalized to RPKM. The dataset has one gene per row and four experimental conditions. I need to detect the outlier values in this dataset.

I used the Weka InterquartileRange filter: a filter for detecting outliers and extreme values based on interquartile ranges. The filter skips the class attribute.

Outliers: Q3 + OF * IQR < x <= Q3 + EVF * IQR or Q1 - EVF * IQR <= x < Q1 - OF * IQR

Extreme values: x > Q3 + EVF * IQR or x < Q1 - EVF * IQR
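
The two rules above can be sketched in base R. This is a minimal illustration, not Weka's implementation: the function name flag_iqr, the factor values of = 3 and evf = 6, the example RPKM values, and the use of R's default quantile() definition for Q1 and Q3 are all assumptions made for the example.

```r
# Flag outliers and extreme values in a numeric vector with the IQR rules above.
# 'of' is the outlier factor (OF) and 'evf' the extreme value factor (EVF);
# the default values here are illustrative assumptions.
flag_iqr <- function(x, of = 3, evf = 6) {
  q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)
  q1 <- q[1]
  q3 <- q[2]
  iqr <- q3 - q1

  # Extreme values: x > Q3 + EVF * IQR or x < Q1 - EVF * IQR
  extreme <- x > q3 + evf * iqr | x < q1 - evf * iqr

  # Outliers: between the OF fence and the EVF fence on either side
  outlier <- (x > q3 + of * iqr & x <= q3 + evf * iqr) |
             (x < q1 - of * iqr & x >= q1 - evf * iqr)

  status <- rep("normal", length(x))
  status[outlier] <- "outlier"
  status[extreme] <- "extreme"
  status
}

# Hypothetical RPKM values for one condition (one value per gene)
rpkm <- c(1, 2, 3, 4, 5, 6, 7, 8, 30, 100)
flags <- flag_iqr(rpkm)
print(data.frame(rpkm, flags))
```

With these made-up values, Q1 = 3.25 and Q3 = 7.75, so the fences sit at 21.25 (OF) and 34.75 (EVF): 30 falls between them and 100 lies beyond the second one. To mimic a per-attribute filter, the function could be applied to each condition column separately, for example with sapply().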

My questions are:

- Are there other methods for outlier detection in this type of data?

- Can I continue to use this method for my data?

- Any suggestions?
RNA-Seq outliers rpkm • 3.7k views

Is there a reason why you are using the outlier approach rather than doing standard differential gene expression?


I am going to cluster the dataset. Plotting the data shows some very high values, and those values affect clustering algorithms such as k-means.


You usually want to see exactly that...


With the original data, I get some clusters with low significance. I want to detect and remove the outliers from my dataset in order to improve the clustering results.


That's not improving the results, it's fudging them.


Why? I think it matters for the significance of my clusters.


If you start removing points willy-nilly, then you can get whatever significance you want.


One suggestion: instead of removing outliers, you could try a distance metric that is robust to outlier values. What comes to mind is dist = 1 - cor(x, y, method = "spearman"), but I must say that I have never tested such a metric and I'm not 100% sure it is a good idea.
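
A minimal base R sketch of that idea, under stated assumptions: the matrix expr, its gene and condition names, and the choice of average-linkage hierarchical clustering are all hypothetical. hclust() is used here because it accepts a precomputed distance, whereas kmeans() in base R works on Euclidean distance.

```r
# Hypothetical expression matrix: genes in rows, 4 conditions in columns
expr <- rbind(
  geneA = c(1, 2, 3, 4),
  geneB = c(10, 20, 30, 4000),  # same ordering as geneA, with one very large value
  geneC = c(4, 3, 2, 1)
)
colnames(expr) <- paste0("cond", 1:4)

# cor() correlates columns, so transpose to compare genes with each other
d <- as.dist(1 - cor(t(expr), method = "spearman"))

# Hierarchical clustering on the rank-based distance
hc <- hclust(d, method = "average")
groups <- cutree(hc, k = 2)
print(groups)
```

Because Spearman correlation only looks at ranks, the very large value in geneB does not pull it away from geneA. Note that with only four conditions per gene the rank correlation can take just a handful of distinct values, so the resulting distances are coarse.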

7.6 years ago
Whoknows ▴ 960

Hi

You can get information about outlier values from a scatter plot in R.

Try plotting the RPKM values in a ggplot scatter plot; it will show the outliers in your data. The good thing about a scatter plot is that it also shows the correlation between your samples and the range of the values. You could simply remove the outliers, but consider a few issues first: your RPKM threshold is very important, e.g. 0.0029 is an RPKM value and so is 220. Here is my code for leaving out points above 8 and below -8 when drawing the scatter plot:

library(ggplot2)
ggplot(dat, aes(S1, S2)) + geom_point() + xlim(-8, 8) + ylim(-8, 8) + geom_smooth(method = "lm")