Question: Outlier detection methods for RNA-Seq data
3.1 years ago, edianfranklin (0) wrote:

I have an RNA-seq dataset normalized as RPKM. The dataset has one gene per row and four different experimental conditions. I need to detect the outlier values in this dataset.

I used the Weka InterquartileRange filter: a filter for detecting outliers and extreme values based on interquartile ranges. The filter skips the class attribute.

Outliers: Q3 + OF * IQR < x <= Q3 + EVF * IQR or Q1 - EVF * IQR <= x < Q1 - OF * IQR

Extreme values: x > Q3 + EVF * IQR or x < Q1 - EVF * IQR
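
The two rules above can be sketched in a few lines of base R. This is only an illustration, not the Weka implementation: Q1 and Q3 are the 25% and 75% quartiles, IQR = Q3 - Q1, and OF and EVF are the outlier and extreme-value factors. The defaults of 3 and 6, the function name iqr_flags, and the example values are assumptions made for this sketch, and R's quantile() may compute quartiles slightly differently from Weka.

```r
# Flag outliers and extreme values with the interquartile-range rule quoted above.
# of = outlier factor, evf = extreme-value factor (3 and 6 are assumed defaults).
iqr_flags <- function(x, of = 3, evf = 6) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  # Extreme values lie beyond EVF * IQR from the quartiles
  extreme <- x > q[2] + evf * iqr | x < q[1] - evf * iqr
  # Outliers lie between OF * IQR and EVF * IQR from the quartiles
  outlier <- (x > q[2] + of * iqr & x <= q[2] + evf * iqr) |
             (x < q[1] - of * iqr & x >= q[1] - evf * iqr)
  data.frame(value = x, outlier = outlier, extreme = extreme)
}

# Hypothetical RPKM values for one condition (not from the original dataset)
rpkm <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30, 220)
flags <- iqr_flags(rpkm)
print(flags)
```

With these made-up values, Q1 = 3.5, Q3 = 8.5 and IQR = 5, so 30 falls in the outlier band (23.5 to 38.5) and 220 is flagged as an extreme value.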

My questions are:

- Are there other methods for outlier detection in this type of data?

- Can I continue to use this method for my data?

- Any suggestions?
rna-seq outliers rpkm • 1.7k views
modified 2.9 years ago by Whoknows (750) • written 3.1 years ago by edianfranklin (0)

Is there a reason why you are using the outlier approach rather than doing standard differential gene expression?

written 3.1 years ago by igor (8.1k)

I am going to cluster the dataset. The graphical analysis shows some very high values, and those values affect clustering algorithms such as k-means.

modified 3.1 years ago • written 3.1 years ago by edianfranklin (0)

You usually want to see exactly that...

written 3.1 years ago by Devon Ryan (91k)

With the original data, I get some clusters with low significance. I want to detect and remove the outliers from my dataset in order to improve the results of the clustering algorithms.

written 3.1 years ago by edianfranklin (0)

That's not improving the results, it's fudging them.

written 3.1 years ago by Devon Ryan (91k)

Why? I think it is significant for my clusters.

written 3.1 years ago by edianfranklin (0)

If you start removing points willy nilly then you can get whatever significance you want.

written 3.1 years ago by Devon Ryan (91k)

One suggestion: instead of removing outliers, you could try a distance metric that is robust to outlier values. What comes to mind is dist = 1 - cor(x, y, method = "spearman"), but I must say that I have never tested such a metric and I'm not 100% sure it is a good idea.

written 2.9 years ago by Carlo Yague (4.6k)
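
A minimal sketch of that suggestion, assuming genes are in rows and conditions in columns. The matrix below is simulated and the choice of average-linkage hierarchical clustering with k = 2 is arbitrary; it is used here because k-means works on Euclidean distances in the raw coordinates, whereas hclust() accepts a precomputed distance.

```r
# Hypothetical expression matrix: 6 genes (rows) x 4 conditions (columns)
set.seed(1)
expr <- matrix(rexp(24, rate = 0.5), nrow = 6,
               dimnames = list(paste0("gene", 1:6), paste0("cond", 1:4)))
expr["gene6", "cond4"] <- 220   # one extreme value

# cor() correlates columns, so transpose to compare genes with each other
rho <- cor(t(expr), method = "spearman")
d <- as.dist(1 - rho)           # 0 = same ranking, 2 = opposite ranking

# Hierarchical clustering on the rank-based distance
hc <- hclust(d, method = "average")
clusters <- cutree(hc, k = 2)
print(clusters)
```

Because Spearman correlation only uses ranks, the value 220 counts simply as "the largest value for gene6"; replacing it with any other value that is still the largest in that row leaves the distances unchanged. With only four conditions the ranks are coarse, so treat the result with caution.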
2.9 years ago, Whoknows (750), Tehran, Iran, wrote:

Hi

You can get information about outlier values from a scatter plot in R.

Try plotting the RPKM values as a ggplot scatter plot, which will show the outliers in your data. The advantage of a scatter plot is that it shows the correlation among your samples as well as the range of values. You could simply remove the outliers, but consider a few issues first: your RPKM threshold is very important, e.g. 0.0029 is an RPKM value and so is 220. Here is my code for excluding values above 8 and below -8 when drawing the scatter plot:

library(ggplot2)
ggplot(dat, aes(S1, S2)) + geom_point() + xlim(-8, 8) + ylim(-8, 8) + geom_smooth(method = "lm")
written 2.9 years ago by Whoknows (750)
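
The filtering step behind that plot can also be made explicit, so that you can see how many points are dropped before fitting the line. This is a hedged sketch on simulated data: the columns S1 and S2 follow the answer's naming, and the values are assumed to be on a log scale, since raw RPKM cannot be negative and the limits include -8.

```r
# Hypothetical log2-scale values for two samples, S1 and S2 (not the original data)
set.seed(42)
dat <- data.frame(S1 = rnorm(200, sd = 3))
dat$S2 <- dat$S1 + rnorm(200, sd = 1)
dat[1, ] <- c(12, -11)   # a point outside the plotting window

# Keep only points inside [-8, 8] on both axes, and report what is dropped
inside <- abs(dat$S1) <= 8 & abs(dat$S2) <= 8
kept <- dat[inside, ]
cat("dropped", sum(!inside), "of", nrow(dat), "points\n")

# Draw the scatter plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(kept, aes(S1, S2)) +
    geom_point() +
    geom_smooth(method = "lm") +
    coord_cartesian(xlim = c(-8, 8), ylim = c(-8, 8))
  print(p)
}
```

Filtering first and then zooming with coord_cartesian() keeps the removal visible in the code; with xlim() and ylim() alone, points outside the limits are dropped before the linear fit is computed.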