Pre-filtering expression data
6.2 years ago

Hello guys,

I have normalized RNA-seq data as well as methylation data for a couple of hundred samples, and each sample has a couple of hundred thousand features. Before I do feature selection, I need to pre-filter the features so that at least 90% of the useless ones are removed. What is the best method for that? Is there an R script or package that does it?


I think the most important question for you is: what counts as a useless feature? Do you mean one that doesn't contribute to a treatment or condition? In that case, you can perform differential expression analysis between conditions to "preselect" those features. Or maybe you want to see whether there is a relationship between your methylation and RNA-seq data? Then you could set up a correlation matrix between all RNA-seq counts and the methylation peaks (assuming it is ChIP-seq?) and only look at features with high enough correlations (I am thinking of something similar to eQTL analysis).
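A minimal sketch of the correlation idea, written in Python/NumPy rather than R. It assumes both matrices are features × samples with the same samples in the same column order, and the function names and the `r_min` cutoff are hypothetical choices for illustration. Expression features are processed in chunks so the full genes × sites correlation matrix is never held in memory. Note that with hundreds of thousands of features on each side and only a few hundred samples, many pairs will pass a moderate cutoff by chance, so in practice you would likely restrict the comparison to nearby (cis) pairs, as eQTL analyses do.

```python
import numpy as np

def standardize_rows(X):
    """Center each row (feature) to mean 0 and scale it to std 1 across samples."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=1, keepdims=True)
    sd = X.std(axis=1, keepdims=True)
    sd[sd == 0] = np.inf  # constant features become all zeros -> correlation 0
    return (X - mu) / sd

def correlated_features(expr, meth, r_min=0.5, chunk=100):
    """Boolean masks of expression / methylation features (rows) that have
    |Pearson r| >= r_min with at least one feature of the other data set.

    Hypothetical helper: expr is (n_genes, n_samples), meth is (n_sites, n_samples).
    """
    Ze = standardize_rows(expr)
    Zm = standardize_rows(meth)
    n = Ze.shape[1]
    if Zm.shape[1] != n:
        raise ValueError("expr and meth must have the same samples (columns)")
    keep_e = np.zeros(Ze.shape[0], dtype=bool)
    keep_m = np.zeros(Zm.shape[0], dtype=bool)
    for start in range(0, Ze.shape[0], chunk):
        # Pearson r for this block of genes against every methylation site
        r = Ze[start:start + chunk] @ Zm.T / n
        hits = np.abs(r) >= r_min
        keep_e[start:start + chunk] |= hits.any(axis=1)
        keep_m |= hits.any(axis=0)
    return keep_e, keep_m
```

Usage: `keep_e, keep_m = correlated_features(expr, meth, r_min=0.5)`, then subset with `expr[keep_e]` and `meth[keep_m]`.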



Aside from removing features that are not expressed at all (simple R commands for that are easy to find), you can filter on variance or median absolute deviation (MAD), keeping only the most variable features. For instance, the M3C package includes a function for this; see section 5.2 of the package vignette (

That said, it is relatively simple to write the commands yourself.
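A rough Python/NumPy sketch of that two-step filter. It is not the M3C function, just an illustration. It assumes a features × samples matrix (the usual R orientation). The thresholds `min_expr`, `min_samples` and `keep_frac` are hypothetical and should be tuned to your data. `keep_frac=0.10` matches the goal of discarding at least 90% of features. For methylation beta values, "not expressed" has no meaning, so set `min_samples=0` to skip step 1.

```python
import numpy as np

def filter_features(X, min_expr=1.0, min_samples=10, keep_frac=0.10, use_mad=True):
    """Unsupervised pre-filter on a (n_features, n_samples) matrix.

    1. Keep a feature only if it is >= min_expr in at least min_samples samples.
    2. Rank the survivors by median absolute deviation (or sample variance)
       and keep the top keep_frac of the ORIGINAL number of features.
    Returns the sorted row indices of the retained features.
    """
    X = np.asarray(X, dtype=float)
    expressed = np.flatnonzero((X >= min_expr).sum(axis=1) >= min_samples)
    Y = X[expressed]
    if use_mad:
        med = np.median(Y, axis=1, keepdims=True)
        spread = np.median(np.abs(Y - med), axis=1)
    else:
        spread = Y.var(axis=1, ddof=1)
    n_keep = min(len(expressed), max(1, int(round(keep_frac * X.shape[0]))))
    top = np.argsort(spread)[::-1][:n_keep]  # most variable features first
    return np.sort(expressed[top])
```

Ranking by MAD rather than variance makes the filter less sensitive to a few outlier samples. Both options are included so you can compare them.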

