Classify Genes As Expressed Or Not Expressed
9.5 years ago
predeus ★ 1.9k

Hello all,

this is probably a very obvious question, but I've never dealt with this sort of a problem, so I hope you all can point me in the right direction.

Imagine we have an array or an annotated and quantified RNA-seq experiment. There are about 24k genes, each with a normalized numerical expression value (or FPKM) assigned to it.

What is the most statistically sound way to automatically classify genes as "expressed" and "not expressed"? People often use an empirical cutoff for this, e.g. an FPKM of 1, but that's not what I'm interested in.

Thank you for any input.

gene-expression statistics classification • 2.2k views
9.5 years ago
xb ▴ 420

One simple approach is to standardize the log2-transformed expression values (within a sample, for instance).

Values > 0 then count as overexpressed and values < 0 as underexpressed; use a cutoff other than zero where appropriate.
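
To make the standardization concrete, here is a minimal sketch in Python with NumPy. The function names, the toy FPKM values, and the pseudocount of 1 (added so that zero values survive the log) are my own assumptions for illustration, not part of any particular package.

```python
import numpy as np

def standardize_log2(values, pseudocount=1.0):
    """Z-score log2(values + pseudocount) within a single sample.

    The pseudocount is an assumption made here so that log2(0) is avoided.
    """
    logged = np.log2(np.asarray(values, dtype=float) + pseudocount)
    return (logged - logged.mean()) / logged.std(ddof=1)

def classify(z, cutoff=0.0):
    """Label each gene relative to the cutoff (a z-score equal to the cutoff is 'under')."""
    return np.where(z > cutoff, "over", "under")

# Hypothetical FPKM values for five genes in one sample
fpkm = np.array([0.0, 0.5, 1.0, 10.0, 250.0])
z = standardize_log2(fpkm)
labels = classify(z)

for value, score, label in zip(fpkm, z, labels):
    print(f"FPKM={value:7.1f}  z={score:+.2f}  {label}")
```

Raising `cutoff` above zero makes the "over" call stricter. Keep in mind that the labels are relative to the sample's own distribution, so they say nothing about absolute presence or absence of a transcript.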

Note that this is different from what you asked for, which was "expressed" versus "not expressed".

However, relative expression levels are more practical in my cases, and they are easy to feed into downstream statistics such as SAM. The approach is applicable to both array and NGS data.

