Question: When exactly is a gene "expressed"?
18 months ago, exin (50) wrote:

I'm interested in comparing the genes that are expressed in several cell types in order to infer the functions of these cell types.

I use DESeq to find genes that are differentially expressed (DE). However, a gene doesn't have to be DE to be involved in the function of a cell. So ideally I'd like to come up with a reasonable method to find genes that are "expressed".

Some previous members of my lab came up with the Quartile Expression method: a gene is considered expressed if its transcript count (normalised but not transformed) is in the upper quartile of all genes across all samples. A major issue: some genes simply have far higher transcript counts than others, so they are always called expressed (across all samples), and this skews the quartile values.
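To make the issue concrete, here is a minimal Python sketch of the quartile rule as described above. The function name `upper_quartile_calls` and the toy counts are hypothetical and only for illustration; this is not the lab's actual code, and the exact quartile definition (here Python's "inclusive" method) is an assumption.

```python
from statistics import quantiles


def upper_quartile_calls(counts):
    """Call a gene 'expressed' in a sample if its normalised count lies
    above the third quartile of all values pooled over genes and samples.

    counts: dict mapping gene name -> list of normalised counts (one per sample).
    Returns (q3, calls) where calls maps gene name -> list of booleans.
    """
    pooled = [v for values in counts.values() for v in values]
    # quantiles(..., n=4) returns the three quartile cut points; index 2 is Q3.
    q3 = quantiles(pooled, n=4, method="inclusive")[2]
    calls = {gene: [v > q3 for v in values] for gene, values in counts.items()}
    return q3, calls


# Toy data (made up): geneA is very abundant in every sample.
toy_counts = {
    "geneA": [5000, 5200, 4800],
    "geneB": [40, 35, 45],
    "geneC": [0, 60, 2],
    "geneD": [10, 12, 8],
}

q3, calls = upper_quartile_calls(toy_counts)
for gene, flags in calls.items():
    print(gene, flags)
```

With these toy numbers, the three geneA values pull the pooled third quartile far above every other gene, so only geneA is called expressed, even though geneB is detected at a steady level in all samples.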

Can anyone point me to a reference where other methods have been attempted? Any thoughts?

rna-seq gene expression • 501 views

Is this a question about actual bioinformatics methods, or is this a philosophical question about what it means to “express” a gene?

written 18 months ago by Joe (18k)

Probably stats/bioinformatics, especially the part about dealing with genes that have exceptionally large transcript counts across all samples.

written 18 months ago by exin (50)

18 months ago, Charles Warden (7.9k), Duarte, CA, wrote:

If you define an FPKM value of 0.1 as a rough value for a gene expressed above background, that would probably indicate 60-70% of genes are expressed in your sample. I think this seems reasonable (as a rough guideline).

If you want to use a higher FPKM / expression cutoff (such as FPKM > 1) to choose genes that are easier to validate (or otherwise to reduce your gene list), that would be a valid choice. However, that doesn't mean genes with lower expression (or even an expression of FPKM = 0.95) aren't expressed. So, I think that is a slightly different question.

That said, I think upper-quartile is probably too stringent to define a gene as expressed (I think that would correspond to a FPKM value greater than 1).
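As a rough illustration of how the choice of cutoff changes the answer, the following Python sketch counts the fraction of genes above an FPKM threshold. The helper `fraction_expressed` and the FPKM values are hypothetical; the toy list was picked by hand for illustration and is not real data.

```python
def fraction_expressed(fpkm_values, cutoff):
    """Return the fraction of genes whose FPKM is strictly above the cutoff."""
    if not fpkm_values:
        raise ValueError("fpkm_values must not be empty")
    return sum(1 for v in fpkm_values if v > cutoff) / len(fpkm_values)


# Toy FPKM values for ten genes in one sample (made up for illustration).
toy_fpkm = [0.0, 0.0, 0.05, 0.2, 0.5, 0.95, 1.5, 4.0, 20.0, 150.0]

frac_low = fraction_expressed(toy_fpkm, 0.1)   # lenient "above background" cutoff
frac_high = fraction_expressed(toy_fpkm, 1.0)  # stricter cutoff
print(frac_low, frac_high)
```

Note that the gene with FPKM = 0.95 is counted under the 0.1 cutoff but dropped under the 1.0 cutoff, which is the point made above: a stricter cutoff shortens the list without showing that the excluded genes are not expressed.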


Thank you Charles!

written 18 months ago by exin (50)

18 months ago, Jean-Karim Heriche (23k), EMBL Heidelberg, Germany, wrote:

A gene is expressed if its DNA is transcribed into RNA. So by definition, if you detect RNA for a gene, then that gene is expressed. Which expression level should be considered relevant for the biological question at hand may not be easy to define, which is why people go for DE genes: that is at least statistically tractable, if not always biologically relevant.


18 months ago, kristoffer.vittingseerup (3.4k), European Union, wrote:

There is no right answer for this question. The combination of low sensitivity (sequencing is not exhaustive) and transcriptional noise means one cannot answer the question. Any solution to this problem will be an arbitrary cutoff.

Practical considerations:

  1. If a genomic feature has too few reads, our statistical methods cannot, even in theory, find any significant changes, meaning such features are not worth testing.
  2. A major reason why we typically filter out lowly expressed features is that testing "too many" features makes the FDR correction very stringent, so we potentially miss relevant targets.
  3. Personally, I think quantile-based approaches are kind of strange and prefer absolute cutoffs instead. I think they are easier to interpret and also fit better with point (1).
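Point (2) can be illustrated with a small Python sketch of the Benjamini-Hochberg adjustment. The function below is a plain reimplementation for illustration, not the code used by any particular package, and the p-values are made up. It assumes the filter was chosen without looking at the test results.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy p-values (made up): three well-covered features plus seven
# low-count features that carry essentially no signal.
informative = [0.001, 0.004, 0.012]
low_count = [0.9] * 7

adj_all = benjamini_hochberg(informative + low_count)[:3]
adj_filtered = benjamini_hochberg(informative)

print(adj_all)
print(adj_filtered)
```

In this toy case the same three features get smaller adjusted p-values once the seven uninformative tests are removed, because the correction has fewer tests to account for.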

Tool solutions:

  1. You can pre-filter using some arbitrary cutoffs/functions (e.g. edgeR's filterByExpr() function).
  2. You can, after testing all your features, weight the p-values by expression level with tools such as IHW (note that a version of this approach is actually built into DESeq2).
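For the first option, a pre-filter with an absolute cutoff could look roughly like the Python sketch below. This is a simplified stand-in, not a port of edgeR's function: the helper names, the counts-per-million threshold of 1 and the "at least 2 samples" rule are all assumptions chosen for illustration, and library sizes are taken as plain column sums.

```python
def counts_per_million(counts, library_sizes):
    """Convert raw counts to counts per million (CPM) for each sample."""
    return {
        gene: [c / n * 1e6 for c, n in zip(values, library_sizes)]
        for gene, values in counts.items()
    }


def keep_expressed(counts, min_cpm=1.0, min_samples=2):
    """Keep genes with CPM >= min_cpm in at least min_samples samples.

    counts: dict mapping gene name -> list of raw counts (one per sample).
    """
    genes = list(counts)
    n_samples = len(next(iter(counts.values())))
    # Library size per sample = sum of raw counts over all genes (an assumption).
    library_sizes = [sum(counts[g][i] for g in genes) for i in range(n_samples)]
    cpm = counts_per_million(counts, library_sizes)
    return [g for g in genes if sum(v >= min_cpm for v in cpm[g]) >= min_samples]


# Toy raw counts for three samples (made up for illustration).
toy_raw = {
    "geneA": [600000, 700000, 650000],
    "geneB": [400000, 300000, 350000],
    "geneC": [5, 0, 3],
    "geneD": [0, 1, 0],
}

kept = keep_expressed(toy_raw)
print(kept)
```

Here geneD is dropped because it never reaches the cutoff, while geneC is kept because it passes in two of the three samples. Both thresholds remain arbitrary, in line with the point above that any solution is a cutoff choice.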

Lastly, since you clearly want to compare two conditions, you are probably well off doing the DE analysis.


Thank you for such a practical answer!

written 18 months ago by exin (50)