Question

Gene transcription determination with RNA-seq data

0

2.2 years ago

chansik ▴ 10

I'm analyzing RNA-seq data with TMM normalization (edgeR).

The TMM-normalized values of genes range from -5.38 to 13.0. The values are on a log scale, so anything less than 1 on the original scale is negative.
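To make the log-scale point concrete, here is a minimal sketch. It assumes the reported values are log2 counts per million (CPM) computed against a TMM-scaled library size. That is an assumption, since the post only says the values are on a log scale. The function name, the library size, and the counts are hypothetical.

```python
import math

def log2_cpm(count, library_size, prior_count=0.0):
    """log2 counts-per-million for one gene.

    prior_count is an optional pseudocount; it is 0 here so the
    arithmetic stays easy to follow (hypothetical helper, not edgeR code).
    """
    cpm = (count + prior_count) * 1e6 / library_size
    return math.log2(cpm)

# Hypothetical effective library size (raw library size times TMM factor).
library_size = 20_000_000

for count in (1, 20, 200, 160_000):
    print(count, round(log2_cpm(count, library_size), 2))
```

With these made-up numbers, 20 reads correspond to exactly 1 CPM and map to 0 on the log scale. Anything below 1 CPM is negative, and a highly expressed gene lands near the top of the range quoted in the question.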

In this result, how would I determine which genes are expressed or not expressed?

Thanks.

RNA-seq TMM transcription • 613 views

ADD COMMENT • link updated 2.2 years ago by Michael 54k • written 2.2 years ago by chansik ▴ 10

score 1 · Answer 1 · 2022-02-07

1

2.2 years ago

Michael 54k

I am not sure one can ever do an "on/off" analysis properly. There is no well-motivated threshold for a gene being expressed or not. You could look at background coverage; however, that is most likely very close to zero. Ad-hoc options:

you set an arbitrary threshold (e.g. 1) (worst option IMHO)
you set a threshold based on a quantile of the gene expression values (e.g. the 5th percentile of the data; R function: quantile)
you set a threshold based on the background distribution (expression of randomly sampled intergenic regions) and use its 95th percentile as the threshold (ok option IMHO)
same as above, but you fit a negative binomial distribution to the intergenic sample data (R package fitdistrplus) and then set the threshold at a theoretical quantile (e.g., 99%) of the fitted distribution (I did this once; the reviewers did not complain, even though maybe they should have). See Skern-Mauritzen et al. (2021): the result is shown in Fig. 1 there, and the method is described in Supplementary Section 3, "Transcriptomic analysis for evaluation of annotation".
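The quantile-based options above can be sketched as follows. This is only an illustration under stated assumptions. The data are invented. The negative binomial parameters are estimated by the method of moments rather than by the fit a package such as fitdistrplus would perform. All function and variable names are hypothetical.

```python
import math

def percentile(values, q):
    """Empirical q-th percentile (0-100) with linear interpolation."""
    xs = sorted(values)
    if not xs:
        raise ValueError("no values")
    pos = (len(xs) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

def fit_negative_binomial(counts):
    """Method-of-moments estimate of NB (size, mu); not maximum likelihood."""
    n = len(counts)
    mu = sum(counts) / n
    var = sum((c - mu) ** 2 for c in counts) / (n - 1)
    if var <= mu:
        raise ValueError("no overdispersion; the NB moment estimate is undefined")
    size = mu * mu / (var - mu)
    return size, mu

def nb_quantile(size, mu, p, max_k=1_000_000):
    """Smallest count k with P(X <= k) >= p under NB(size, mu)."""
    prob = size / (size + mu)
    cdf = 0.0
    for k in range(max_k + 1):
        log_pmf = (math.lgamma(k + size) - math.lgamma(size)
                   - math.lgamma(k + 1)
                   + size * math.log(prob) + k * math.log1p(-prob))
        cdf += math.exp(log_pmf)
        if cdf >= p:
            return k
    raise RuntimeError("quantile not reached; check the parameters")

# Invented example data: log-scale gene expression values and raw read
# counts in randomly sampled intergenic regions.
gene_log_expr = [-5.4, -3.1, -1.0, 0.2, 1.5, 2.8, 4.0, 6.3, 9.1, 13.0]
intergenic_counts = [0, 0, 1, 0, 2, 0, 0, 3, 1, 0, 0, 5, 0, 1, 0,
                     0, 2, 0, 8, 0, 1, 0, 0, 4, 0, 1, 0, 12, 0, 2]

# 5th percentile of the gene expression values.
print("gene quantile threshold:", percentile(gene_log_expr, 5))

# 95th percentile of the intergenic background.
print("background threshold:", percentile(intergenic_counts, 95))

# 99% theoretical quantile of a negative binomial fitted to the background.
size, mu = fit_negative_binomial(intergenic_counts)
print("NB threshold (count):", nb_quantile(size, mu, 0.99))
```

The two background thresholds are in raw counts here, so they would need to be converted to the same scale as the gene expression values before being compared with them.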

I am open to discussion but think there is no single best way.

ADD COMMENT • link 2.2 years ago by Michael 54k

0

Thanks for sharing your thoughts, Michael.

Loads of things to consider for a beginner.

When defining lowly expressed or non-expressed genes for a downstream application (such as designing gRNAs for CRISPR activation to enhance expression), do you think selecting genes with a TMM value below 1 is a reasonable way to extract them?

ADD REPLY • link 2.2 years ago by chansik ▴ 10

0

With that concrete application of functional characterization in mind, I think you can choose a different way of prioritizing genes. As long as you have replicated data and different conditions to compare, a standard differential expression approach should fare better. For example, you can look for genes that are significantly DE and have low baseline expression in at least one condition but are above the threshold in at least one other condition. You should also carefully inspect each target beforehand for sufficient coverage in a coverage plot and for validated transcript isoforms. In general, we validate all target transcripts by 5'-3'-RACE-PCR because our transcript annotations are not perfect either. I should say that I am not an expert in CRISPR because our organism of interest doesn't have a protocol for it. We can only do gene silencing by dsRNA injection; however, our selection procedure for genes to characterize is going to be similar irrespective of the functional tool.
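The selection rule described above could be sketched like this. It assumes the differential expression results have already been computed elsewhere. The record layout, the FDR cutoff, the threshold, and the gene names are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class GeneResult:
    gene: str
    fdr: float        # adjusted p-value from the DE test (assumed given)
    mean_expr: dict   # condition name -> mean normalized log expression

def candidate_genes(results, threshold, fdr_cutoff=0.05):
    """Genes that are significantly DE, below the threshold in at least
    one condition, and at or above it in at least one other condition."""
    picked = []
    for r in results:
        values = list(r.mean_expr.values())
        low_somewhere = any(v < threshold for v in values)
        high_elsewhere = any(v >= threshold for v in values)
        if r.fdr <= fdr_cutoff and low_somewhere and high_elsewhere:
            picked.append(r.gene)
    return picked

# Invented example results for three genes in two conditions.
results = [
    GeneResult("geneA", 0.001, {"control": -2.0, "treated": 4.5}),
    GeneResult("geneB", 0.300, {"control": -1.5, "treated": 3.0}),  # not significant
    GeneResult("geneC", 0.010, {"control": 5.0, "treated": 7.2}),   # never low
]

print(candidate_genes(results, threshold=1.0))
```

The filter only narrows the list. Each remaining gene would still need the coverage and isoform checks mentioned above.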

ADD REPLY • link 2.2 years ago by Michael 54k