Question

normalization methods for scRNA-seq and RNA-seq in specific cases of increased global transcription

4.2 years ago

Bogdan ★ 1.4k

Dear all,

after re-visiting some articles showing that C-MYC induces global changes in gene expression,

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3505597/pdf/nihms416894.pdf

https://www.sciencedaily.com/releases/2012/10/121025121841.htm

where spike-in controls were used for normalization, I thought I would ask for your advice:

-- if we have RNA-seq data collected from developing systems (where we expect a global increase in transcription between time0 and time1), would the TMM and DESeq2 normalization methods be advisable?

-- the same question applies to scRNA-seq (if we use a pseudo-bulk approach for differential expression that includes edgeR).

many thanks,

bogdan

PS: I've also posted the question on the Bioconductor support site: https://support.bioconductor.org/p/127960/

RNA-Seq scRNA-seq • 1.2k views

Hi Devon, thank you for your reply.

Besides using spike-in controls (which we could add to the experiments at a later time point), I believe we could also use a set of ~1130 housekeeping genes from http://www.housekeeping.unicamp.br/?download to compute the scaling factors. Is there a minimum number of housekeeping genes that we should use? Thank you!

ADD REPLY • link 4.2 years ago by Bogdan ★ 1.4k

Please use ADD COMMENT and leave the answer box for answers. That keeps the thread logically organized.

Beyond that, there is no actual guarantee that "housekeeping genes" are appropriate. If a cell gets metabolically more active or develops into a state that requires massive reorganization of the cytoskeleton, I would expect genes such as Actin or GAPDH to change. Whichever genes you settle on, I would make sure (maybe using similar published data) that you have strong evidence against differential expression for them.

ADD REPLY • link 4.2 years ago by ATpoint 82k

Thank you, I have just re-adjusted the discussion flow (I'm still learning about Biostars ;).

And yes, we could also use spike-in controls, if our collaborators are willing to redo the experiments.

ADD REPLY • link 4.2 years ago by Bogdan ★ 1.4k

Well, the minimum number is 1 (equivalent to what's done for qPCR), though you'd want more for the sake of robustness. You might check the cMYC datasets to find housekeeping genes that aren't regulated by cMYC, since I suspect that at least some of the 1130 housekeeping genes aren't as stable as one would hope.
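As a rough illustration of that kind of check, here is a minimal Python sketch. It assumes you already have a reference dataset (for example published cMYC data) normalized against spike-ins, and it keeps only the candidate genes whose mean expression barely moves between conditions. The function name `stable_genes`, the gene names, the values, and the 0.25 log2 fold-change cutoff are all hypothetical choices made for the example, not part of any package.

```python
import math

def stable_genes(norm_expr, candidates, max_abs_log2fc=0.25):
    """Return candidate genes with a small change in mean expression.

    norm_expr: dict mapping gene -> (control_values, treated_values),
               assumed to be spike-in-normalized already.
    candidates: iterable of candidate housekeeping gene names.
    max_abs_log2fc: hypothetical cutoff on |log2 fold change|.
    """
    keep = []
    for gene in candidates:
        if gene not in norm_expr:
            continue  # no evidence either way, so do not trust it
        control, treated = norm_expr[gene]
        mean_control = sum(control) / len(control)
        mean_treated = sum(treated) / len(treated)
        if mean_control <= 0 or mean_treated <= 0:
            continue  # log ratio undefined
        if abs(math.log2(mean_treated / mean_control)) <= max_abs_log2fc:
            keep.append(gene)
    return keep

# Made-up reference values: (control replicates, treated replicates)
reference = {
    "geneA": ([100, 110], [105, 108]),  # essentially flat
    "geneB": ([50, 55], [120, 130]),    # clearly induced
    "geneC": ([200, 190], [210, 205]),  # essentially flat
}
stable = stable_genes(reference, ["geneA", "geneB", "geneC", "geneD"])
print(stable)
```

A small fold change in a handful of replicates is weak evidence on its own, so in practice you would want a proper test or several independent datasets before trusting a gene as a reference.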

ADD REPLY • link 4.2 years ago by Devon Ryan 104k

score 0 · Answer 1 · 2020-01-31

TMM, RLE, and quantile normalization aren't appropriate for data in which unidirectional global shifts are expected, since they all assume there is no such shift. In such cases you need either spike-ins or prior knowledge about genes that aren't changing (these are then used with TMM/RLE/etc. to compute the scaling factors).
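To make the point about control genes concrete, here is a minimal Python sketch of an RLE-style (median-of-ratios) scaling factor computed either from all genes or from a control subset only. It is a simplified stand-in for what edgeR or DESeq2 do, not their actual implementation, and the function name `size_factors`, the gene names, and the counts are all made up. The toy data assume a twofold global amplification at time1 with the same spike-in amount per cell, so at equal sequencing depth the endogenous counts look unchanged while the spike-in counts halve.

```python
import math
from statistics import median

def size_factors(counts, genes=None):
    """RLE-style size factors, optionally restricted to control genes.

    counts: dict mapping gene -> list of counts, one per sample.
    genes: control genes (spike-ins or genes assumed stable);
           defaults to all genes.
    """
    genes = list(counts) if genes is None else list(genes)
    n_samples = len(next(iter(counts.values())))
    # Only genes with nonzero counts in every sample have a defined log ratio.
    usable = [g for g in genes if all(c > 0 for c in counts[g])]
    if not usable:
        raise ValueError("no gene has nonzero counts in every sample")
    # Log geometric mean of each usable gene across samples.
    log_geo = {g: sum(math.log(c) for c in counts[g]) / n_samples
               for g in usable}
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(counts[g][j]) - log_geo[g] for g in usable]
        factors.append(math.exp(median(log_ratios)))
    return factors

# Columns: [time0, time1]
counts = {
    "geneA": [100, 100], "geneB": [200, 200], "geneC": [300, 300],
    "geneD": [400, 400], "geneE": [500, 500], "geneF": [600, 600],
    "spike1": [50, 25], "spike2": [80, 40], "spike3": [20, 10],
}
spikes = ["spike1", "spike2", "spike3"]

all_gene_factors = size_factors(counts)        # both factors come out as 1
spike_factors = size_factors(counts, spikes)   # time0 factor is twice time1

# Fold change of geneA (time1 / time0) after spike-in scaling.
fold_change = ((counts["geneA"][1] / spike_factors[1])
               / (counts["geneA"][0] / spike_factors[0]))
print(all_gene_factors, spike_factors, fold_change)
```

With all genes, the factors are equal and the amplification is normalized away. With the spike-ins only, the twofold increase is recovered for the endogenous genes.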

scRNA-seq isn't any different in this regard, except that if you're primarily using it to find cell types, you can hope there's more going on than simple transcriptional amplification and just ignore spike-ins. Spike-ins are probably more accurate in scRNA-seq, since it's more likely that you actually know how many cells you have.