Question

Normalization for subset of data

3.4 years ago

asumani ▴ 70

Hi all,

I need to analyze a subset of a publicly available single-cell RNA-seq dataset that contains B cells of multiple antibody isotypes. I want to subset the IgE cells (test) and IgM cells (control) for differential expression analysis. Should I normalize before or after subsetting? Does it even matter? And if it does, how does it affect the statistical analysis?

Best,

statistics scRNAseq • 1.2k views

ADD COMMENT • link 3.4 years ago by asumani ▴ 70

Difficult to answer without more context. Is this a single experiment, or is the data pulled from different sources? You should probably create a single count matrix for the relevant cell types and then feed it into an appropriate statistical framework. That would mean normalizing after subsetting. It certainly matters, especially when the full experiment contains cells of very different types and composition.
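To illustrate why the order can matter, here is a minimal sketch, assuming a normalization whose scaling factors are computed against a reference built from all cells passed in (a median-of-ratios scheme is used here purely for illustration; it is not necessarily what the dataset's authors used). The cell labels and counts are made up, and real single-cell data are far sparser than this toy matrix.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors for a dict {cell_name: [gene counts]}."""
    cells = list(counts)
    n_genes = len(counts[cells[0]])
    # Keep only genes detected in every cell so the logarithm is defined.
    genes = [g for g in range(n_genes) if all(counts[c][g] > 0 for c in cells)]
    # Per-gene reference: log geometric mean over the cells that were passed in.
    log_ref = {g: sum(math.log(counts[c][g]) for c in cells) / len(cells) for g in genes}
    return {
        c: math.exp(median(math.log(counts[c][g]) - log_ref[g] for g in genes))
        for c in cells
    }

# Hypothetical toy counts (3 genes); "other_1" has a very different composition.
full = {
    "IgE_1": [10, 20, 30],
    "IgE_2": [20, 40, 60],
    "IgM_1": [10, 20, 90],
    "other_1": [100, 5, 30],
}
subset = {name: c for name, c in full.items() if name.startswith(("IgE", "IgM"))}

sf_full = size_factors(full)      # reference includes the unrelated cell
sf_subset = size_factors(subset)  # reference built from IgE/IgM cells only

for name in subset:
    print(name, round(sf_full[name], 3), round(sf_subset[name], 3))
```

The factors for the same IgE and IgM cells differ between the two runs because the reference changes with the cell set. By contrast, a purely per-cell scaling (dividing each cell by its own total count) depends only on that cell, so subsetting before or after gives identical values.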

ADD REPLY • link 3.4 years ago by ATpoint 81k

It is a single experiment. The normalized count matrix from the same experiment is already available. My plan is to subset from this existing count matrix.

Alternatively, I could run a separate pipeline on the subset of FASTQ files to obtain another count matrix, then normalize that subset and do further analysis.

I am unsure whether subsetting from an already normalized matrix is statistically acceptable, or whether I should preprocess the raw data for the subset and then normalize.

ADD REPLY • link 3.4 years ago by asumani ▴ 70

Subsetting the existing one is probably OK, but then you are limited to statistical tests that directly use the normalized counts, such as the Wilcoxon test. For finding markers that is probably fine.
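As a rough sketch of what such a test looks like for one gene, the snippet below implements a two-sided Wilcoxon rank-sum test with a normal approximation and tie correction (no continuity correction). The expression values are invented for illustration; in practice you would likely call a library routine such as `scipy.stats.mannwhitneyu` and correct for multiple testing across genes.

```python
import math
from collections import Counter

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test; returns (U statistic for x, p-value)."""
    pooled = sorted(x + y)
    # Assign each distinct value the mean of the ranks it occupies (ties share a rank).
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    n1, n2 = len(x), len(y)
    n = n1 + n2
    u = sum(ranks[v] for v in x) - n1 * (n1 + 1) / 2
    # Variance of U under the null hypothesis, corrected for ties.
    tie_term = sum(t**3 - t for t in Counter(pooled).values())
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:  # every value identical: no evidence of a difference
        return u, 1.0
    z = (u - n1 * n2 / 2) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return u, p

# Hypothetical normalized expression of one gene in each group.
ige = [2.1, 3.4, 2.9, 3.8, 3.1]
igm = [0.0, 1.2, 0.8, 0.0, 1.5]

u_stat, p_value = rank_sum_test(ige, igm)
print(u_stat, p_value)
```

Because the test only uses ranks of the already normalized values, it can be run directly on a subset of the existing matrix, which is why it fits the "subset after normalization" route.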

ADD REPLY • link 3.4 years ago by ATpoint 81k