I have single-cell RNA sequencing data from similar tissue in two datasets: one collected with Smart-seq2 (full-length transcripts, no UMIs) and another collected with 10X (3' end, with UMIs).
I am using a standard log(x/n + 1) normalization for the 10X data, but I am unsure how to normalize the Smart-seq2 data. Should I correct for gene-length bias? When I apply log(x/n + 1) to the Smart-seq2 data, I get substantial differences in gene expression between the 10X and Smart-seq2 datasets.
My goal is to integrate the 10X and Smart-seq2 datasets and perform clustering. I'd like the two datasets to match as closely as possible before integration. I have a count matrix for each (rows are genes, columns are cells).
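For concreteness, here is a minimal sketch of the two normalizations I am comparing, assuming x is the raw count and n is the per-cell total. The function names, the `scale` parameter, and the `gene_lengths` vector (gene or transcript length in base pairs, one entry per row) are hypothetical placeholders. The length-corrected, TPM-style variant is only one possible way to account for gene length, not something I have confirmed is appropriate here.

```python
import numpy as np


def log_norm(counts, scale=1.0):
    """log(x / n * scale + 1) for a genes x cells count matrix.

    n is the total count per cell (column sum). With scale=1.0 this is
    literally log(x/n + 1); `scale` is an assumed optional size factor.
    Assumes every cell has at least one count.
    """
    counts = np.asarray(counts, dtype=float)
    n = counts.sum(axis=0, keepdims=True)  # per-cell totals, shape (1, cells)
    return np.log1p(counts / n * scale)


def log_tpm(counts, gene_lengths, scale=1e6):
    """Length-corrected (TPM-style) variant for full-length protocols.

    Counts are divided by gene length in kilobases, then each cell is
    rescaled to sum to `scale` before taking log(x + 1).
    `gene_lengths` is a hypothetical per-gene length vector in base pairs.
    """
    counts = np.asarray(counts, dtype=float)
    lengths_kb = np.asarray(gene_lengths, dtype=float)[:, None] / 1e3
    rate = counts / lengths_kb  # reads per kilobase
    tpm = rate / rate.sum(axis=0, keepdims=True) * scale
    return np.log1p(tpm)
```

In this sketch, `log_norm` would be applied to the 10X matrix, and either `log_norm` or `log_tpm` to the Smart-seq2 matrix, which is exactly the choice I am unsure about.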
In short, what is the recommended way to normalize Smart-seq2 expression data?