I have a single-cell RNA-seq dataset derived from three distinct biological samples. The number of reads per cell varies widely across these samples (a 30x difference). My current understanding is that the best way to normalize the data so the samples can be compared is to subsample reads from the higher-depth samples until their reads/cell match the lowest-depth sample. However, this approach discards a lot of data.
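To make the subsampling option concrete, here is a minimal sketch of what I mean, applied at the count-matrix level rather than to the raw reads. It assumes a cells x genes matrix of integer counts; the function names, the target depth, and the toy numbers are all made up for illustration and are not taken from any particular pipeline.

```python
import numpy as np


def downsample_cell(counts, target, rng):
    """Subsample one cell's gene counts, without replacement, to `target` total counts."""
    counts = np.asarray(counts, dtype=np.int64)
    if counts.sum() <= target:
        # Cell is already at or below the target depth: leave it unchanged.
        return counts.copy()
    # Draw `target` counts without replacement from the pool of observed counts.
    return rng.multivariate_hypergeometric(counts, target)


def downsample_matrix(matrix, target, seed=0):
    """Apply downsample_cell to every row (cell) of a cells x genes count matrix."""
    rng = np.random.default_rng(seed)
    return np.vstack([downsample_cell(row, target, rng) for row in matrix])


# Toy example: one deep cell and one shallow cell (hypothetical values).
toy = np.array([[300, 30, 3],
                [10, 1, 0]])
target_depth = int(toy.sum(axis=1).min())  # depth of the shallowest cell
downsampled = downsample_matrix(toy, target_depth)
print(downsampled.sum(axis=1))  # every cell now has at most target_depth counts
```

In this toy case the deep cell goes from 333 counts to 11, which is the kind of data loss I would like to avoid.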
Alternatively, I could apply a normalization factor to account for sequencing depth, but I am concerned that, given the large difference in reads/cell, the drop-out events in the lower-depth sample would still bias the comparison.
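For reference, the kind of depth normalization I have in mind is sketched below: each cell is scaled by its total counts and then log-transformed. This is only an illustration under my own assumptions (a cells x genes count matrix, a scale factor of 10,000, hypothetical function name and toy values), not the exact method of any specific package.

```python
import numpy as np


def normalize_by_depth(matrix, scale=1e4):
    """Scale each cell (row) to `scale` total counts, then apply log(1 + x)."""
    matrix = np.asarray(matrix, dtype=float)
    depth = matrix.sum(axis=1, keepdims=True)
    depth[depth == 0] = 1.0  # avoid division by zero for empty cells
    return np.log1p(matrix / depth * scale)


# Toy example: the same expression profile sequenced deeply and shallowly
# (hypothetical values; the third gene drops out in the shallow cell).
toy = np.array([[300, 30, 3],
                [10, 1, 0]])
normalized = normalize_by_depth(toy)
print(normalized)
```

The scaling brings the abundant genes onto a comparable scale, but the third gene remains exactly zero in the shallow cell, so the drop-out is not corrected. That is the bias I am worried about.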
Is there an approach that neither biases the comparison nor discards large amounts of data?
This normalization is in preparation for clustering and differentially expressed gene (DEG) analysis.
Thank you for your input!