Wondering if someone can provide me with some guidance.
I have previously sequenced 4 skin cancer samples using 10X chemistry and I would like to combine them into one dataset. My research question concerns cancer stem cell populations, so I will need high sensitivity.
I have done a load of reading and I am still confused as to what the correct method is.
Cell Ranger's 'aggr' pipeline causes massive downsampling in 2 out of my 4 samples. I believe this is because the number of cells per sample varies (~600 to ~2,500 cells). If I proceed with the aggregated dataset, I will lose too much sequencing depth, which will hinder my search for the cancer stem cells. Another option is Cell Ranger aggr's 'normalize=none' argument, which skips normalizing the average read depth per cell between samples before merging. From there, could I then go on to use scTransform or Seurat's 'NormalizeData'? I have performed QC on each sample individually and they all need different filtering thresholds (nFeature_RNA), so the problem with this route is that the combined dataset gets a single filter, which could let low-quality cells from some samples through.
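For context, the 'normalize=none' route looks roughly like the sketch below. This is a hypothetical example: the sample IDs and paths are placeholders, and the CSV header column has changed across Cell Ranger releases ('library_id' in older versions, 'sample_id' in newer ones), so check the version you are running.

```shell
# Hypothetical aggregation sheet -- IDs and paths are placeholders.
cat > aggr.csv <<'EOF'
library_id,molecule_h5
tumor1,/path/to/tumor1/outs/molecule_info.h5
tumor2,/path/to/tumor2/outs/molecule_info.h5
tumor3,/path/to/tumor3/outs/molecule_info.h5
tumor4,/path/to/tumor4/outs/molecule_info.h5
EOF

# Combine the runs without read-depth downsampling; normalization is then
# deferred to downstream tools such as Seurat or scTransform.
cellranger aggr --id=combined --csv=aggr.csv --normalize=none
```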
Another option is to filter and normalize each sample individually and then use Seurat's 'merge' function. However, by default this merges only the raw count matrices, discarding the normalized data matrices. There is an option to merge the normalized matrices as well as the raw ones (merge.data = TRUE), but this should only be done if the same normalization approach was applied to all the objects. Does that mean the same normalization method, such as 'LogNormalize' or 'CLR', or does it mean the same filters?
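To make the second route concrete, here is a minimal sketch, assuming Seurat v4 objects named sample1..sample4 already loaded (the object names and nFeature_RNA thresholds are placeholders, not recommendations):

```r
library(Seurat)

# Per-sample QC with sample-specific thresholds (values are placeholders).
s1 <- subset(sample1, subset = nFeature_RNA > 200 & nFeature_RNA < 6000)
s2 <- subset(sample2, subset = nFeature_RNA > 300 & nFeature_RNA < 5000)
s3 <- subset(sample3, subset = nFeature_RNA > 200 & nFeature_RNA < 7000)
s4 <- subset(sample4, subset = nFeature_RNA > 250 & nFeature_RNA < 5500)

# Apply the SAME normalization method to every object...
objs <- lapply(list(s1, s2, s3, s4),
               NormalizeData, normalization.method = "LogNormalize")

# ...so that merge.data = TRUE can safely carry the normalized 'data'
# matrices into the merged object alongside the raw counts.
combined <- merge(x = objs[[1]], y = objs[2:4],
                  add.cell.ids = c("t1", "t2", "t3", "t4"),
                  merge.data = TRUE)
```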
I am sorry if this is a rambling 'question'; I am just looking for the best method to combine my 4 datasets without losing too much data, and then continue with downstream analyses.
Thank you so much in advance!!
Unfortunately, there is at this time no "best" method. People are still trying to figure out how best to do this.