Should I integrate only between fresh vs frozen?
If you treat fresh vs frozen as the confounding factor, integrating on that variable will try to remove the differences between fresh and frozen cells, so the variation that remains will mostly reflect differences between the samples.
Should I integrate on both fresh vs frozen and the fact that the samples originate from different donors/patients?
The best approach is usually to define each sample as a batch, which generally produces the strongest batch correction. You can also add fresh vs frozen as a batch variable alongside the sample. I would evaluate both options separately and use your judgement to pick the one that makes biological sense. For example, if you are using Harmony, you can do something like this:
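library(harmony)
# Option 1: each sample is its own batch
# (argument names as in the original post; newer harmony releases may expect reduction.use instead of reduction)
harmonized_SO <- RunHarmony(SO,
                            group.by.vars = "Sample",
                            reduction = "pca", assay.use = "SCT", reduction.save = "harmony")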
OR
# Option 2: sample and fresh/frozen status are both batch variables
harmonized_SO <- RunHarmony(SO,
                            group.by.vars = c("Sample", "fresh_vs_frozen"),
                            reduction = "pca", assay.use = "SCT", reduction.save = "harmony")
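To put a rough number on the comparison between the two runs, one option is to cluster on each Harmony embedding and check how evenly the batches are mixed within each cluster. The sketch below is only an illustration: `batch_mixing` is a hypothetical helper written in base R, not part of Harmony or Seurat, and the metadata column names in the usage comment (`seurat_clusters`, `Sample`) are assumptions about how your object is set up.

```r
# Hypothetical helper (not from any package): per-cluster batch mixing,
# measured as Shannon entropy of the batch proportions, scaled to [0, 1].
# 1 = batches evenly mixed in the cluster, 0 = cluster comes from one batch.
batch_mixing <- function(clusters, batch) {
  counts <- table(clusters, batch)           # cells per cluster x batch
  n_batches <- ncol(counts)
  if (n_batches < 2) {
    stop("need at least two batch levels to measure mixing")
  }
  props <- prop.table(counts, margin = 1)    # row-wise batch proportions
  entropy <- apply(props, 1, function(p) {
    p <- p[!is.na(p) & p > 0]                # treat 0 * log(0) as 0
    -sum(p * log(p))
  })
  entropy / log(n_batches)                   # normalize by the maximum entropy
}

# Toy data: cluster "a" is an even mix of two samples, cluster "b" is all S1
toy_clusters <- c("a", "a", "a", "a", "b", "b", "b", "b")
toy_batch    <- c("S1", "S2", "S1", "S2", "S1", "S1", "S1", "S1")
toy_mixing   <- batch_mixing(toy_clusters, toy_batch)
print(toy_mixing)

# Assumed usage on a Seurat object clustered on the Harmony reduction:
# batch_mixing(harmonized_SO$seurat_clusters, harmonized_SO$Sample)
```

Treat the score as a guide rather than a verdict: a cluster made of a cell type that only exists in one sample will score low even when the correction is working well, so it still needs to be read alongside marker genes and the UMAP.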
Should I rather investigate fresh and frozen separately?
You can follow this approach too. Once again, it depends on what you want to achieve. I would recommend going through the discussion of data integration in Single-cell best practices, which covers these approaches very well.
Thanks for this valuable answer. I tried both of your suggestions.
On the left is the batch integration using only Sample, and on the right the integration using both Sample and fresh vs frozen. To me it still looks like there is a batch effect. However, I know that the fresh sample (S8) contains different cell types than the frozen ones, so it could be that the correction is actually good. Which one is better?