I am a beginner at this and am genuinely confused. I have been using elbow plots to choose the number of dims, and 30 looks reasonable, but just in case I also ran my data with 50, and the cell predictions are drastically different. I kept the dims the same in every step, including FindTransferAnchors and TransferData. Why is my data coming out so different?
A picture can be worth a thousand words. Consider posting plots (you can remove any sensitive info) to get useful suggestions/answers.
Well, I'm not generating any plots. I run the data to find predicted cells, then make a table of which cells are predicted and their counts. The thing is that the counts are drastically different when I change the dims, and it makes me very unsure of what they should actually be, because I would assume there is an upper and lower limit.
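One way to make "drastically different" concrete is to cross-tabulate the labels from the two runs rather than comparing the count tables side by side. The sketch below is base R only. The helper name compare_predictions and the toy label vectors are made up for illustration; in practice the two vectors would be the predicted.id columns returned by TransferData for each run, in the same cell order.

```r
# Hypothetical helper (not part of Seurat): compare labels from two runs.
# Both vectors must describe the same cells in the same order.
compare_predictions <- function(pred_a, pred_b) {
  stopifnot(length(pred_a) == length(pred_b))
  list(
    # Fraction of cells that received the same label in both runs
    agreement = mean(pred_a == pred_b),
    # Rows: labels from the first run; columns: labels from the second run
    confusion = table(dims30 = pred_a, dims50 = pred_b)
  )
}

# Toy labels standing in for predicted.id from the dims = 1:30 and 1:50 runs
pred_30 <- c("T cell", "T cell", "B cell", "NK", "B cell", "T cell")
pred_50 <- c("T cell", "NK",     "B cell", "NK", "B cell", "T cell")

res <- compare_predictions(pred_30, pred_50)
print(res$agreement)
print(res$confusion)
```

The off-diagonal cells of the table show which labels are swapping. TransferData also returns a prediction.score.max column, so it may be worth checking whether the cells that change label are the ones with low scores.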
That's a very thin explanation. Just show how you plot the data, or generate a plot, so that people can see it and give their expert advice.
I'm not quite understanding what you've done so far, what you're trying to achieve, or how you are measuring success. But to address the question of how many dimensions from your PCA to use, a good place to start is by running RunUMAP() with a few different dimensions (as you've done). Pair this with clustering using a range of resolutions and see if your UMAP and cluster assignments broadly align with each other. Look for biological relevance in the clusters by running FindAllMarkers(). I would likely get a feel for the data this way before trying to transfer labels from another dataset - you'll then have some knowledge of the data to sanity check the cell type labels.
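A rough sketch of that exploration loop, in case it helps. It assumes obj is a Seurat object that has already been normalised, scaled, and run through RunPCA() with at least as many PCs as the largest value tried. The function name explore_dims, the dims and resolution values, and the reduction and metadata column names are all arbitrary choices for illustration, not recommendations.

```r
# Sketch only: requires the Seurat package when the function is called.
# `obj` is assumed to already contain a "pca" reduction with enough PCs.
explore_dims <- function(obj,
                         dims_to_try = c(20, 30, 50),
                         resolutions = c(0.4, 0.8, 1.2)) {
  for (d in dims_to_try) {
    # One UMAP per dimensionality, stored under its own name and key
    obj <- Seurat::RunUMAP(obj, dims = 1:d,
                           reduction.name = paste0("umap.d", d),
                           reduction.key = paste0("UMAPd", d, "_"))
    # Rebuild the neighbour graph with the same number of PCs
    obj <- Seurat::FindNeighbors(obj, dims = 1:d)
    for (r in resolutions) {
      obj <- Seurat::FindClusters(obj, resolution = r)
      # Copy the clusters so the next iteration does not overwrite them
      obj[[paste0("clusters_d", d, "_r", r)]] <- obj$seurat_clusters
    }
  }
  obj
}

# Example use, with `query` as a placeholder for your own object:
# query <- explore_dims(query)
# Seurat::DimPlot(query, reduction = "umap.d30", group.by = "clusters_d30_r0.8")
# Seurat::Idents(query) <- "clusters_d30_r0.8"
# markers <- Seurat::FindAllMarkers(query, only.pos = TRUE)
```

If the clusters and their top markers look broadly similar at 30 and 50 dims, that is some reassurance about the embedding itself, and the transferred labels can then be checked against clusters you already understand.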