Selecting top signature score from signatures with different distributions
8 weeks ago

I have generated some gene signatures of cell states from a single-cell experiment (signatures A-H).

I want to classify cells into one of the states A-H by selecting the appropriate highest score from the gene signatures. Each cell should have exactly one state called or, if there is no clear assignment, the call should be NA.

However, the signature scores of different states have different distributions so I don't think it would be appropriate to just choose the max score for each cell.

I have made the scores available in long format:

test.df <- read.csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vS5jLScYx4AaCiqRZwDqnqf41ozSzvLMLOWUU5VLT2FJ7XOBWbjJe_NLMOkK7-ndZ7m1LNFcD8ARB5L/pub?output=csv")

library(ggplot2)
library(pals)

kelly.cols <- kelly(22)[-c(1:2)]

ggplot(test.df,
       mapping = aes(x = value, fill = variable)) +
  geom_histogram(binwidth = 0.01) +
  theme_bw() +
  theme(panel.grid.minor.y = element_blank(),
        panel.grid.major.y = element_blank(),
        panel.grid.minor.x = element_line(linetype = "dotted"),
        panel.grid.major.x = element_line(linetype = "dotted")) +
  scale_fill_manual(values = kelly.cols[1:8])

[Figure: distribution plot]

I would like some advice on how to call the state based on the signature scores provided.

You can obtain the wide-format version of the data using:

test.mat <- matrix(test.df$value, nrow = nrow(test.df)/8, ncol = 8, byrow = FALSE,
                   dimnames = list(1:(nrow(test.df)/8),
                                   LETTERS[1:8]))
8 weeks ago
LChart 4.3k

Not quite clear what you want. Are you attempting to classify cells into states? Or are you trying to adjust the scores so that they're on the same relative scales? Must a cell be in one of the states? Can a cell be in two or more states simultaneously?


Apologies, I thought that was clear from "I want to select 1 value per cell from the gene signatures to represent the cell state". I have amended the post.

I'm trying to classify cells into states. A cell can be in only one state, or have an NA.


This can be problematic if your scores are correlated. What are the Spearman correlations between A-H?
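
For reference, a minimal sketch of how those correlations could be computed in R. The `toy.mat` matrix below is made-up random data standing in for the real wide-format score matrix (rows = cells, columns = signatures); it is an assumption for illustration, not the poster's data.

```r
# Toy stand-in for the wide score matrix; values are random, illustration only.
set.seed(1)
toy.mat <- matrix(runif(40), nrow = 10, ncol = 4,
                  dimnames = list(NULL, LETTERS[1:4]))

# Pairwise Spearman (rank) correlations between the signature columns
sig.cor <- cor(toy.mat, method = "spearman")
round(sig.cor, 2)
```

Large off-diagonal values would suggest that the top two scores of a cell often come from related signatures.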

There are a few ways to go about doing this. One of the simplest would be to set two thresholds: the minimum score (or rank) to make any call at all (a cell must exceed this for at least one score), and a minimum gap between best and next-best to classify.
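
The two-threshold idea could be sketched roughly as follows. The function name `call_state` and the default values of `min.score` and `min.gap` are hypothetical placeholders, not recommended settings, and the toy matrix is invented for illustration.

```r
# Call one state per cell from a score matrix (rows = cells, columns = states).
# min.score and min.gap are placeholder thresholds chosen for illustration.
call_state <- function(score.mat, min.score = 0, min.gap = 0.05) {
  apply(score.mat, 1, function(x) {
    ord  <- order(x, decreasing = TRUE)  # column indices, best score first
    best <- x[ord[1]]
    gap  <- best - x[ord[2]]             # distance to the runner-up
    if (is.na(best) || is.na(gap) || best <= min.score || gap < min.gap) {
      NA_character_                      # no confident call
    } else {
      colnames(score.mat)[ord[1]]
    }
  })
}

toy.mat <- rbind(c(A = 0.30, B = 0.10, C = 0.05),    # clear winner
                 c(A = 0.30, B = 0.28, C = 0.05),    # top two too close
                 c(A = -0.10, B = -0.30, C = -0.50)) # best score too low
call_state(toy.mat)
```

With these toy values, only the first cell gets a call; the other two come back as NA. Both thresholds would need tuning against the actual score distributions.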

An alternative way would be to assign the top (say) 1% of cells from each class the corresponding label, then use a multi-label classifier trained on those cells to apply labels and probabilities to all cells, and then set a probability threshold.
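
A rough sketch of the labelling step, under stated assumptions: the function name `seed_labels` is hypothetical, and leaving cells unlabelled when they fall in the top fraction of more than one signature is my own choice, not something specified above.

```r
# Label the top `top.frac` of cells for each signature; all other cells get NA.
# Assumption: cells in the top fraction of more than one signature are left
# unlabelled rather than assigned arbitrarily.
seed_labels <- function(score.mat, top.frac = 0.01) {
  # TRUE where a cell's score reaches that signature's upper quantile cutoff
  is.top <- apply(score.mat, 2, function(x) {
    x >= quantile(x, probs = 1 - top.frac, na.rm = TRUE)
  })
  unique.top <- rowSums(is.top, na.rm = TRUE) == 1
  labels <- rep(NA_character_, nrow(score.mat))
  # (row, column) positions of the unambiguous top cells
  idx <- which(is.top & unique.top, arr.ind = TRUE)
  labels[idx[, 1]] <- colnames(score.mat)[idx[, 2]]
  factor(labels, levels = colnames(score.mat))
}

# Tiny invented example; top.frac is large only because there are four cells
toy.mat <- cbind(A = c(10, 1, 1, 1), B = c(1, 10, 1, 1))
seeds <- seed_labels(toy.mat, top.frac = 0.25)
seeds
```

The resulting factor can serve as the label column for training, with the NA cells left for the classifier to predict.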


Thanks for that. What I've already done seems to fit your initial suggestion:

I calculated the gap score for each cell, where x is that cell's vector of signature scores:

x <- sort(x, decreasing = TRUE)
x.gap <- x[1] - x[2]

and set a minimum gap threshold of 0.05.

Any call with a score < 0 has also been set to NA.

Do you have any code you could provide for the multi-label approach you mentioned? I don't have any experience with making classifiers.


You'd basically set up a data frame with columns score_A, score_B, ..., score_H and classification, populate it with all of your data (the class for the top 1% or whatever, and NA elsewhere), and do something like

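# classification must be a factor; drop it from newdata, since predict's default na.action = na.omit omits rows containing NA
model <- e1071::svm(classification ~ ., data = subset(yourdata, !is.na(classification)), probability = TRUE)
preds <- predict(model, subset(yourdata, select = -classification), probability = TRUE)
probs <- attr(preds, "probabilities")  # matrix of per-class probabilities, one row per cell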

to get the predictions & probabilities. See https://www.rdocumentation.org/packages/e1071/versions/1.7-14/topics/svm for details - obviously you can use any multi-class classifier, but svm is as fine a place as any to start.
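
For the final thresholding step, here is a minimal sketch that works on any matrix of class probabilities (rows = cells, columns = states). With e1071, such a matrix is attached to the prediction as its "probabilities" attribute when `probability = TRUE` is used. The function name `call_from_probs` and the cutoff `min.prob = 0.8` are hypothetical placeholders, and the toy probabilities are invented.

```r
# Convert class probabilities into calls; cells whose best probability is
# below min.prob get NA. min.prob = 0.8 is an arbitrary placeholder.
call_from_probs <- function(prob.mat, min.prob = 0.8) {
  best <- max.col(prob.mat, ties.method = "first")  # column of the top class
  best.prob <- prob.mat[cbind(seq_len(nrow(prob.mat)), best)]
  ifelse(best.prob >= min.prob, colnames(prob.mat)[best], NA_character_)
}

toy.probs <- rbind(c(A = 0.90, B = 0.07, C = 0.03),  # confident
                   c(A = 0.45, B = 0.40, C = 0.15))  # ambiguous
call_from_probs(toy.probs)
```

Here the first cell is called A and the second is left as NA. The cutoff trades the number of called cells against confidence in each call, so it is worth checking how many cells survive at a few different values.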


Cheers, will give it a go.
