How to plot a ROC curve using qRT-PCR data?
2.3 years ago

I want to plot a ROC curve using the fold change values from qRT-PCR data. The samples were first tested using a gold standard assay, and those results will be used as the actual values, whereas the gene expression values will be used as the "predicted values". Can someone guide me on how to build the data set and run the ROC analysis?

gene • 1.2k views

Fold change can take infinitely many values; it is not the kind of "yes or no" question that ROC analysis is made for. ROC curves measure the performance of binary classifiers, so in your case it would be useless unless you simply want to classify your genes as differentially expressed or not.

Why don't you just measure a correlation?


Yes, I want to check the discriminatory power of the genes to differentiate between the positive group and the negative group. Hence the ROC.

2.3 years ago
Mensur Dlakic ★ 23k

In the simplest terms, a ROC curve measures the quality of a binary classifier based on how its predictions rank the samples. The predictions can be on any scale, so your data can be used to make a ROC curve as is, or it can be scaled to the [0,1] range, which is where most binary classifiers put their predicted values.

I will post a short Python script below that shows what I mean. Let's say that your qRT-PCR fold changes are in the range [0,25] and that you have 25 gold-standard measurements. I will simulate that by creating 25 random numbers in the [0,25] range. These are the numbers I got (rounded to 2 decimal places), though yours will be different each time you run the code.

[12.42 16.92 24.03 19.01 18.19 11.82 5.78 4.09 24.32 15.09 21.28 18.16 11.03 14.01 7.8 13.1 20.78 21.51 7.25 19.3 15.62 9.59 4.8  23.62 21.56]


Let's say that for each of those 25 experiments you also have a 0 or 1 label, where 1 means a true positive and 0 a true negative according to the gold standard. Again, I will make 25 random numbers, which for my experiment look like this:

[0 0 1 1 0 0 0 0 1 0 1 1 0 1 1 0 1 0 1 1 0 0 1 1 0]


When you calculate the ROC-AUC score for these two arrays of numbers, it comes out as 0.6730769230769231. Now, you can scale the fold change numbers to the [0,1] range if you wish, because binary classifiers typically give you numbers like that. The scaled fold changes looked like this in my case:

[0.41 0.63 0.99 0.74 0.7 0.38 0.08 0. 1. 0.54 0.85 0.7 0.34 0.49 0.18 0.45 0.82 0.86 0.16 0.75 0.57 0.27 0.03 0.97 0.86]


As I said before, the scale of the measurements/predictions doesn't matter: the ROC-AUC score for the scaled array of numbers vs. the original classes still comes out as 0.6730769230769231.
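The same holds for any strictly increasing transformation, not only min-max scaling. As a small sketch (reusing the rounded example numbers listed above, and picking log2 only because fold changes are often reported that way), the AUC of the raw and the log2-transformed values should be identical, because only the ranking of the samples enters the calculation:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Fold changes and gold-standard classes copied from the example arrays above.
fold_change = np.array([12.42, 16.92, 24.03, 19.01, 18.19, 11.82, 5.78, 4.09,
                        24.32, 15.09, 21.28, 18.16, 11.03, 14.01, 7.8, 13.1,
                        20.78, 21.51, 7.25, 19.3, 15.62, 9.59, 4.8, 23.62, 21.56])
gold_standard = np.array([0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0,
                          1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0])

# AUC on the raw fold changes
auc_raw = roc_auc_score(gold_standard, fold_change)

# log2 is strictly increasing for positive values, so the ranking is unchanged
auc_log2 = roc_auc_score(gold_standard, np.log2(fold_change))

print(auc_raw, auc_log2)  # both about 0.673
```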

The only thing you need to do is make a column of fold change numbers, and next to each of those numbers enter either 0 or 1, which you should know because this is a gold-standard set. That's all the data you need. Then you need something that can compute the points of a ROC curve so you can plot them, for example the roc_curve function in sklearn.

import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import MinMaxScaler

# Simulate 25 fold-change values in [0, 25) and 25 random 0/1 class labels
random_fold = np.random.uniform(low=0, high=25, size=25)
print(np.around(random_fold, 2))
random_class = np.random.randint(low=0, high=2, size=25)
print(random_class)

# ROC-AUC on the raw fold changes
print(roc_auc_score(random_class, random_fold))

# Rescale to [0, 1]; MinMaxScaler expects a 2D column, so reshape, then flatten back to 1D
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_random_fold = scaler.fit_transform(random_fold.reshape(-1, 1)).flatten()
print(np.around(scaled_random_fold, 2))

# Scaling keeps the ranking, so the ROC-AUC is unchanged
print(roc_auc_score(random_class, scaled_random_fold))
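
The script above only prints the AUC. As a minimal sketch of going from the two columns to an actual plot, the following assumes matplotlib is installed and reuses the 25 rounded example values listed earlier; the variable names and the output filename `roc_curve.png` are arbitrary choices, not part of the original answer. With real data, `fold_change` and `gold_standard` would simply be the two columns of your table.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: the figure is written to a file
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Column 1: fold change per sample (example values from the answer above)
fold_change = np.array([12.42, 16.92, 24.03, 19.01, 18.19, 11.82, 5.78, 4.09,
                        24.32, 15.09, 21.28, 18.16, 11.03, 14.01, 7.8, 13.1,
                        20.78, 21.51, 7.25, 19.3, 15.62, 9.59, 4.8, 23.62, 21.56])
# Column 2: gold-standard class per sample (1 = positive, 0 = negative)
gold_standard = np.array([0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0,
                          1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0])

# roc_curve returns one (FPR, TPR) point per candidate fold-change cutoff
fpr, tpr, thresholds = roc_curve(gold_standard, fold_change)
auc = roc_auc_score(gold_standard, fold_change)

fig, ax = plt.subplots()
ax.plot(fpr, tpr, marker="o", label=f"fold change (AUC = {auc:.3f})")
ax.plot([0, 1], [0, 1], linestyle="--", label="chance")  # diagonal reference line
ax.set_xlabel("False positive rate")
ax.set_ylabel("True positive rate")
ax.legend(loc="lower right")
fig.savefig("roc_curve.png", dpi=150)
```

Each entry of `thresholds` is a fold-change cutoff, so the same arrays can also be inspected to see which cutoff gives a particular trade-off between sensitivity (TPR) and 1 - specificity (FPR).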


Hi Mensur,

Thank you for this detailed explanation with a very nice example. Can one use correlation coefficients for the second column instead of absolute 0 and 1 values? I ask in case one wants to find a new biomarker using PCR data. If a correlation is calculated between the PCR data and the expression values of a gold-standard disease marker, and entered next to each FC value, can this be used to plot ROC curves?

Thanks a lot!


If you are asking whether the ROC calculation will work with correlation values instead of [0, 1], the answer is yes. I don't know whether that will make a clear-cut distinction between biomarkers, though it sounds like it might.
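
A minimal sketch of how scikit-learn handles this case, using hypothetical numbers made up purely for illustration: correlation values in [-1, 1] can be used as the score column without rescaling, but `roc_auc_score` expects the label column to hold discrete classes. Continuous values in the label position raise a `ValueError`, so a continuous gold-standard measurement would first have to be turned into 0/1 classes (for example with a cutoff, which is an assumption here and not something discussed above).

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical values for illustration only, not real measurements.
correlation = np.array([-0.62, -0.35, -0.10, 0.05, 0.21, 0.40, 0.58, 0.83])
disease = np.array([0, 0, 1, 0, 0, 1, 1, 1])  # binary gold-standard labels

# Correlations work as the score column as they are, because only ranks matter.
auc = roc_auc_score(disease, correlation)
print(auc)  # 0.875

# Continuous values in the label position are rejected by scikit-learn.
try:
    roc_auc_score(correlation, correlation)
    labels_accepted = True
except ValueError:
    labels_accepted = False
print(labels_accepted)  # False
```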


Thanks a lot!