Question: ROC curve for biomarkers
andrzej.urbanowicz (United States) wrote 2.8 years ago:

Hello,

For the last several days I have been trying to draw a ROC curve for my biomarker study. Unfortunately, I have not found a good explanation of how this can be done for biomarkers. I would appreciate it if you could give me some guidelines.

My current results are from DESeq2, and I am not sure how to prepare the input data for drawing a ROC curve, or which tool is best for drawing it.

Thanks a lot in advance

-- Andrzej

rna-seq next-gen R • 2.0k views
ddiez (Japan) wrote 2.8 years ago:

For plotting I would recommend the ROCR package. See also this and this website. How to use it for a biomarker study depends very much on what exactly you are doing.
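On the input question: a ROC curve needs only one score per sample plus the true group of each sample, so the starting point would be per-sample expression values (for example normalized or variance-stabilized counts) rather than the per-gene DESeq2 results table. Below is a minimal sketch in plain Python, not ROCR and not anyone's actual pipeline from this thread. The function names and the eight sample values are made up for illustration, and it assumes that higher expression means "more disease-like".

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points of the ROC curve, sweeping the threshold from high to low.

    labels: 1 = disease, 0 = healthy; scores: higher means more disease-like.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Sort samples by decreasing score so each step lowers the threshold.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only once all samples tied at this score are counted.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical normalized expression of one candidate gene in 8 samples.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [9.1, 8.4, 7.9, 6.2, 7.0, 5.5, 5.1, 4.8]

curve = roc_points(labels, scores)
print(curve)       # list of (FPR, TPR) pairs to plot
print(auc(curve))  # 0.9375: 15 of the 16 disease/healthy pairs are ranked correctly
```

The list of (FPR, TPR) pairs is what gets plotted; ROCR or any other plotting tool would draw the same curve from the same two vectors.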


Thanks a lot for suggesting tools.

I am running a differential expression analysis with DESeq2 between healthy and unhealthy groups. I have around 20 significantly differentially expressed genes. I need to check which combination of biomarkers (three or more) will give me a good AUC (I need a value around 0.98).

Reply written 2.8 years ago by andrzej.urbanowicz

Just a tip: be careful about how you train and validate your biomarker predictions (validation means setting samples aside to assess performance). An AUC of 0.98 is very high performance for most tasks. If you evaluate your biomarkers or predictions on the same data you used to find them, the estimate will often be substantially overfit and consequently won't generalize to new data (i.e. it won't be reproducible).
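One way to picture the "samples left aside" idea is leave-one-out cross-validation: each sample is scored by a model fitted on all the other samples, and the AUC is computed from those held-out scores only. The sketch below is plain Python with a deliberately simple nearest-centroid score; the classifier choice, the function names and the two-gene data are all assumptions for illustration. Note that it also assumes the genes were chosen beforehand on other data: if the DE analysis that picked the genes saw the held-out sample, the estimate is still optimistic.

```python
def auc_from_scores(labels, scores):
    """AUC as the fraction of (disease, healthy) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def centroid(rows):
    """Per-gene mean over a list of samples."""
    return [sum(col) / len(col) for col in zip(*rows)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def loo_scores(X, y):
    """Score each sample using centroids fitted on all the other samples."""
    held_out = []
    for i in range(len(X)):
        train = [(row, lab) for j, (row, lab) in enumerate(zip(X, y)) if j != i]
        disease = centroid([row for row, lab in train if lab == 1])
        healthy = centroid([row for row, lab in train if lab == 0])
        # Positive when the held-out sample sits closer to the disease centroid.
        held_out.append(sq_dist(X[i], healthy) - sq_dist(X[i], disease))
    return held_out


# Hypothetical expression of two pre-selected genes in 8 samples (rows = samples).
X = [[9.1, 3.0], [8.4, 2.5], [7.9, 3.4], [6.2, 2.9],
     [7.0, 1.1], [5.5, 1.4], [5.1, 0.9], [4.8, 1.6]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

scores = loo_scores(X, y)
print(auc_from_scores(y, scores))
```

With only a handful of samples the cross-validated AUC is itself noisy, so an independent cohort remains the stronger check.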

Reply written 2.8 years ago by Collin

Of course, you are right. I was thinking of an AUC between 0.8 and 0.9.

Reply written 2.8 years ago by andrzej.urbanowicz

I am not an expert on ROC at all, but my understanding is that it is used to assess the performance of a classifier. From your description, it is not clear to me whether you are doing classification. Are you trying to find out whether any of the DE genes can be used as biomarkers, that is, whether they can distinguish between healthy and disease? How do you define true positives? (Well, I guess this illustrates my ignorance on the topic.)

Reply written 2.8 years ago by ddiez

Yes, you are right. My goal is to find out whether any of the DE genes can be used as a biomarker for the specific disease (that is, to discriminate between healthy and disease). I want to create prediction curves and check, based on the ROC graph, what AUC a combination of 3, 5 or 7 chosen genes gives me (and how much it improves). A true positive is a case where the biomarker detects the disease and the sample really is diseased.
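A rough sketch of what "check the AUC of a combination of k genes" could look like, in plain Python. The composite score used here (the sum of per-gene z-scores) is just one simple assumption, not a recommended model, and it presumes every gene is up-regulated in disease (a down-regulated gene would need its sign flipped). The gene names and values are hypothetical. Since the best panel is picked on the same samples it is scored on, the top AUC is optimistic and would still need held-out validation.

```python
from itertools import combinations
from statistics import mean, stdev


def auc_from_scores(labels, scores):
    """AUC as the fraction of (disease, healthy) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def zscore(values):
    """Standardize one gene across samples so genes are on a comparable scale."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def rank_panels(expr, labels, k):
    """Return (AUC, panel) for every k-gene panel, best first."""
    z = {gene: zscore(vals) for gene, vals in expr.items()}
    results = []
    for panel in combinations(sorted(z), k):
        # Composite score per sample: sum of the panel's z-scored expression.
        composite = [sum(z[g][i] for g in panel) for i in range(len(labels))]
        results.append((auc_from_scores(labels, composite), panel))
    return sorted(results, reverse=True)


# Hypothetical normalized expression, one list of 8 samples per gene.
expr = {
    "geneA": [9.1, 8.4, 7.9, 6.2, 7.0, 5.5, 5.1, 4.8],
    "geneB": [3.0, 2.5, 3.4, 2.9, 1.1, 1.4, 0.9, 1.6],
    "geneC": [4.2, 5.0, 3.9, 4.8, 4.1, 4.4, 3.7, 4.0],
    "geneD": [6.5, 7.2, 6.9, 7.8, 6.1, 6.6, 5.9, 6.3],
}
labels = [1, 1, 1, 1, 0, 0, 0, 0]

panels = rank_panels(expr, labels, k=2)
for panel_auc, panel in panels:
    print(panel, round(panel_auc, 3))
```

Running it for k = 1, 2, 3 and comparing the best AUC at each size gives a first impression of how much adding genes helps. With 20 candidate genes the number of panels grows quickly (1140 panels of 3, 15504 panels of 5), which makes a chance high AUC more likely.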

Reply written 2.8 years ago by andrzej.urbanowicz

I may have misunderstood what you are doing, but shouldn't you build the model on one dataset (the training set) and use an independent dataset to evaluate its performance and draw the ROC curve?

Reply written 2.8 years ago by WouterDeCoster

antgomo (Spain) wrote 19 months ago:

So your idea would be (correct me if I am wrong):

Imagine you have a set of 2000 DE genes from your DESeq2 analysis, and you want to iteratively generate subsets of 7-10 genes that go into a randomForest/SVM classification of the samples; the combination of genes that reaches an AUC of 0.98 will be your signature. Is that right?
