Hi! I am benchmarking different metagenomic classifiers on in silico datasets and am looking for metrics to compare their classification accuracy. I found this article https://www.sciencedirect.com/science/article/pii/S0092867419307755 where the authors suggest using precision-recall curves and AUPRC as comparison metrics. I am currently trying to implement this, but I am not sure it is even a good metric for this analysis: my curves seem all over the place. I am not sure whether I should also apply the thresholding to the reference dataset (gold standard). Without that my curves look [image], but if I threshold the reference too, it just [image]. I am not sure whether my calculation is wrong or the method itself is unfit for this kind of comparison. I am also lost as to how I should even calculate the area under the curve from this...

Hi, I'm trying to construct a similar curve using Kraken2 and Bracken report files. I'm trying to use sklearn.metrics.precision_recall_curve, but I can't figure out exactly what the input should be. I have gone through other articles too, but I still have trouble working out what the input data is. How do I calculate the precision and recall values from the Kraken2 report? The only fields I see in the Kraken2 report are the percentage classified, the number of reads at clade level, the number of reads at taxon level, the taxon ID, the taxonomic level and the name of the organism. Could you please share the exact input format and the steps you followed to get these plots? Thanks in advance!

Hi!

Sorry for the slow answer. I am working on something very similar right now, so I looked into this a bit more, and I am not sure the sklearn.metrics.precision_recall_curve() method is the way to go. It requires a prediction probability score, and while the relative abundance would seem fitting for this, I don't think it should be used. In a machine learning model, the probability score is the value that gets thresholded for each prediction, showing how "sure" the model is about it. The relative abundance, meanwhile, has nothing to do with probability (for non-ML-based classifiers); it just says: "Prevotella copri makes up 1/5 of this sample". It says nothing about how probable it is that the classified reads/k-mers/markers/proteins etc. actually belong to Prevotella copri. Moreover, the probability scores of ML models are independent of each other, whereas the relative abundances add up to one. I am currently comparing the scikit-learn method to my own implementation of the precision-recall curve, and they give vastly different values.
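
To make the input question concrete, here is a minimal sketch of how the two arrays that precision_recall_curve() expects could be built if one did use relative abundance as the score (which, as said above, I would not recommend). The helper name build_sklearn_inputs and all taxa and abundance values except the Prevotella copri example are made up for illustration.

```python
def build_sklearn_inputs(truth_taxa, predicted_abundance):
    """Align labels and scores over the union of true and predicted taxa.

    truth_taxa: set of taxon names in the gold standard.
    predicted_abundance: dict of taxon name -> relative abundance (0..1).
    Hypothetical helper, not part of scikit-learn.
    """
    taxa = sorted(set(truth_taxa) | set(predicted_abundance))
    # 1 if the taxon is really in the sample, 0 otherwise
    y_true = [1 if t in truth_taxa else 0 for t in taxa]
    # taxa the classifier never reported get a score of 0
    y_score = [predicted_abundance.get(t, 0.0) for t in taxa]
    return taxa, y_true, y_score


# Made-up example data
truth = {"Prevotella copri", "Escherichia coli"}
predicted = {"Prevotella copri": 0.20, "Bacteroides fragilis": 0.80}

taxa, y_true, y_score = build_sklearn_inputs(truth, predicted)

# These lists are what would be passed on, e.g.:
#   from sklearn.metrics import precision_recall_curve
#   precision, recall, thresholds = precision_recall_curve(y_true, y_score)
# Note that y_score sums to one here, unlike independent per-prediction
# probabilities from an ML model.
```

In this toy case the false positive gets the highest "score" simply because it is abundant, which is the kind of mismatch I mean.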

About the technical stuff: I use the percentage values from the Kraken2 report file (I select only the species-level rows in my case) and build a Python dictionary from them. I wrote a simple thresholding function that keeps a key in the dictionary only if its value is above the threshold. I compare the remaining keys to the ground truth data, count the true positives, false positives and false negatives, and calculate the precision and recall from them. I loop through different threshold values, save the precision-recall value pair for each one and plot them with matplotlib.
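
The steps above can be sketched roughly like this. It is not my exact script, just a minimal version under some assumptions: the report is the standard six-column, tab-separated Kraken2 report (percentage, clade reads, taxon reads, rank code, taxon ID, indented name), species rows have rank code "S", and precision is set to 1.0 when nothing passes the threshold (a convention, not a necessity). All function names and the example rows are made up.

```python
def parse_species_percentages(report_lines):
    """Build {species name: percentage} from Kraken2 report lines.

    Assumes the standard 6-column tab-separated report layout.
    """
    abundances = {}
    for line in report_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip malformed or empty lines
        if fields[3] == "S":  # keep species-level rows only
            abundances[fields[5].strip()] = float(fields[0])
    return abundances


def apply_threshold(abundances, threshold):
    """Keep only taxa whose percentage is above the threshold."""
    return {name for name, pct in abundances.items() if pct > threshold}


def precision_recall(predicted, truth):
    """Compute precision and recall from two sets of taxon names."""
    tp = len(predicted & truth)
    fp = len(predicted - truth)
    fn = len(truth - predicted)
    # Assumed convention: precision is 1.0 when nothing is predicted
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def pr_points(abundances, truth, thresholds):
    """Return a list of (threshold, precision, recall) tuples."""
    points = []
    for t in thresholds:
        p, r = precision_recall(apply_threshold(abundances, t), truth)
        points.append((t, p, r))
    return points


# Made-up report rows and ground truth, for illustration only
report = [
    "50.00\t500\t500\tS\t1\t      Species A",
    "30.00\t300\t300\tS\t2\t      Species B",
    "15.00\t150\t150\tS\t3\t      Species C",
    "5.00\t50\t0\tG\t4\t    Genus D",
]
truth = {"Species A", "Species C", "Species E"}

abundances = parse_species_percentages(report)
points = pr_points(abundances, truth, [0, 10, 20, 40, 60])
```

The recall and precision columns of points can then be passed to matplotlib (recall on the x axis, precision on the y axis). With a real report you would read the lines from the file instead of a list.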

I hope this helps!