ROC Area-Under-Curve Explained
Some things can take years to figure out. It happens when someone shows you the definition of some "basic" mathematical object but does not say why it is defined this way or how it should be interpreted. Moreover, you won't easily find the answer to your "why" and "how" questions, either because they are "so simple" that no one bothers to explain them, or simply because no one cares. Some time passes, you forget your desire to find out the meaning, and you just get used to the definition.
For example, it took me some months after I first heard the definition of matrix multiplication to understand why it was defined precisely like that. Same with the notion of a "determinant". Same with pretty much any other first-year university mathematical object. The problem is probably that many of our math courses are "definition-based" rather than "intuition-based", but anyway, that's not the subject of this post.
Today I accidentally discovered the interpretation of yet another object that I first heard about more than a year ago and have been waiting for additional clues on ever since: the ROC area-under-curve statistic. For those who don't know what that is: it's a way to measure the quality of a classification algorithm by plotting a certain curve (the true positive rate against the false positive rate as the decision threshold varies) and measuring the area under this curve. It is intuitively clear that the value will be close to 1 for a "good" algorithm and close to 0.5 for a "bad" one that does no better than random guessing. Other than that, I found it nearly impossible to find anywhere on the internet a useful hint as to what that area might actually mean.
I found that hint today, in a rather unexpected place. I enjoyed it so much that I wanted to share this simple piece of information with you:
The area under the ROC curve is the probability that, when we draw one positive and one negative example at random, the decision function assigns a higher value to the positive example than to the negative one (with ties, if any, counted as one half).
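If you'd like to convince yourself of this, here is a small Python sketch that computes the number both ways on a toy example: once by counting correctly ordered (positive, negative) pairs, and once by actually tracing the ROC curve and measuring the area under it with the trapezoid rule. The function names `pairwise_auc` and `roc_area` and the scores are made up for illustration, and I'm assuming that a tie between a positive and a negative score counts as half a win.

```python
def pairwise_auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs where the positive scores higher.

    Assumption: a tie between the two scores counts as half a win.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def roc_area(pos_scores, neg_scores):
    """Area under the empirical ROC curve, measured with the trapezoid rule."""
    # Sweep the decision threshold from the highest score down to the lowest.
    thresholds = sorted(set(pos_scores) | set(neg_scores), reverse=True)
    points = [(0.0, 0.0)]  # (false positive rate, true positive rate)
    for t in thresholds:
        tpr = sum(s >= t for s in pos_scores) / len(pos_scores)
        fpr = sum(s >= t for s in neg_scores) / len(neg_scores)
        points.append((fpr, tpr))
    # Sum the trapezoids between consecutive points of the curve.
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy decision-function values (hypothetical), including one tie at 0.4.
positives = [0.9, 0.8, 0.6, 0.4]
negatives = [0.7, 0.5, 0.4, 0.2]

print(pairwise_auc(positives, negatives))  # 0.78125
print(roc_area(positives, negatives))      # 0.78125
```

Here 12.5 of the 16 possible pairs are ordered correctly (the tie contributes the half), and the area under the plotted curve comes out to the same 0.78125.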
Have you had similar revelations? If so, I invite you to share them in the comments; I'm pretty sure many will be interested to hear them.