Monday, January 11, 2010

Precision-recall trade-off ~ misclassification cost

While I was reading Pattern Recognition and Machine Learning by Bishop, I noticed a beautiful relationship between misclassification cost and precision-recall. This is probably well known, of course, but it matters to me because I worked it out myself. :)

We know that the precision-recall concept comes from an Information Retrieval (IR) kind of setting, while misclassification cost comes from decision theory. In IR, given a set of true (relevant) results T and a set of retrieved results R, precision is simply the fraction of R that is also in T, i.e. precision = #intersection(R,T) / #R. For example, suppose we know *all* the web pages that exist on the Web; then the set of pages that a search engine should retrieve for a query is T, of size |T|. The search engine may retrieve a set R of size |R|. If only 10 of these |R| pages are in T, then R's precision is 10 / |R|. Intuitively, R is more precise / exact if it contains fewer results that were not required.

Now think of the case |R| >> |T|, where R contains all the pages from T and many extra results. Here, one says that R could "recall" everything in T, but at the same time it also made lots of mistakes. It is like attempting a puzzle 1000+ times (num_trials) and solving it correctly only 100 times. In general, recall = #intersection(R,T) / #T: if the engine retrieves 10 pages that are in T, its recall is 10 / |T|, i.e. out of |T| correct pages, it could retrieve 10. This also shows that there is a trade-off between precision and recall: as num_trials (the number of retrieved results) increases, recall tends to improve at the cost of precision.
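To make the two definitions concrete, here is a minimal Python sketch. The function name `precision_recall`, the page ids and the set sizes are all made up for illustration; they are not from the book. It compares a small retrieved set with a much larger one that contains all of T.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a retrieved set R and a true set T."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # #intersection(R, T)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical page ids: T holds the 20 pages that should be retrieved.
T = set(range(20))

# A small R: 12 pages retrieved, 10 of them in T.
R_small = set(range(10)) | {100, 101}

# A large R (|R| >> |T|): 100 pages retrieved, including all of T.
R_large = set(range(20)) | set(range(100, 180))

p_small, r_small = precision_recall(R_small, T)
p_large, r_large = precision_recall(R_large, T)

print(f"small R: precision={p_small:.2f}, recall={r_small:.2f}")
print(f"large R: precision={p_large:.2f}, recall={r_large:.2f}")
```

In this toy setup the larger R reaches full recall, but its precision drops, which is the trade-off described above.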

A little off-topic but relevant: it's said that "practice makes a man perfect". If we think of practice as making many trials, then as num_trials increases, one gets better results thanks to that practice.

While reading the book, on page 41 I came across the misclassification cost used in decision theory. It says that we account for misclassification cost because we don't want to miss important predictions. In the two-class example of deciding whether a person is healthy or has cancer, it is important that we make no mistakes when the true class is "cancer-patient", as the consequences of those mistakes are dangerous. It also says that the misclassification costs are chosen so that we don't mind a healthy person being predicted to have cancer, so as not to miss any patient who really has cancer. This is basically a trade-off between decisions: we are looking for more recall even though we lose precision.
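Here is a rough Python sketch of that link, under assumptions of my own: the posterior probabilities, the labels and the cost values below are invented for illustration, and `decide` is a hypothetical helper. It picks the class with the smaller expected loss, which amounts to predicting "cancer" whenever P(cancer | x) > cost_fp / (cost_fp + cost_fn).

```python
def decide(p_cancer, cost_fn, cost_fp):
    """Predict 1 (cancer) if that decision has the smaller expected loss.

    cost_fn: cost of predicting healthy when the truth is cancer.
    cost_fp: cost of predicting cancer when the truth is healthy.
    """
    loss_if_predict_healthy = p_cancer * cost_fn
    loss_if_predict_cancer = (1.0 - p_cancer) * cost_fp
    return 1 if loss_if_predict_cancer < loss_if_predict_healthy else 0


def precision_recall(predictions, labels):
    """Precision and recall for the positive (cancer) class."""
    tp = sum(1 for y_hat, y in zip(predictions, labels) if y_hat == 1 and y == 1)
    predicted_pos = sum(predictions)
    actual_pos = sum(labels)
    precision = tp / predicted_pos if predicted_pos else 0.0
    recall = tp / actual_pos if actual_pos else 0.0
    return precision, recall


# Made-up posteriors P(cancer | x) and true labels (1 = cancer, 0 = healthy).
posteriors = [0.95, 0.80, 0.60, 0.30, 0.15, 0.40, 0.20, 0.05, 0.02, 0.01]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

# Equal costs: equivalent to thresholding the posterior at 0.5.
equal = [decide(p, cost_fn=1, cost_fp=1) for p in posteriors]

# Missing a cancer patient is assumed 9x worse: threshold drops to 0.1.
skewed = [decide(p, cost_fn=9, cost_fp=1) for p in posteriors]

p_equal, r_equal = precision_recall(equal, labels)
p_skewed, r_skewed = precision_recall(skewed, labels)

print(f"equal costs:  precision={p_equal:.2f}, recall={r_equal:.2f}")
print(f"skewed costs: precision={p_skewed:.2f}, recall={r_skewed:.2f}")
```

On this toy data, raising the cost of a missed cancer case catches every patient (higher recall) while flagging some healthy people as well (lower precision).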