Anomaly or outlier detection is a popular topic in data mining and machine learning. In fact, it has been argued that many novel scientific discoveries have been triggered by the need to explain outliers. A working definition of outliers [due to Hawkins] is “An outlier is an observation which deviates so much from the other observations as to arouse suspicions that it was generated by a different mechanism.”
The aim of this workshop is to invite research ideas that translate the above definition into an optimization framework. There has been some progress in this endeavor. For example, the one-class SVM formulation for novelty detection has been used successfully, in some cases, to elicit outliers from data using optimization methods. In the theoretical computer science community, approximation algorithms have been proposed for the “facility location problem with outliers” and the “K-median problem with outliers.” A similar effort has been made within the branch of statistics known as “robust statistics.” Here the key motivation has been to design statistical estimators that are robust against model misspecification and outliers. Some problems in robust statistics (such as the minimum covariance determinant) have been cast in an optimization framework. However, a more concerted effort is needed within data mining to come up with canonical optimization problems for outlier detection. This would be one way to drive progress in this promising area, which is especially relevant in the age of “Big Data.”
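As a toy illustration of what casting outlier detection as an optimization problem can look like, the sketch below solves the smallest possible instance of the “K-median problem with outliers” mentioned above: one center (K = 1), points on a line, and a fixed number of points that may be discarded. It is a minimal sketch under those assumptions, not one of the approximation algorithms from the literature; the function name `one_median_with_outliers` and its parameters are hypothetical and chosen only for this example.

```python
from statistics import median


def one_median_with_outliers(points, num_outliers):
    """Return (center, cost, outliers) for the 1-D 1-median problem with outliers.

    Objective: choose a center and discard exactly `num_outliers` points so
    that the sum of |x - center| over the kept points is as small as possible.
    This is an illustrative brute-force solver, not an approximation algorithm.
    """
    if not 0 <= num_outliers < len(points):
        raise ValueError("num_outliers must be in [0, len(points))")

    xs = sorted(points)
    keep = len(xs) - num_outliers
    best = None

    # In one dimension the kept points of an optimal solution form a
    # contiguous window of the sorted data, so scanning windows is enough.
    for start in range(num_outliers + 1):
        window = xs[start:start + keep]
        center = median(window)  # the median minimizes the sum of absolute deviations
        cost = sum(abs(x - center) for x in window)
        if best is None or cost < best[1]:
            discarded = xs[:start] + xs[start + keep:]
            best = (center, cost, discarded)

    return best


if __name__ == "__main__":
    center, cost, outliers = one_median_with_outliers([1.0, 2.0, 3.0, 4.0, 100.0], 1)
    print(center, cost, outliers)
```

In this formulation the outliers are not flagged by a separate rule; they fall out of the optimization as the points whose removal lowers the objective the most. Here the number of discarded points is fixed in advance, which is itself a modeling assumption.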