MLCyber 2013 : Special Session on Machine Learning Challenges in Cyber Security Applications at the 2013 International Conference on Machine Learning and Applications

posted by organizer: csymons

Link: http://cda.ornl.gov/symons/workshops/2013/mlcyber

When	Dec 4, 2013 - Dec 7, 2013
Where	Miami, Florida, USA
Submission Deadline	Aug 14, 2013
Notification Due	Sep 10, 2013
Final Version Due	Oct 10, 2013

Categories: machine learning, security, artificial intelligence, networking

Call For Papers

Call for Papers for a Special Session at ICMLA 2013 on

Machine Learning Challenges in Cyber Security Applications

Submission Deadline: August 14, 2013
Notification Due: September 10, 2013
Final Version Due: October 10, 2013

Conference Dates: Dec. 4 – Dec. 7, 2013
Conference Location: Miami, Florida, USA

Many cyber security applications, such as intrusion detection and misuse detection, are ideally suited to machine learning approaches, given the need for generalization and the inadequacy of human-coded rule sets in classifying previously unseen, harmful behaviors. It is an unfortunate reality that most cyber security defenses are reactive, such that they consistently fail to anticipate and struggle to keep up with new attack vectors. Machine learning techniques have the potential to offer disruptive, game-changing technologies that can put the onus back on the attacker to bypass a sophisticated learning system as opposed to a manually updated set of rules. However, applications of machine learning methods in the field of cyber security are confronted with many major challenges. Fortunately, many of these challenges are research areas in which modern machine learning research has recently made substantial progress and upon which a great deal of current effort is focused. Examples of relevant challenges include the following:

• Large-scale, high-dimensional learning
The amount of cyber security data is nearly unlimited, and one of the ways to move from post-attack rule generation to accurate generalization for attack classification is to simultaneously leverage this wealth of potential indicators and the large number of data points in this domain.
• Limited ground truth
Methods of semi-supervised and unsupervised learning are particularly relevant given the nearly unlimited availability of unlabeled data and the difficulty and expense of obtaining both real attack examples and verified normal behaviors in a given environment.
• Transfer learning
Ground truth data pulled from different but related distributions is often more readily available than ground truth assessments from the actual data distribution that a system will be deployed to analyze.
• Privacy preserving methods
Facilitating data sharing is a major challenge given the sensitivity of most security-related data, but privacy-preserving data sharing is a promising way to facilitate the learning of models that can generalize to catch new behaviors.
• Multi-view, multi-modal, and multi-scale learning methods
Data from hard-to-incorporate sources often offers useful insight, but straightforward integration of information in a joint feature space can be problematic. For example, network statistics, packet analysis features, and host-level features might each present a semi-complete view of an example, so simply combining the features increases the dimensionality of the feature space without increasing the number of labeled examples, thus exacerbating the curse of dimensionality.
• Class imbalance
Normal examples constitute the large majority of samples, and cyber security analysts are already overwhelmed by alerts they do not have time to investigate. Consequently, there is typically a strong need for near-100-percent accuracy in detecting attacks, coupled with a requirement for an ultra-low false positive rate.
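
The class imbalance point can be made concrete with a short base-rate calculation. The following is a minimal sketch, not part of the call itself: the function name alert_precision, the event volume, the attack base rate, and the detection and false positive rates are all hypothetical numbers chosen for illustration. It applies Bayes' rule to estimate what fraction of alerts would correspond to real attacks.

```python
def alert_precision(base_rate, detection_rate, false_positive_rate):
    """Fraction of raised alerts that are real attacks (Bayes' rule).

    base_rate: assumed fraction of events that are attacks
    detection_rate: assumed fraction of attacks that raise an alert
    false_positive_rate: assumed fraction of normal events that raise an alert
    """
    true_alerts = base_rate * detection_rate
    false_alerts = (1.0 - base_rate) * false_positive_rate
    total = true_alerts + false_alerts
    return true_alerts / total if total > 0 else 0.0


if __name__ == "__main__":
    # All values below are illustrative assumptions, not measurements.
    events_per_day = 1_000_000
    base_rate = 1e-4        # 1 in 10,000 events is an attack
    detection_rate = 0.99   # near-100-percent detection of attacks

    for fpr in (1e-2, 1e-3, 1e-4, 1e-5):
        precision = alert_precision(base_rate, detection_rate, fpr)
        false_alerts_per_day = events_per_day * (1.0 - base_rate) * fpr
        print(
            f"FPR={fpr:.0e}  "
            f"false alerts/day={false_alerts_per_day:,.0f}  "
            f"share of alerts that are real={precision:.3f}"
        )
```

Under these assumed numbers, a 1 percent false positive rate leaves fewer than 1 in 100 alerts pointing at a real attack, even with 99 percent detection. The share of real alerts only becomes large once the false positive rate falls well below the attack base rate.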

The goal of this special session is to explore advances in the above-mentioned areas as they pertain to cyber security, to expose the broader machine learning community to problems arising in cyber security applications, to promote the use of advanced machine learning techniques to solve these pressing issues in cyber security, and to encourage a larger subset of machine learning practitioners to engage in coordinated efforts to solve some of these real world problems.

We invite submission of papers pertaining to the above areas and related topics. Papers that involve well-designed evaluation studies of machine learning on recent, real-world cyber data are strongly encouraged.

Topics of interest include, but are not limited to, the following:
• Large-scale learning methods for security applications;
• Data fusion and data imputation techniques in cyber security;
• Privacy-preserving learning methods, e.g., that can utilize sensitive attack capture examples;
• Publicly available cyber security datasets amenable to machine learning research;
• Ultra-low false positive, high-accuracy machine learning methods;
• Semi-supervised or unsupervised learning in cyber security to deal with limited ground truth;
• Learning methods designed to minimize the burden placed on human analysts;
• Learning methods that are not based on the IID assumption;
• Transfer learning for cyber security applications;
• Multi-distribution feature selection;
• Multi-view or multi-modal learning in cyber security applications;
• Machine learning studies on cyber security application topic areas, such as network/host intrusion detection, malware detection, data loss prevention, critical infrastructure attack detection, alert correlation/fusion, and automated response/mitigation.

Submission Instructions:
Papers should be at most 6 pages in IEEE format. Further information and instructions can be found on the ICMLA site (http://www.icmla-conference.org/icmla13/submit.htm). ICMLA conference content will be submitted for inclusion in IEEE Xplore as well as other Abstracting and Indexing (A&I) databases.

Organizers:

Christopher T. Symons, Ph.D.
Computational Data Analytics Group
Oak Ridge National Laboratory, Oak Ridge, TN
Email: symonsct@ornl.gov

Erik M. Ferragut, Ph.D.
Cyberspace Sciences and Information Intelligence Research Group
Oak Ridge National Laboratory, Oak Ridge, TN
Email: ferragutem@ornl.gov

Justin M. Beaver, Ph.D.
Computational Data Analytics Group
Oak Ridge National Laboratory, Oak Ridge, TN
Email: beaverjm@ornl.gov

Program Committee:
• Mark A. Buckner (Oak Ridge National Laboratory)
• Varun Chandola (Oak Ridge National Laboratory)
• Justin E. Doak (Sandia National Laboratories)
• Thomas H. Morris (Mississippi State University)
• Xinming Ou (Kansas State University)
• Yubin Park (University of Texas, Austin)
• Roberto Perdisci (University of Georgia)
• Stacy J. Prowell (Oak Ridge National Laboratory)