SDM HPA 2010: SIAM Data Mining 2010 Workshop on High Performance Analytics
Link: http://sites.google.com/site/workshophpa

Call For Papers

Important Dates:
* Submission deadline: January 15th, 2010
* Notification of acceptance: February 1st, 2010
* Final papers due: February 12th, 2010

Workshop Co-chairs:
* Amol Ghoting (Data Mining Systems Group, IBM T. J. Watson Research Center)
  o Contact information: aghoting (at) us.ibm.com, 914-945-2193
* Rong Yan (Facebook)
  o Contact information: yanrong (at) gmail.com
* Xifeng Yan (University of California at Santa Barbara)
  o Contact information: xyan (at) cs.ucsb.edu, 805-893-3136

PC Members (incomplete list):
* Srinivasan Parthasarathy (Ohio State University)
* Alexander Gray (Georgia Tech)
* Yuan Yu (Microsoft Research)
* Anthony Nguyen (Intel Research)
* Philip Yu (University of Illinois at Chicago)
* Edwin Pednault (IBM Research)
* Jimeng Sun (IBM Research)
* Tamara Kolda (Sandia National Laboratories)
* Hong Tang (Yahoo!)
* Jie Tang (Tsinghua University)
* Vipin Kumar (University of Minnesota)
* Jerry Zhao (Google)

Objectives:

With advances in data collection and storage technologies, large data sources have become ubiquitous. Organizations now routinely collect terabytes of data every day with the intent of gleaning non-trivial insights into their business processes. To benefit from these advances, data mining and machine learning techniques must scale to such proportions. This scaling can be achieved through the design of new, faster algorithms and/or through parallelism. Moreover, emerging and future processor architectures (such as multi-cores) will rely on user-specified parallelism to provide any performance gains. Unfortunately, achieving such scaling is non-trivial, and only a handful of research efforts in the data mining and machine learning communities have attempted to address data at these scales. At the other end of the spectrum, the past few years have witnessed the emergence of several platforms for the implementation and deployment of large-scale analytics.
Examples of such platforms include Hadoop (Apache) and Dryad (Microsoft). These platforms, developed by the large-scale distributed processing community, not only simplify implementation but also support execution on the cloud, making large-scale machine learning and data mining both affordable and accessible to all. Today, there is a large gap between the data mining/machine learning and large-scale distributed processing communities. To make advances in large-scale analytics, it is imperative that these communities work hand in hand. The objectives of the High Performance Analytics workshop are as follows:

* Characterize the state of the high performance analytics arena
* Promote algorithm design for high performance data mining/machine learning at the terabyte scale
* Identify large-scale data mining/machine learning problems by studying applications
* Identify infrastructure/programming model requirements for implementing large-scale data mining/machine learning
* Bring together researchers in high performance data mining/machine learning and large-scale distributed data processing

We expect the audience to learn the state of the art in practical high performance machine learning/data mining. We emphasize the word "practical": special attention will be given to realizing real-world solutions.

Topics of Interest:

* Application case studies that showcase the need for large-scale machine learning/data mining in business, science, engineering, and other domains
* Parallel and distributed algorithms for large-scale machine learning/data mining
* Exploiting modern and specialized hardware such as multi-core processors, GPUs, the STI Cell processor, etc.
* Memory-hierarchy-aware data mining/machine learning algorithms
* Streaming data algorithms for machine learning and data mining
* New platforms and/or programming model proposals for parallel/distributed machine learning and data mining in batch and/or stream settings
* Evaluation of platforms (such as Hadoop) and/or programming models (such as map-reduce) for batch and/or stream settings

Format and Duration:

We anticipate one keynote presentation followed by three sessions of research paper presentations. We intend to have three invited research paper presentations, one opening each session. For these invited presentations, we will solicit research papers from prominent groups in the high performance data mining/machine learning space; we expect to invite submissions from groups at Yahoo! Research, Google Research, Microsoft Research, IBM Research, Amazon, and Facebook. The remaining slots in the three sessions will consist of presentations of accepted papers reviewed by our program committee.

Target Audience:

We expect this workshop to attract researchers and practitioners from two large and distinct groups. The first group consists of researchers interested in high performance data mining/machine learning. We believe this is a growing group that has consistently published at and attended workshops on high performance data mining, such as the HPDM series of workshops; these individuals will likely be SIAM Data Mining 2010 attendees as well. The second group consists of researchers and developers from the large-scale distributed processing arena, along with industrial practitioners who deal with massive data sets. This group would typically not attend a forum like SDM, as their interests have traditionally not been well represented there -- systems conferences and forums such as the Hadoop Summit are more aligned with their interests. One objective of this workshop is to bring these two groups together and potentially attract a new audience to SDM.
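For readers less familiar with the map-reduce programming model named in the topics above, here is a minimal, framework-free sketch in plain Python (the market-basket items are hypothetical, chosen only for illustration) of how a frequency-counting mining primitive decomposes into a map phase and a reduce phase -- the decomposition that platforms such as Hadoop parallelize across machines:

```python
from collections import defaultdict
from itertools import chain

def map_phase(transaction):
    # Emit a (key, 1) pair for every item in a single transaction.
    return [(item, 1) for item in transaction]

def reduce_phase(pairs):
    # Group pairs by key and sum the counts; in a real framework,
    # the shuffle step performs this grouping across the cluster.
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

# Hypothetical market-basket data for illustration.
transactions = [["milk", "bread"], ["bread", "eggs"], ["milk", "bread", "eggs"]]
pairs = chain.from_iterable(map_phase(t) for t in transactions)
item_counts = reduce_phase(pairs)
# item_counts == {"milk": 2, "bread": 3, "eggs": 2}
```

Because each transaction is mapped independently and the reduction is an associative sum, both phases can be distributed with no coordination beyond the shuffle, which is what makes this model attractive for terabyte-scale mining.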