
MSR 2010 : 7th IEEE Working Conference on Mining Software Repositories (MSR Challenge Track)


Link: http://msr.uwaterloo.ca/msr2010/challenge/
 
When: May 2, 2010 - May 3, 2010
Where: Cape Town, South Africa
Submission Deadline: Feb 6, 2010
Notification Due: Feb 20, 2010
Final Version Due: Apr 5, 2010
Categories: software engineering
 

Call For Papers

Overview
Since 2006 the IEEE Working Conference on Mining Software Repositories (MSR) has hosted a mining challenge. The MSR Mining Challenge brings together researchers and practitioners who are interested in applying, comparing, and challenging their mining tools and approaches on software repositories for open source projects. Previous years examined a single project, multiple projects in isolation, or a single distribution of projects (GNOME). This year, by contrast, the MSR challenge involves examining the FreeBSD operating system and distribution, the GNOME Desktop Suite of projects, and the Debian/Ubuntu Distribution Database. The emphasis this year is on how the projects are inter-related, how they interact, and possibly how they evolve and function within a larger software ecosystem. There will be two challenge tracks: #1: general and #2: prediction. The winner of each track will be given the MSR 2010 Challenge Award.
Challenge #1: General
In this category you can demonstrate the usefulness of your mining tools. The main task will be to find interesting insights by analyzing the software repositories of the projects within FreeBSD, GNOME Desktop Suite and the package related meta-data of the Debian/Ubuntu Distribution Database.

FreeBSD is a BSD-licensed BSD Unix distribution. It includes packages for desktop, server and embedded uses. FreeBSD also takes responsibility for porting many programs to its distribution via FreeBSD-ports.

The GNOME Desktop Suite is a collection of projects. GNOME is very mature, is composed of a number of individual projects (nautilus, epiphany, evolution, etc.), and provides lots of input for mining tools.

The Ultimate Debian Database (UDD) is a database of packages, package dependencies and related bugs. It describes the Debian and Ubuntu distributions.

One could examine multiple projects within these ecosystems. For instance, one could examine API usage across all projects, train a predictive model on one project and assess its accuracy on another, or examine how developers' activity spans multiple projects.
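
As a rough illustration of the last idea, the sketch below finds developers whose commits touch more than one project. It assumes the commit history has already been reduced to (author, project) pairs. The author names are hypothetical, and extracting such pairs from the provided dumps is not shown.

```python
from collections import defaultdict

def cross_project_authors(commits):
    """Return authors who committed to two or more projects.

    `commits` is an iterable of (author, project) pairs. The result maps
    each such author to a sorted list of the projects they touched.
    """
    projects_by_author = defaultdict(set)
    for author, project in commits:
        projects_by_author[author].add(project)
    return {
        author: sorted(projects)
        for author, projects in projects_by_author.items()
        if len(projects) > 1
    }

# Hypothetical sample data; real pairs would come from the repository logs.
commits = [
    ("alice", "nautilus"),
    ("alice", "epiphany"),
    ("bob", "evolution"),
    ("alice", "nautilus"),
]
print(cross_project_authors(commits))  # {'alice': ['epiphany', 'nautilus']}
```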

Participation is straightforward:

1. Select your mining area (one of bug analysis, change analysis, architecture and design, process analysis, team structure, etc.).
2. Get project data for multiple GNOME projects, FreeBSD or the UDD.
3. Formulate your mining questions.
4. Use your mining tool(s) to answer them.
5. Write up and submit your 4-page challenge report.
* Within the report you should clearly summarize what your contribution is, including what you found and its importance.

The challenge report should describe the results of your work and cover the following aspects: questions addressed, input data, approach and tools used, derived results and interpretation of them, and conclusions. Keep in mind that the report will be evaluated by a jury. Make sure the contributions, purpose, scope, results and importance or relevance of your work are highlighted within your report. Reports must be at most 4 pages long and in the ICSE format (http://www.sbs.co.za/ICSE2010/4-SUBMISSIONS/ICSE2010_SUB-FORMAT.html).

The submission will be via Easychair (http://www.easychair.org/conferences/?conf=msrchallenge2010). Each report will undergo a thorough review, and accepted challenge reports will be published as part of the MSR 2010 proceedings. Authors of selected papers will be invited to give a presentation at the MSR conference in the MSR Challenge track.

Feel free to use any data source for the Mining Challenge. For your convenience, we provide repository logs, mirrored repositories, bugzilla database dumps, and various other forms of data linked at the bottom.
Challenge #2: Predict

This year, the MSR Mining Challenge prediction track involves predicting the final bug number within Debian on April 30th, 2010. We want you to predict the newest bug number to appear on April 30th.

Participation is as follows:

* Pick a team name, e.g., WICKED WARTHOGS, BAD BIRDS, etc.
* Come up with predictions for the final Debian bug report number as of April 30th based on some criteria or prediction model. A very simple model, for instance, would be the amount of growth in the past three months.
o Predict the final bug number of Debian at the end of the day on April 30th, 2010 (according to their server time (UTC))
+ e.g., as of Tue, 13 Oct 2009 23:21:01 UTC, the largest bug number was 550906:
# http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=550906
* Write a paragraph (max 200 words) that describes how you computed your predictions.
* Submit everything before Feb 20th (Apia time) by email to msr2010predictions@challenge.softwareprocess.es
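
The "very simple model" mentioned above, growth over a recent period, could be sketched as a linear extrapolation between two observations of the largest bug number. This is only an illustration under stated assumptions: the later observation (550906 on 13 Oct 2009) comes from this call, while the earlier observation is a hypothetical placeholder that a participant would replace with a real value.

```python
from datetime import datetime, timezone

def extrapolate_bug_number(earlier, later, target):
    """Linearly extrapolate the largest bug number to the `target` time.

    `earlier` and `later` are (datetime, bug_number) observations in
    chronological order. The growth rate between them is assumed to
    continue unchanged until `target`.
    """
    (t0, n0), (t1, n1) = earlier, later
    if t1 <= t0:
        raise ValueError("observations must be in chronological order")
    rate_per_second = (n1 - n0) / (t1 - t0).total_seconds()
    return round(n1 + rate_per_second * (target - t1).total_seconds())

# Observation quoted in the call text.
later = (datetime(2009, 10, 13, 23, 21, 1, tzinfo=timezone.utc), 550906)
# HYPOTHETICAL observation three months earlier; replace with a real value.
earlier = (datetime(2009, 7, 13, 23, 21, 1, tzinfo=timezone.utc), 537000)
# End of the day on April 30th, 2010, server time (UTC).
target = datetime(2010, 4, 30, 23, 59, 59, tzinfo=timezone.utc)

print(extrapolate_bug_number(earlier, later, target))
```

A constant growth rate is the weakest part of this model; seasonal effects or release cycles would need something richer.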

Prediction submissions will be scored by their distance from the last bug number that occurs on April 30th 2010.
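
One plausible reading of this scoring rule is the absolute difference between each predicted number and the actual last bug number. The sketch below ranks teams that way. The exact statistical method used by the organizers is not specified here, so treat the distance function as an assumption. The team names are the examples from this call, and all numbers are hypothetical.

```python
def rank_predictions(predictions, actual):
    """Rank (team, prediction) pairs by absolute distance from `actual`.

    `predictions` maps a team name to its predicted bug number. The
    closest prediction comes first.
    """
    return sorted(predictions.items(), key=lambda item: abs(item[1] - actual))

# Hypothetical predictions and hypothetical actual value.
predictions = {"WICKED WARTHOGS": 580000, "BAD BIRDS": 575500}
ranking = rank_predictions(predictions, actual=576000)
for team, predicted in ranking:
    print(team, predicted, abs(predicted - 576000))
```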
Frequently Asked Questions

* Do I need to give a presentation at the MSR conference? For challenge #1, the jury will select finalists that are expected to give a short presentation at the conference. Then the audience will select a winner. For challenge #2, there is no presentation at the conference. The winners will be determined with statistical methods and announced at the conference.
* Does the challenge report have to be four pages? No, of course you can submit fewer than four pages. The page limit was set to ease the presentation of space-intensive results such as visualizations.
* Wow, the data set is soooo big! My tool won't finish in time. What can I do? Just run your tool on a subset of the projects. For instance, you could examine only the nautilus file manager and the epiphany web browser. Especially when you are doing visualizations, it is almost impossible to show everything.
* My cat is a visionary...can I submit its predictions or is the challenge #2 only for tools? Of course, go ahead and submit its predictions as a benchmark. However, your cat will run out of competition—only predictions generated by tools or by humans in a systematic way are eligible to win challenge #2.
* For challenge #2 (prediction), is it acceptable if our team submits more than one prediction? Only one submission from a team (person) is allowed.

Important Dates
Submission of reports: February 6th, 2010
Submission of predictions: February 20th, 2010
Author notification: February 20, 2010
Camera-ready copy: April 5, 2010
Conference dates: May 2nd-3rd, 2010
Data

* Directory of the FreeBSD and UDD data:
o http://swag.uwaterloo.ca/~ahindle/challenge2010/
* The GNOME Data: http://msr.uwaterloo.ca/msr2009/challenge/msrchallengedata.html
* UDD - Ultimate Debian Database
o http://wiki.debian.org/UltimateDebianDatabase/udd.debian.net Scheme and description
o http://udd.debian.org/ main site
o More schema http://udd.debian.org/schema/
o Our official mirror of the UDD: http://challenge.softwareprocess.es/20090913.udd.sql.gz
o Their version 800mb file http://udd.debian.org/udd.sql.gz
* Extracted Repos
o http://challenge.softwareprocess.es/freebsd-svn-20090916.tar.bz2 (full freebsd svn mirror)
o http://challenge.softwareprocess.es/freebsd-mail-archive-20090913.tar-split/ (mail archive of freebsd project)
o http://challenge.softwareprocess.es/freebsd-20090913.tar.bz2-split/ (freebsd everything except extracted data)
o http://challenge.softwareprocess.es/freebsd-gnats-20090913.tar.bz2 (freebsd bug emails)
* Analyzed Repositories
o FreeBSD indentation analysis, plus McCabe and Halstead metrics per diff chunk
+ http://challenge.softwareprocess.es/freebsd-indentation-metrics-20090913.tar.bz2
o http://challenge.softwareprocess.es/freebsd-20090913-cvsanaly-mysql-dump.txt.bz2 (freebsd CVSAnalY dump)
o http://challenge.softwareprocess.es/ports-20090913-cvsanaly-mysql-dump.txt.bz2 (freebsd ports CVSAnalY dump)
o http://swag.uwaterloo.ca/~ahindle/challenge2010/freebsd_bugs.sql.bz2 The FreeBSD bug repo parsed by Israel Herraiz
o http://www.emad.softwareprocess.es/freebsd_crex_output.bz2 The FreeBSD CVS repo parsed with CREX by Emad Shihab bzip2 compressed (20MB)
o http://www.emad.softwareprocess.es/freebsd_crex_output.bz2 The FreeBSD CVS repo parsed with CREX by Emad Shihab in RAR Format (20MB)
* The files
o http://challenge.softwareprocess.es/freebsd-20090913-cvsanaly-mysql-dump.txt.bz2 (freebsd CVSAnalY dump)
o http://challenge.softwareprocess.es/ports-20090913-cvsanaly-mysql-dump.txt.bz2 (freebsd ports CVSAnalY dump)
o http://challenge.softwareprocess.es/freebsd-indentation-metrics-20090913.tar.bz2 (freebsd ports indentation metrics dump)
o http://challenge.softwareprocess.es/freebsd-gnats-20090913.tar.bz2 (freebsd bug emails)
o http://challenge.softwareprocess.es/20090913.udd.sql.gz (ubuntu debian database)
o http://challenge.softwareprocess.es/svnmirror-base-r179637.tbz2 (original freebsd svn mirror)
o http://challenge.softwareprocess.es/freebsd-svn-20090916.tar.bz2 (full freebsd svn mirror)
o http://challenge.softwareprocess.es/freebsd-mail-archive-20090913.tar-split/ (mail archive of freebsd project)
o http://challenge.softwareprocess.es/freebsd-20090913.tar.bz2-split/ (freebsd everything except extracted data)

Thank you to

* The efforts of Christian Bird which made this challenge so much easier to run.
* The efforts of Israel Herraiz for parsing the email databases.
* The efforts of Emad Shihab for parsing the version control systems.
