Machine Learning - Final Project

Course Number: LIX004M5
Instructor: Jörg Tiedemann, Alfa-informatica, tiedeman@let.rug.nl

Final projects count for 50% of the grade for this course. The assignment is evaluated by means of a written report and an oral presentation during the last lecture of the course. Students may work in groups (up to 4 students), but all group members have to present their work on the presentation day. The topic of the project can be chosen from the proposed topics below. The final report has to be written in the form of a scientific paper. More guidelines will be added later.

Project proposals

The topics listed below are general proposals for final projects. Students are expected to make their own decisions with regard to data collection, machine learning approaches, implementation and evaluation/analysis of the results. All of these decisions should be well motivated and described in detail in the final report. The presentation should be clear and understandable for other students without requiring additional background knowledge. In other words, new concepts have to be explained at least on a superficial level to make the presentations interesting for everybody. Try to sell your work!

You are free to use available software and available data sets. However, make sure that you consistently cite all your sources and point out your own work (data collection, implementations, etc.). Missing citations will immediately result in failing the assignment.

Consider several issues before selecting a topic:

Topics (more information will be added later):

topic: info, links, etc.
OCR postprocessing: Movie subtitles are scanned from DVDs via OCR. This results in typical OCR errors, for example confusing '1' with 'l' and 'I'. You can get a large database of movie subtitles in various languages from me. Try to learn a spelling correction model for one or more languages. Or is it even possible to be more or less language independent?
Movie recommendation: Recommend movies to someone based on the movies other people like. You can make use of the Netflix database and even win $1,000,000 if you are very successful!
Wikipedia vandalism: Wikipedia can be edited freely by everybody, and of course some people misuse this. A lot of volunteers correct these acts of vandalism; perhaps a machine learning technique can help them with this.
Computational justice: In a good justice system, people should be convicted based on facts. If you can find suitable datasets, you can investigate whether a machine learning approach can be applied to such a justice system.
speaker detection from transcribed text: learn to detect a speaker from transcribed speeches; data can be taken, for instance, from Europarl (European Parliament Proceedings)
speaker detection: learn to detect speakers from recorded speech (you might need some background in handling sound data)
predict the Dutch football champion: try to predict the coming champion of the Eredivisie by learning from previous years; select important features such as money spent on new players, trainers' and players' records, supporters, general tendencies, etc.
predict OSCAR winners: learn to predict whether a movie has a chance to win an OSCAR
stock market predictions: learn to predict the development of the AEX (or interest rates or something else at the stock market)
4-in-a-row (connect-4): learn to play 4-in-a-row, for example, by letting the system play against itself
face recognition: recognize faces or other images
optical character recognition / handwriting recognition: train a system on example data to recognize handwritten or typed (and scanned) texts; alternatively, identify a person by his/her signature
weather/rain prediction: forecast according to some conditions
music classification: learn to sort mp3 files into genres
song detection: identify songs from someone whistling or humming them (you probably need a lot of background knowledge about handling/processing sound files!)
text prediction for mobile phones: use ML techniques to learn word prediction and/or word completion
author identification: learn to identify the author of a given text
painter identification: learn to identify a painter from a set of paintings
rain prediction: predict whether it is going to rain tomorrow or not
your own topic: come and talk to me about your own topic
Other free datasets: Perhaps you can find a nice idea by browsing through available datasets. For instance:
http://www.economicsnetwork.ac.uk/links/data_free.htm
http://archive.ics.uci.edu/ml/
http://www.statsci.org/datasets.html
http://lib.stat.cmu.edu/datasets/
http://www.cs.toronto.edu/~delve/data/datasets.html
http://www.robjhyndman.com/TSDL/
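
As a starting point for the OCR postprocessing topic, the sketch below shows one possible baseline, not a prescribed approach: it generates every variant of a token by swapping the confusable characters mentioned above ('1', 'l', 'I') and keeps the variant that is most frequent in a word list. The names CONFUSABLE, candidates, correct and toy_counts are hypothetical, and the counts are made up for illustration; a real project would estimate them from clean text and would likely use context as well.

```python
from itertools import product

# Assumed confusion set: only the characters named in the topic description.
CONFUSABLE = {"1": "1lI", "l": "1lI", "I": "1lI"}

def candidates(token):
    """Return all variants of token obtained by swapping confusable characters."""
    options = [CONFUSABLE.get(ch, ch) for ch in token]
    return ["".join(chars) for chars in product(*options)]

def correct(token, word_counts):
    """Pick the most frequent known variant; keep the token if none is known."""
    known = [c for c in candidates(token) if c in word_counts]
    if not known:
        return token
    return max(known, key=lambda c: word_counts[c])

# Toy vocabulary with made-up counts, for illustration only.
toy_counts = {"little": 50, "I": 200, "will": 80, "film": 30}

print([correct(t, toy_counts) for t in ["1itt1e", "l", "wi11", "fi1m", "xyz"]])
# prints ['little', 'I', 'will', 'film', 'xyz']
```

Note that the number of candidates grows exponentially with the number of confusable characters in a token, so this brute-force enumeration is only suitable for short tokens.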

Final assignments are intentionally left very open. Each group will be evaluated on its ability to organize its own project. This includes a proper motivation of the task, background research, sufficient data collection, justification of the selected approach and a proper evaluation. Shortcomings are not necessarily a problem as long as they are discussed appropriately.

Don't hesitate to discuss issues/problems with me during the lab sessions or via e-mail. Interaction with other groups is no problem as long as there is no copying or plagiarism.

Project Presentation in Class

Every group has to present their project in class. Presentations will take about 15 minutes (the actual schedule will be published later) and each member should take part. The presentation counts toward the grade of the final project! Focus on those parts that you find most interesting and that are novel/innovative in your opinion. However, try to present your project in such a way that it is understandable and interesting for everybody in class. There will be time for questions and discussion after each presentation. Try to participate in the discussions after all presentations as much as possible.

Writing the Final Report

The final report should include all necessary details about your work and the project so that it is a self-contained, scientific (and understandable) paper. All sources have to be cited properly and your own work should be clearly marked. Try to be precise and to the point, focusing on the important parts, especially the ones where you see your greatest contributions.

Here are some more hints: