The final project counts for 50% of the grade for this course. It is evaluated on the basis of a written report and an oral presentation during the last lecture of the course. Students may work in groups of up to 4, but every group member has to take part in the presentation. Choose your topic from the proposed topics below. The final report has to be written in the form of a scientific paper. More guidelines will be added later.
You are free to use available software and available data sets. However, make sure that you consistently cite all your sources and point out your own work (data collection, implementations, etc.). Missing citations will immediately result in failing the assignment.
Consider several issues before selecting a topic:
Topics (more information will be added later):
| topic | info, links, etc |
|---|---|
| OCR postprocessing | Movie subtitles are scanned from DVDs via OCR. This results in typical OCR errors, for example confusing '1' with 'l' and 'I'. You can get a large database of movie subtitles in various languages from me. Try to learn a spelling correction model for one or more languages. Or is it even possible to make the model more or less language independent? |
| Movie recommendation | Recommend movies to someone based on the movies other people like. You can make use of the Netflix database and even win $1,000,000 if you are very successful! |
| Wikipedia vandalism | Wikipedia can be edited freely by everybody, and of course some people abuse this. Many volunteers correct these acts of vandalism; perhaps a machine learning technique can help them with this. |
| Computational justice | In a good justice system, people should be convicted based on facts. If you can find suitable datasets, you can investigate whether a machine learning approach could support such a justice system. |
| speaker detection from transcribed text | learn to detect a speaker from transcribed speeches; data can be taken, for instance, from Europarl (European Parliament Proceedings) |
| speaker detection | learn to detect speakers from recorded speech (you might need some background in handling sound data) |
| predict the Dutch football champion | try to predict the coming champion of the Eredivisie by learning from previous years; select important features such as money spent on new players, trainer and player records, supporters, general tendencies, etc. |
| predict Oscar winners | learn to predict whether a movie has a chance to win an Oscar |
| stock market predictions | learn to predict the development of the AEX (or interest rates or something else on the stock market) |
| 4-in-a-row (connect-4) | learn to play 4-in-a-row, for example, by playing against itself |
| face recognition | recognize faces or other images |
| optical character recognition / handwriting recognition | train a system on example data that recognizes handwritten or typed (and scanned) texts; alternatively, identify a person by his/her signature |
| weather/rain prediction | forecast according to some conditions |
| music classification | learn to sort mp3-files into genres |
| song detection | identify songs from someone whistling or humming them (you probably need a lot of background knowledge about handling/processing sound files!) |
| text prediction for mobile phones | use ML techniques to learn word prediction and/or word completion |
| author identification | learn to identify the author of a given text |
| painter identification | learn to identify a painter from a set of paintings |
| rain prediction | predict whether it is going to rain tomorrow or not |
| your own topic | come and talk to me about your own topic |
| Other free datasets | Perhaps you can find a nice idea by browsing through available datasets. For instance: http://www.economicsnetwork.ac.uk/links/data_free.htm, http://archive.ics.uci.edu/ml/, http://www.statsci.org/datasets.html, http://lib.stat.cmu.edu/datasets/, http://www.cs.toronto.edu/~delve/data/datasets.html, http://www.robjhyndman.com/TSDL/ |
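
To give an idea of what a first baseline for one of these topics could look like, here is a minimal sketch for the OCR postprocessing topic. It is only an illustration under assumptions, not a required approach: the function names `candidates` and `correct`, the toy vocabulary, and the restriction to the single confusion group '1', 'l', 'I' mentioned in the table are all made up for this example. A learned model would estimate such confusions from data instead of hard-coding them.

```python
from itertools import product

# Assumed confusion groups: characters that OCR tends to mix up.
# Only the 1/l/I group is taken from the topic description above.
CONFUSIONS = [set("1lI")]


def candidates(token, confusions=CONFUSIONS):
    """Return all variants of token obtained by swapping confusable characters."""
    options = []
    for ch in token:
        group = next((g for g in confusions if ch in g), None)
        options.append(sorted(group) if group else [ch])
    # Note: the number of variants grows exponentially with the number
    # of confusable characters, so this only suits short tokens.
    return {"".join(chars) for chars in product(*options)}


def correct(token, vocabulary):
    """Keep known tokens; otherwise return the unique in-vocabulary variant, if any."""
    if token in vocabulary:
        return token
    matches = candidates(token) & vocabulary
    return matches.pop() if len(matches) == 1 else token


if __name__ == "__main__":
    vocab = {"hello", "Italy", "will"}  # hypothetical toy vocabulary
    for word in ["he11o", "1taly", "wi11", "xyz"]:
        print(word, "->", correct(word, vocab))
```

A rule-based baseline like this is mainly useful as a point of comparison in your evaluation: a learned spelling correction model should at least beat it.
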
Final assignments are intentionally left very open. Each group will be evaluated on its ability to organize its own project. This includes a proper motivation of the task, background research, sufficient data collection, justification of the selected approach and a proper evaluation. Shortcomings are not necessarily a problem as long as they are discussed appropriately.
Don't hesitate to discuss issues/problems with me during the lab sessions or via e-mail. Interaction with other groups is no problem as long as there is no copying or plagiarism.
Here are some more hints: