Shared Task on Cross-Genre Gender Detection in Dutch

The shared task organised in the context of CLIN29 in Groningen is concerned with binary gender prediction within and across different genres in Dutch. It is modelled on an existing shared task for Italian, GxG. Please note that we adopt a broad, non-technical notion of “genre”, mainly to indicate the different sources we obtain data from.

Important dates

October 5: train and dev data available
December 7: test data available
December 14: deadline for sending in predictions
December 21: results known for participants
January 18: short paper deadline
January 31: CLIN!

Schedule CLIN

Timeslot Program (Benedenzaal 1)
16:15 - 16:30 Introduction by GxG organisers
16:30 - 17:00 Participants' presentations
17:00 - 17:15 Discussion (everyone!)

Papers and Presentations

Authors Title Presentation?
Matej Martinc and Senja Pollak Pooled LSTM for Dutch cross-genre gender classification [PDF] Yes
Lennart Faber, Ian Matroos, Leon Melein and Wessel Reijngoud Co-Training vs. Simple SVM Comparing Two Approaches for Cross-Genre Gender Prediction [PDF] Yes
Rianne Bos, Kelly Dekker and Harmjan Setz Embedding and Clustering for Cross-Genre Gender Prediction [PDF] Yes
Eva Vanmassenhove, Amit Moryossef, Alberto Poncelas, Andy Way and Dimitar Shterionov ABI Neural Ensemble Model for Gender Prediction [PDF] Yes
Eduardo Brito, Rafet Sifa and Christian Bauckhage Two Attempts to Predict Author Gender in Cross-Genre Settings in Dutch [PDF] Yes
Gerlof Bouma Exploring Combining Training Datasets for the CLIN 2019 Shared Task on Cross-genre Gender Detection in Dutch [PDF] No
Evgenii Glazunov Gender prediction using lexical, morphological, syntactic and character-based features in Dutch [PDF] No

Paper Format

Shared task participants are invited to submit papers describing their systems and results. Papers should be a maximum of 5 pages excluding references, and should use the ACL 2018 2-column style file. All papers will be peer-reviewed by at least one member of the program committee. Good papers should describe the systems in sufficient detail and provide insight into what was and was not effective for performance on the shared task.

The program committee consists of:


Given a (collection of) text(s) from a specific genre, the gender of the author has to be predicted. The task is cast as a binary classification task, with gender represented as F (female) or M (male). Gender prediction will be done in two ways: in-genre and cross-genre.

A crucial aspect of this task is the design of the experimental settings, as they are key to shedding light on the core question: are there indicative traits across genres that can be leveraged to model gender in a largely genre-independent way?

This question will be answered by having participants train and test their models on datasets from different genres. For comparison, participants will also submit genre-specific models that will be tested on the very same genre they have been trained on. In-genre modelling will (i) shed light on which genres might be easier to model, i.e. where gender traits are more prominent; and (ii) make it easier to quantify the loss when modelling gender across genres.

More specifically, participants will be asked to submit up to six different models:

Genre In-genre setting Cross-genre setting
Twitter Twitter in-genre model non-Twitter model for Twitter
YouTube YouTube in-genre model non-YouTube model for YouTube
News News in-genre model non-News model for News

In the cross-genre setting, the only constraint is that training may not use any instance from the genre being tested on. Other than that, participants are free to combine the remaining datasets as they wish.
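The constraint above can be sketched in a few lines of Python. This is purely illustrative: the genre names match the task, but the per-genre data layout (dicts of (text, label) pairs) is an assumption, not part of the official data release.

```python
# Sketch of the in-genre vs. cross-genre training constraint.
# Assumption: train_sets maps each genre to a list of (text, label) pairs.
GENRES = ["twitter", "youtube", "news"]

def training_data(train_sets, test_genre, cross_genre):
    """Return the training instances allowed for a given setting.

    In-genre: train only on data from the test genre.
    Cross-genre: train on everything *except* the test genre.
    """
    if cross_genre:
        return [ex for g in GENRES if g != test_genre
                for ex in train_sets[g]]
    return list(train_sets[test_genre])

# Toy example with one instance per genre:
train_sets = {g: [(f"some {g} text", "F")] for g in GENRES}
cross = training_data(train_sets, "twitter", cross_genre=True)
# cross contains only youtube and news instances, never twitter
```

Note that external resources may be added on top of this, as long as the excluded genre stays excluded (see below).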

Participants are also free to use external resources as they wish, provided the cross-genre settings are carefully preserved, and everything used is described in detail.


This is a binary classification task with balanced classes. As is standard in author profiling, we will evaluate performance using accuracy.

In order to derive two final scores, one for the in-genre and one for the cross-genre settings, we will simply average the three accuracies obtained per genre. We will keep the two rankings separate. For determining the official “winner”, we will use the cross-genre ranking.
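The scoring procedure is simple enough to spell out. The sketch below assumes gold and predicted labels are available as parallel lists of "F"/"M" strings per genre; the dict layout and variable names are illustrative, not an official format.

```python
# Minimal sketch of the official scoring: plain accuracy per genre,
# then the unweighted mean over the three genres for one setting.

def accuracy(gold, pred):
    """Fraction of predictions matching the gold labels."""
    assert len(gold) == len(pred)
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def final_score(per_genre_results):
    """per_genre_results: {genre: (gold_labels, predicted_labels)}."""
    accs = [accuracy(gold, pred)
            for gold, pred in per_genre_results.values()]
    return sum(accs) / len(accs)

# Toy example (labels invented for illustration):
results = {
    "twitter": (["F", "M", "F", "M"], ["F", "M", "M", "M"]),  # 0.75
    "youtube": (["F", "M"], ["F", "M"]),                      # 1.00
    "news":    (["F", "M"], ["M", "F"]),                      # 0.00
}
print(final_score(results))  # ≈ 0.583
```

The same computation is run once over the three in-genre submissions and once over the three cross-genre submissions, yielding the two final scores.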

For all settings, given that the datasets are balanced for gender distribution, random assignment yields a 50% baseline.


We use Dutch data from the following three genres:

- Twitter
- YouTube
- News

We might also introduce a new ‘secret’ genre only at test time, to further test the portability of models without any specific tuning, but this will be confirmed later on.

Gender distribution is balanced in all datasets (50/50), and datasets in all genres are of comparable sizes in terms of tokens.

Data is made available to participants via a link provided upon request. Requests must be made via email (see Contact below), after which access to the data will be granted.


You can contact us at:

Related work

Some related work which might interest you: