Shared Task on Cross-Genre Gender Detection in Dutch

The shared task organised in the context of CLIN29 in Groningen is concerned with binary gender prediction within and across different genres in Dutch. It is modelled on an existing shared task for Italian, GxG. Please note that we adopt a broad, non-technical notion of “genre”, using it mainly to indicate the different sources we obtain data from.

Important dates

October 5: train and dev data available
December 7: test data available
December 14: deadline for sending in predictions
December 21: results known for participants
January 18: short paper deadline
January 31: CLIN!


Given a (collection of) text(s) from a specific genre, the gender of the author has to be predicted. The task is cast as a binary classification task, with gender represented as F (female) or M (male). Gender prediction will be done in two ways:

in-genre: models are trained and tested on data from the same genre
cross-genre: models are tested on a genre they have not been trained on

A crucial aspect of this task is the design of these settings, as they are key to shedding light on the core question: are there indicative traits across genres that can be leveraged to model gender in a rather genre-independent way?

This question will be answered by having participants train and test their models on datasets from different genres. For comparison, participants will also submit genre-specific models that will be tested on the very same genre they have been trained on. In-genre modelling will (i) shed light on which genres might be easier to model, i.e. where gender traits are more prominent; and (ii) make it easier to quantify the loss incurred when modelling gender across genres.

More specifically, participants will be asked to submit up to six different models:

Twitter in-genre model
non-Twitter model for Twitter
YouTube in-genre model
non-YouTube model for YouTube
News in-genre model
non-News model for News

In the cross-genre setting, the only constraint is that no instance from the genre being tested on may be used in training. Other than that, participants are free to combine the other datasets as they wish.
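For illustration, the cross-genre constraint amounts to excluding the test genre when assembling training data. The sketch below shows this; the dataset names and structure are hypothetical and not part of the task definition.

```python
# Hypothetical per-genre training sets; names are illustrative only.
train = {
    "twitter": ["tweet_1", "tweet_2"],
    "youtube": ["comment_1", "comment_2"],
    "news":    ["article_1", "article_2"],
}

def cross_genre_training_data(test_genre, datasets):
    """Combine all training data except the genre being tested on."""
    return [doc for genre, docs in datasets.items()
            if genre != test_genre
            for doc in docs]

# Training data for the non-Twitter model: no tweets allowed.
non_twitter_train = cross_genre_training_data("twitter", train)
```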

Participants are also free to use external resources as they wish, provided the cross-genre settings are carefully preserved, and everything used is described in detail.


This is a binary classification task with balanced classes. As is standard in author profiling, we will evaluate performance using accuracy.

In order to derive two final scores, one for the in-genre and one for the cross-genre setting, we will simply average the three accuracies obtained per genre. We will keep the two rankings separate. For determining the official “winner”, we will use the cross-genre ranking.

For all settings, given that the datasets are balanced for gender distribution, random assignment yields a 50% baseline.
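The scoring described above can be sketched as follows: per-genre accuracy, the averaged final score, and the expected behaviour of the random baseline on balanced data. This is a minimal illustration; the function names are our own, not part of the official evaluation script.

```python
import random

def accuracy(gold, pred):
    """Fraction of predictions that match the gold labels."""
    assert len(gold) == len(pred)
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def final_score(per_genre_accuracies):
    """Average the per-genre accuracies into one final score."""
    return sum(per_genre_accuracies) / len(per_genre_accuracies)

# Random baseline on a balanced dataset: expected accuracy around 0.5.
random.seed(0)
gold = ["F", "M"] * 500
pred = [random.choice(["F", "M"]) for _ in gold]
baseline = accuracy(gold, pred)
```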


We use Dutch data from the following three genres:

Twitter
YouTube
News

We might also introduce a new ‘secret’ genre at test time only, to further test the portability of models without any genre-specific tuning, but this will be confirmed later on.

Gender distribution is balanced in all datasets (50/50), and datasets in all genres are of comparable sizes in terms of tokens.

Data is made available to participants via a link provided upon request. Requests must be made by email (see Contact below), after which access to the data will be granted.


You can contact us at:

Related work

Some related work which might interest you: