At last, a computer that understands you like your mother.
--1985, McDonnell-Douglas ad (Lee, 2004)


  • Visiting Edinburgh NLP (November 24, 2017)
  • I'll join the IT University of Copenhagen in May 2018
  • I'll be area chair for NAACL 2018
  • I won the IJCNLP 2017 shared task on multilingual customer feedback analysis (ranked 1st / 12 teams)!
  • September 6-11: I'll be at EMNLP 2017, Copenhagen
  • July 28-29: Excited to be an invited speaker at the Google NLU (Natural Language Understanding) workshop, New York
  • I serve as ESSLLI 2018 chair for language and computation
  • I gave a keynote at PyData Berlin, July 2017
  • I was ACL 2017 area co-chair (for tagging, chunking and parsing)
  • I am a member of the editorial board of the Computational Linguistics journal (for a three-year period starting in 2017)
  • I was EACL 2017 student research workshop senior faculty advisor


What we need are Natural Language Processing (NLP) models that are more robust: models that work better on unexpected input (such as new domains or new languages) and that can be trained from semi-automatically or weakly annotated data from a variety of sources. My research focuses on bringing NLP one step closer to this goal by combining fortuitous data with appropriate machine learning algorithms to enable robust language technology.
I am interested in learning under sample selection bias (domain adaptation, transfer learning), annotation bias (embracing annotator disagreement in learning) and, more generally, in semi-supervised and weakly supervised machine learning applied to cross-domain and cross-language natural language processing.

Fortuitous data

Ultimately, NLP should be able to handle any language and any domain. However, there is still a long way to go! Our models need training data, but annotated data is biased and scarce. One way to address this sparsity of training data is to leverage data that has so far been neglected or rests in non-obvious places. Examples of such fortuitous data [1] include hyperlinks, which can be used to build more robust part-of-speech taggers or named-entity recognizers; annotator disagreement, which can be embraced during learning; and behavioral data such as gaze or keystrokes [2], which can inform NLP models. Read more:

  1. Barbara Plank. What to do about non-standard (or non-canonical) language in NLP. In KONVENS 2016. [arXiv]
  2. Barbara Plank. Keystroke dynamics as signal for shallow syntactic parsing. The 26th International Conference on Computational Linguistics (COLING). Osaka, Japan. [arXiv]
  3. Barbara Plank, Anders Johannsen and Željko Agić. Improving language technology with fortuitous data, ESSLLI 2016 summer school.

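One of the fortuitous-data ideas above, embracing annotator disagreement in learning, can be sketched in a few lines. This is a minimal illustration only, not code from any of the papers: the tagset, the per-token votes, and the predicted probabilities are made up, and a real tagger would produce its predictions with a trained model. The idea is simply to train against the distribution of annotator votes (soft labels) instead of collapsing disagreement into a single majority-vote label.

```python
import math
from collections import Counter

def soft_labels(votes, tagset):
    """Turn per-token annotator votes into a probability distribution
    over tags, instead of collapsing them to one majority-vote label."""
    counts = Counter(votes)
    total = sum(counts.values())
    return {t: counts.get(t, 0) / total for t in tagset}

def cross_entropy(pred, target):
    """Cross-entropy of a predicted tag distribution against a (soft) target."""
    return -sum(p * math.log(pred[t]) for t, p in target.items() if p > 0)

# Illustrative example: three annotators disagree on the tag of a token.
tagset = ["ADJ", "NOUN"]
votes = ["ADJ", "ADJ", "NOUN"]
target = soft_labels(votes, tagset)   # {'ADJ': 2/3, 'NOUN': 1/3}

# Under the soft target, a model that reflects the disagreement is
# penalised less than one that is overconfident in the majority tag:
confident = {"ADJ": 0.99, "NOUN": 0.01}
hedged = {"ADJ": 0.70, "NOUN": 0.30}
assert cross_entropy(hedged, target) < cross_entropy(confident, target)
```

With hard majority-vote labels the confident model would look strictly better; the soft target rewards models whose uncertainty mirrors the annotators' own.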

Selected publications (more)

  • Sebastian Ruder and Barbara Plank. Learning to select data for transfer learning with Bayesian Optimization. In EMNLP 2017, Copenhagen, Denmark. [arXiv]
  • Héctor Martínez Alonso and Barbara Plank. When is multitask learning effective? Semantic sequence prediction under varying data conditions. In EACL (long). [pdf] [arXiv]
  • Barbara Plank. Keystroke dynamics as signal for shallow syntactic parsing. The 26th International Conference on Computational Linguistics (COLING). Osaka, Japan. [arXiv] Best paper award finalist.
  • Johannes Bjerva, Barbara Plank and Johan Bos. Semantic Tagging with Deep Residual Networks. The 26th International Conference on Computational Linguistics (COLING). Osaka, Japan. [arXiv]
  • Chloe Braud, Barbara Plank and Anders Søgaard. Multi-view and multi-task training of RST discourse parsers. The 26th International Conference on Computational Linguistics (COLING). [pdf]
  • Barbara Plank. What to do about non-standard (or non-canonical) language in NLP. In KONVENS 2016. [pdf] [arXiv]
  • Željko Agić, Anders Johannsen, Barbara Plank, Héctor Martínez Alonso, Natalie Schluter and Anders Søgaard. Multilingual Projection for Parsing Truly Low-Resource Languages. In [TACL], 2016.
  • Barbara Plank, Anders Søgaard and Yoav Goldberg. Multilingual Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Models and Auxiliary Loss. In ACL (short), 2016. [arXiv]
  • Ben Verhoeven, Walter Daelemans and Barbara Plank. TwiSty: a Multilingual Twitter Stylometry Corpus for Gender and Personality Profiling. In LREC 2016.
  • Raffaella Bernardi, Ruket Cakici, Desmond Elliott, Aykut Erdem, Erkut Erdem, Nazli Ikizler-Cinbis, Frank Keller, Adrian Muscat and Barbara Plank. Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures. To appear in JAIR. [JAIR]

Recent talks

  • July 28, 2017, Google Research NLU workshop, New York
  • July 1, 2017: PyData 2017 Berlin, Natural Language Processing: Challenges and Next Frontiers [YouTube]
  • March 28, 2017, Geneva: "What to do about non-canonical data in NLP"
  • March 27, 2017, Geneva: "Multi-task learning in NLP: What? How? When?"
  • March 14, 2017, Keynote at the Nuance Research Conference (NRC) 2017: "Beyond text: fortuitous data and deep multi-task learning for processing non-standard text"
  • March 10, 2017, Milan: "Introduction to Natural Language Processing"
  • December 10, 2016, Osaka, Japan: YRNLP (Young Researchers in Natural Language Processing in Japan): "Variety in research, research in variety"


Professional Service

  • Chair & board member:
    • NAACL 2018 area chair (Multilingual NLP including Phonology, Morphology and Word Segmentation)
    • ESSLLI 2018 Chair for Language and Computation
    • ACL 2017 area chair (Tagging, Chunking, Syntax and Parsing)
    • EACL 2017 Student research workshop faculty advisor
    • Editorial board member Computational Linguistics journal (2017-2019)
    • ACL 2016 publicity chair
    • EMNLP 2015 publicity chair
  • Program committee for conferences: AAAI 2018, 2017, 2016; NIPS 2017, 2016; ACL 2017, 2016, 2015, 2014, 2013; EMNLP 2016, 2015, 2014; NAACL 2016; CoNLL 2017, 2016, 2015; COLING 2016, 2014; KONVENS 2016; IJCNLP 2014; *SEM 2015
  • Program committee for workshops (selected): NAACL SRW 2016; CL4LC 2016; DADA 2016; MWE 2016, 2015; LAW 2016; L&V 2016; NoDaLiDa 2013, 2015; NLPIT 2016, 2015; IWPT 2015; SemEval 2015; IJCAI 2013; CLIN 20
  • Journals: PLOS ONE, 2016; Computational Linguistics; Information Processing and Management Journal, 2013; Journal of Logic and Computation special issue, 2012; IMIX project book chapter, 2011; JIS 2016

Bio, Teaching & more

Short Bio

  • since April 2016: Assistant Professor (tenured), University of Groningen (RUG)
  • Sep 2014-Mar 2016: Assistant Professor, CST, University of Copenhagen (UCPH)
  • Aug 2013-Aug 2014: Postdoc, CST, Copenhagen Lowlands
  • Nov 2011-Jun 2013: Postdoc, DISI, Trento LiMoSiNe project
  • 2007-2011: Ph.D., cum laude, University of Groningen
  • MSc European Masters Program in Language and Communication Technologies (EM-LCT), cum laude. Joint degree from the University of Bozen-Bolzano (Italy) and University of Amsterdam (UvA, The Netherlands) (2007).
  • BSc, Computer Science, University of Bozen-Bolzano (2005).

PhD students

  • Hessel Haagsma (co-supervision with Johan Bos)
  • Johannes Bjerva (co-supervision with Johan Bos): thesis submitted; now postdoc at the University of Copenhagen


Teaching

  • 2017-2018:
    • Language Technology project (i.e., project-based intro to Deep Learning for NLP, Master's level)
    • Collecting Data (Master in Digital Humanities)
    • Shared Task (Master's level)
    • Bachelorscriptie Informatiekunde (Bachelor's thesis, Information Science)
    • Computationele Grammatica (Computational Grammar)
    • Inl.wetensch.onderzoek (Introduction to research methods)
    • Digital Skills
  • 2016-2017:
    • Language Technology project (Master's)
    • Collecting Data (new Master in Digital Humanities)
    • Bachelorscriptie Informatiekunde (Bachelor's thesis, Information Science)
    • Computationele Grammatica (Computational Grammar)
    • Inl.wetensch.onderzoek (Introduction to research methods)
  • Summer 2016: ESSLLI 2016 summer school on Fortuitous data, Bozen-Bolzano
  • Spring 2016: Language Technology Project, RUG
  • Spring 2016: Language Processing 2, UCPH (initial lectures before departure)
  • Autumn 2015: Cognitive Science 1, UCPH
  • Spring 2015: Language Processing 2, UCPH
  • Autumn 2014: Cognitive Science 1, UCPH

Code & Data

Press & Media