No padding please: efficiently processing variable-sized examples in deep learning with applications to neural handwriting recognition
Gideon Maillette de Buy Wenniger, Lambert Schomaker and Andy Way


Neural methods dominate the field of handwriting recognition.
In neural handwriting recognition, a text image is consumed by a neural network that incrementally extracts features and makes predictions about the presence of specific characters in each region of the image. These predictions are finally collapsed into a sequence of character probabilities, which is typically consumed by a decoder with a language model to form complete recognitions.
The connectionist temporal classification (CTC) loss function is applied to train the network.
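As a minimal, hypothetical illustration of this training setup (not the paper's actual model, and with all sizes made up for the example), per-column class scores can be converted to log-probabilities and trained with PyTorch's built-in nn.CTCLoss:

```python
import torch
import torch.nn as nn

# Toy dimensions: a batch of 2 line images, 80 output columns, and an
# alphabet of 26 characters plus the CTC blank label at index 0.
batch_size, num_columns, num_classes = 2, 80, 27

# Per-column class scores as they might come out of the network
# (in a full model produced by convolutional and MDLSTM layers).
scores = torch.randn(num_columns, batch_size, num_classes, requires_grad=True)
log_probs = scores.log_softmax(dim=2)  # CTC expects log-probabilities

# Padded target character sequences and their true lengths.
targets = torch.randint(1, num_classes, (batch_size, 15), dtype=torch.long)
input_lengths = torch.full((batch_size,), num_columns, dtype=torch.long)
target_lengths = torch.tensor([12, 15], dtype=torch.long)

ctc_loss = nn.CTCLoss(blank=0)
loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
loss.backward()  # gradients flow back into the feature extractor
```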

For both feature extraction and incremental prediction, multi-dimensional long short-term memory (MDLSTM) is a particularly useful building block. However, MDLSTMs have a high computational cost, which makes increased efficiency crucial for their scalable application.
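To make the cost concrete, the following rough sketch (simplified relative to published MDLSTM formulations, with a single shared weight matrix and no peephole connections) shows one cell step of a 2D MDLSTM, in which the state at pixel (i, j) depends on the states of its left and top neighbours; applying such a cell at every pixel of a large image is what makes MDLSTMs expensive:

```python
import torch
import torch.nn as nn

class MDLSTMCell(nn.Module):
    """Simplified single step of a 2D MDLSTM (illustrative sketch only)."""

    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        # Input gate, output gate, cell candidate, and one forget gate per
        # predecessor direction: 5 * hidden_size outputs in total.
        self.linear = nn.Linear(input_size + 2 * hidden_size, 5 * hidden_size)
        self.hidden_size = hidden_size

    def forward(self, x, h_left, c_left, h_top, c_top):
        gates = self.linear(torch.cat([x, h_left, h_top], dim=-1))
        i, o, g, f_left, f_top = gates.chunk(5, dim=-1)
        i, o = torch.sigmoid(i), torch.sigmoid(o)
        f_left, f_top = torch.sigmoid(f_left), torch.sigmoid(f_top)
        g = torch.tanh(g)
        c = f_left * c_left + f_top * c_top + i * g  # 2D cell-state recurrence
        h = o * torch.tanh(c)
        return h, c
```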

In this work we present three methods to increase efficiency when working with MDLSTMs, together with efficient PyTorch implementations of these methods: 1) example packing: tiling multiple examples into one, removing nearly all padding; 2) convolutions with grouping: employing grouped convolutions to compute several linear layers, with different inputs, outputs, and weights, in parallel (see the sketch below); 3) multi-GPU computation with example lists: supporting efficient multi-GPU computation without the need to pad examples into same-sized tensors.
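To give an idea of how grouped convolutions can stand in for several independent linear layers, the sketch below uses the groups argument of PyTorch's nn.Conv1d with a kernel size of 1; the sizes are made up for illustration and are not the configuration used in the paper:

```python
import torch
import torch.nn as nn

# Hypothetical sizes: G independent linear layers, each mapping d_in -> d_out,
# applied to G different inputs in a single call via a grouped 1x1 convolution.
G, d_in, d_out, batch_size = 4, 32, 64, 8

grouped = nn.Conv1d(in_channels=G * d_in, out_channels=G * d_out,
                    kernel_size=1, groups=G)

# The G inputs are stacked along the channel dimension; group g only sees input
# channels [g * d_in, (g + 1) * d_in) and produces its own d_out output channels.
inputs = torch.randn(batch_size, G * d_in, 1)  # length-1 "sequence"
outputs = grouped(inputs)                      # shape: (batch_size, G * d_out, 1)
assert outputs.shape == (batch_size, G * d_out, 1)
```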

The proposed algorithms and techniques are applied to the task of handwriting recognition and tested on the IAM Handwriting Database, demonstrating their effectiveness. Notably, the proposed techniques can also readily be applied to neural speech recognition or other deep learning problems dealing with variable-sized inputs.