Rule induction for global explanation of neural networks
Madhumita Sushil, Simon Suster and Walter Daelemans


Understanding the behavior of a neural network and explaining its outputs is important for improving the network's performance and generalization ability, and for ensuring trust in automated systems. Several approaches have previously been proposed to identify and visualize the most important features in a network, but in most cases the relations between the features and the output classes are lost. We propose a technique to induce sets of if-then-else rules that capture these relations, in order to globally explain the predictions of a network. We first compute the importance of the features in the trained network. We then weigh the original inputs with these feature importance scores, simplify the transformed input space, and finally fit a rule induction model to explain the model predictions. We evaluate the technique on networks trained to predict in-hospital mortality and the primary diagnostic category of patients. We find that meaningful rules can be learned in this manner, and that these rules help us understand and diagnose the network.
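
The following is a minimal sketch of the pipeline summarized above, not the authors' implementation: it assumes a small PyTorch network, gradient-based feature importance, a low-rank projection as the simplification step, and a scikit-learn decision tree as a stand-in for the rule induction model. All model, variable, and parameter choices below are illustrative assumptions.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.decomposition import TruncatedSVD
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy data and a small network standing in for the trained model
# (training loop omitted for brevity; all sizes are arbitrary).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20)).astype(np.float32)
model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 2))

# 1) Feature importance: here, the mean absolute input gradient of the
#    predicted-class score (one of several possible attribution methods).
inputs = torch.tensor(X, requires_grad=True)
logits = model(inputs)
pred = logits.argmax(dim=1)
score = logits.gather(1, pred.unsqueeze(1)).sum()
grads, = torch.autograd.grad(score, inputs)
importance = grads.abs().mean(dim=0).detach().numpy()  # one score per feature

# 2) Weigh the original inputs with the feature importance scores.
X_weighted = X * importance

# 3) Simplify the transformed input space (here via a low-rank projection).
X_reduced = TruncatedSVD(n_components=5, random_state=0).fit_transform(X_weighted)

# 4) Fit a rule learner on the *network's* predictions rather than the gold
#    labels, so the induced rules describe the network's behavior.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X_reduced, pred.numpy())
print(export_text(surrogate))  # prints if-then rules over the reduced features
```

Fitting the surrogate to the network's own predictions, rather than to the ground-truth labels, is what makes the resulting rules an explanation of the network instead of a model of the data.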