Search algorithm

Next: Complications for Ngrams Up: Robust parsing of word-graphs Previous: Example: nlp_speech method.

Search algorithm

The robustness component can be characterised as a search in the annotated word-graph. The goal of the search is the best path from v₀ to v_n+1. This search reduces to a well-known graph search problem, namely the problem of finding the shortest path in a directed acyclic graph with uniform weights.

We use a variant of the DAG-SHORTEST-PATH algorithm [18]. This algorithm finds shortest paths for uniformly weighted directed acyclic graphs. The first step of the algorithm is a topological sort of the vertices of the graph. It turns out that the state names of the word-graph that we obtain from the speech recogniser are already topologically sorted: state names are integers, and edges always connect to larger integers. The second step of the algorithm maintains an array A which records for each state v_k the weight associated with the best path known from v₀ to v_k. A similar array, P, is used to represent for each state the history of this best path, as a sequence of QLFs (since that is what we want to obtain eventually).

The first step of the algorithm initialises these arrays such that for each state $v_i (i\neq 0) A[v_i] = \infty$ , and $P[v_i] = \mbox{NIL}$ . For v₀ we specify $A[v_0] = \mbox{\it ini\/}$ and $P[v_0]=\epsilon$ . After this initialisation phase the algorithm treats each edge of the graph in topologically sorted order of the source vertex, as follows:

$\begin{displaymath}\small\begin{minipage}[t]{.9\textwidth}\normalsize\begin{flus... ...e{8em} relax $(v_i,v_j,w,a,q)$\\ \end{flushleft}\end{minipage}\end{displaymath}$

(34)

The function relax is defined on edges and updates the arrays if a better path to a vertex has been found:

$\begin{displaymath}\small\begin{minipage}[t]{.9\textwidth}\normalsize\begin{flus... ...ce{6.5em} $P[v_j] =: P[v_i].q$\\ \end{flushleft}\end{minipage}\end{displaymath}$

(35)

When the algorithm finishes, P[v_n+1] constitutes the sequence of QLFs associated with the best path found. The weight of this path is given by A[v_n+1]. This algorithm is efficient. Its running time is O(V+E), where V is the number of vertices and E is the number of edges. Therefore, it can be expected that this part of processing should not decrease parsing efficiency too much, since the number of edges is O(V²).⁸ For a more detailed account of the correctness and complexity of this algorithm, see Cormen, Leiserson, and Rivest [18]. ⁹

A simple generalisation of the algorithm has been implemented in order to obtain the N best solutions. In this case we maintain in the algorithm for each vertex the N best paths found so far. Such a generalisation increases the worst-case complexity by only a constant factor, and is very useful for development work.

Next: Complications for Ngrams Up: Robust parsing of word-graphs Previous: Example: nlp_speech method.

2000-07-10