next up previous
Next: Complications for Ngrams Up: Robust parsing of word-graphs Previous: Example: nlp_speech method.


Search algorithm

The robustness component can be characterised as a search in the annotated word-graph. The goal of the search is the best path from v0 to vn+1. This search reduces to a well-known graph search problem, namely the problem of finding the shortest path in a directed acyclic graph with uniform weights.

We use a variant of the DAG-SHORTEST-PATH algorithm [18]. This algorithm finds shortest paths for uniformly weighted directed acyclic graphs. The first step of the algorithm is a topological sort of the vertices of the graph. It turns out that the state names of the word-graph that we obtain from the speech recogniser are already topologically sorted: state names are integers, and edges always connect to larger integers. The second step of the algorithm maintains an array A which records for each state vk the weight associated with the best path known from v0 to vk. A similar array, P, is used to represent for each state the history of this best path, as a sequence of QLFs (since that is what we want to obtain eventually).

The first step of the algorithm initialises these arrays such that for each state $v_i (i\neq 0) A[v_i] = \infty$, and $P[v_i] =
\mbox{NIL}$. For v0 we specify $A[v_0] = \mbox{\it ini\/}$ and $P[v_0]=\epsilon$. After this initialisation phase the algorithm treats each edge of the graph in topologically sorted order of the source vertex, as follows:

\begin{displaymath}\small\begin{minipage}[t]{.9\textwidth}\normalsize\begin{flus...
...e{8em} relax $(v_i,v_j,w,a,q)$\\
\end{flushleft}\end{minipage}\end{displaymath} (34)

The function relax is defined on edges and updates the arrays if a better path to a vertex has been found:

\begin{displaymath}\small\begin{minipage}[t]{.9\textwidth}\normalsize\begin{flus...
...ce{6.5em} $P[v_j] =: P[v_i].q$\\
\end{flushleft}\end{minipage}\end{displaymath} (35)

When the algorithm finishes, P[vn+1] constitutes the sequence of QLFs associated with the best path found. The weight of this path is given by A[vn+1]. This algorithm is efficient. Its running time is O(V+E), where V is the number of vertices and E is the number of edges. Therefore, it can be expected that this part of processing should not decrease parsing efficiency too much, since the number of edges is O(V2).8 For a more detailed account of the correctness and complexity of this algorithm, see Cormen, Leiserson, and Rivest [18]. 9

A simple generalisation of the algorithm has been implemented in order to obtain the N best solutions. In this case we maintain in the algorithm for each vertex the N best paths found so far. Such a generalisation increases the worst-case complexity by only a constant factor, and is very useful for development work.


next up previous
Next: Complications for Ngrams Up: Robust parsing of word-graphs Previous: Example: nlp_speech method.

2000-07-10