Heuristic Searches

NEM 23 May 2007: Many problems can be seen as a search for a solution within a space of possible states. Such a state space can be represented as a directed graph, where the nodes represent states and the edges represent possible transitions between states (often labeled with a cost associated with that transition). In order to find the solution to our problem we search the graph from our current state until we find a state that matches our goal (a goal state). A variety of search strategies have been developed to solve problems in this manner, which we can divide into two groups: uninformed searches attempt to exhaustively search the graph without any clue as to where solutions may lie, whereas heuristic (or informed) searches use some knowledge about the problem domain to guide the search towards likely solutions.

The five search strategies we illustrate are:

  1. Depth-first search which follows a path blindly to its end before trying any alternatives (an uninformed search).
  2. Breadth-first search which conservatively explores all alternatives at once, one node at a time (another informed search).
  3. Uniform-cost search expands nodes which have the least cost-so-far first (uninformed).
  4. Best-first search expands nodes which score best in some evaluation function. In this case we use a greedy search evaluation function which expands nodes that have the least estimated cost-to-distance, regardless of the cost so far.
  5. Finally, we show the famous A*-search which combines a measure of the cost-so-far and the estimated cost-to-goal to get a very efficient search strategy.

The search strategies are implemented as part of a lightweight digraph package, and this page serves as an example of their use. This example we use is taken from BOOK Artificial Intelligence: A Modern Approach by Russell and Norvig, page 95 (1st ed), but adapted to Tcl. The example involves planning a route from Arad in Romania to Bucharest [L1 ]. Firstly, we define our graph, where nodes are cities and the edges are roads between them, with the distance as the cost of the edge:

 package require digraph     0.6 
 set map [digraph create]
 foreach {source -> dests} {
     Arad        -> {Zerind 75 Timisoara 118 Sibiu 140}
     Bucharest   -> {Pitesti 101 Fagaras 211 Urziceni 85 Giurgiu 90}
     Craiova     -> {Dobreta 120 RimnicuVilcea 146 Pitesti 138}
     Dobreta     -> {Mehadia 75 Craiova 120}
     Eforie      -> {Hirsova 86}
     Fagaras     -> {Sibiu 99 Bucharest 211}
     Giurgiu     -> {Bucharest 90}
     Hirsova     -> {Urziceni 98 Eforie 86}
     Iasi        -> {Neamt 87 Vaslui 92}
     Lugoj       -> {Iasi 87}
     Mehadia     -> {Lugoj 70 Dobreta 75}
     Neamt       -> {Iasi 87}
     Oradea      -> {Zerind 71 Sibiu 151}
     Pitesti     -> {RimnicuVilcea 97 Craiova 138 Bucharest 101}
     RimnicuVilcea -> {Sibiu 80 Pitesti 97 Craiova 146}
     Sibiu       -> {Arad 140 Oradea 151 Fagaras 99 RimnicuVilcea 80}
     Timisoara   -> {Arad 118 Lugoj 111}
     Urziceni    -> {Bucharest 85 Vaslui 142 Hirsova 98}
     Vaslui      -> {Urziceni 142 Iasi 92}
     Zerind      -> {Arad 75 Oradea 71}
 } {
     foreach {dest cost} $dests {
         digraph edge map $source $dest -cost $cost

For the informed search methods we also need to define a heuristic that is used to estimate the cost of reaching the goal (Bucharest) from the current node. It is important that this heuristic be admissible -- i.e., that it never overestimates the cost to the goal. For route-planning a reasonable heuristic is the straight-line distance to the goal (SLD):

 # This is our estimation of the cost to the goal - the straight line
 # distance to the goal (SLD). This is admissible, as it can never over-
 # estimate the distance: SLD is the shortest possible distance.
 array set SLD {
    Arad            366
    Bucharest       0
    Craiova         160
    Dobreta         242
    Eforie          161
    Fagaras         178
    Giurgiu         77
    Hirsova         151
    Iasi            226
    Lugoj           244
    Mehadia         241
    Neamt           234
    Oradea          380
    Pitesti         98
    RimnicuVilcea   193
    Sibiu           253
    Timisoara       329
    Urziceni        80
    Vaslui          199
    Zerind          374
 proc SLD node { return $::SLD($node) }

We can now try out our various search strategies using the digraph package:

 foreach strategy {depth-first breadth-first uniform-cost 
        {best-first SLD} {a-star SLD}} {
    set steps 0
    puts "\nSearching $strategy..."
    digraph search $map Arad [concat digraph $strategy] {path cost} {
        incr steps
        if {[lindex $path end] eq "Bucharest"} {
            puts "Found result: [join $path { -> }]"
            puts "Iterations: $steps Size: [llength $path] Cost: $cost"

Remove the break statement to generate all solutions. The results are:

 Searching depth-first...
 Found result: Arad -> Sibiu -> RimnicuVilcea -> Craiova -> Pitesti -> Bucharest
 Iterations: 6 Size: 6 Cost: 605 
 Searching breadth-first...
 Found result: Arad -> Sibiu -> Fagaras -> Bucharest
 Iterations: 13 Size: 4 Cost: 450
 Searching uniform-cost...
 Found result: Arad -> Sibiu -> RimnicuVilcea -> Pitesti -> Bucharest
 Iterations: 19 Size: 5 Cost: 418
 Searching best-first SLD...
 Found result: Arad -> Sibiu -> Fagaras -> Bucharest
 Iterations: 4 Size: 4 Cost: 450
 Searching a-star SLD...
 Found result: Arad -> Sibiu -> RimnicuVilcea -> Pitesti -> Bucharest
 Iterations: 6 Size: 5 Cost: 418

We can see from this some general properties of the search strategies. Depth-first search quickly found a solution, but it is of poor quality. Breadth-first search took a moderate amount of time to find a solution with a small number of steps, but not the shortest distance. Uniform-cost search finds the lowest-distance solution, but takes the longest amount of time doing it. Greedy search quickly finds a solution, but again it is of only decent quality. Finally, A* search quickly finds the best solution. Indeed, for any given heuristic function, A* is optimally efficient: that is, no other search algorithm will expand fewer nodes than A* before it finds an optimal solution.

See also RS's Searching A star in space, which greatly improved on a much older version of these pages. Also, see State Space Searching in Tcl.