[Richard Suchenwirth] 2002-06-07 - Given a piece of data whose correctness is in doubt, one way to find out is simply to ask a search engine like [Google], disregarding everything in the results except the number of web pages found. Chances are that the correct data get more hits than the faulty ones.

 #!/bin/sh
 # \
 exec wish "$0" "$@"
 package require http
 # Uncomment and adjust if you are behind a proxy:
 #http::config -proxyhost proxy -proxyport 80

 # Return the number of hits reported for the query (0 if none was found)
 proc google'nhits query {
     set url http://google.yahoo.com/bin/query?p=[string map {" " +} $query]&hc=0&hs=0
     set token [http::geturl $url]
     set data [http::data $token]
     http::cleanup $token
     set nhits 0
     # The result page has a line like " 1 - 10 of 3580"; grab the total
     regexp {\n[ 0-9-]+ of ([0-9]+)} $data -> nhits
     set nhits
 }

 # Append the query and its hit count to text widget w
 proc go {w} {
     global query
     $w insert end "'$query': [google'nhits $query] hits\n"
 }

 entry .e -textvar query -bg white
 bind .e <Return> {go .t}
 text .t -bg white
 pack .e .t -fill x -expand 1

----
Example output in the text widget (asking about a city in Italy, where the post code and province were uncertain):

 'bellaria rn': 3580 hits
 'bellaria fo': 609 hits
 'bellaria 47814': 1130 hits
 'bellaria 47014': 30 hits

These results seem to indicate that 47814 Bellaria RN (Rimini) is the correct address ;-)

On single words one might use this for spelling verification:

 'suchenwirth': 280 hits
 'suchenworth': 0 hits

...or to check how strongly several words are associated:

 'suchenwirth tcl': 57 hits
 'suchenwirth java': 14 hits

The numbers may change over time, but the tendency ("fuzzy truth") can at least be estimated.

----
2002-09-09 Google changed their layout slightly, so I had to add a space to the regexp in ''google'nhits''. [RS]

----
[Arts and crafts of Tcl-Tk programming]
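----
Since the result layout can change, it may help to try the two steps (pulling the count out of a page, then comparing candidates) without touching the network. The following is only a minimal sketch: the procs ''parseHits'' and ''bestCandidate'' and the sample page fragment are made up for illustration and are not part of the script above. Only the regexp and the hit counts come from this page.

```tcl
# Sketch only: extract a hit count from page text and pick the
# candidate with the most hits. No network access is involved.

# Extract the total from text shaped like "\n 1 - 10 of 3580".
# Returns 0 when the pattern does not match.
proc parseHits {data} {
    set nhits 0
    regexp {\n[ 0-9-]+ of ([0-9]+)} $data -> nhits
    return $nhits
}

# Given a flat list of query/count pairs, return the query
# with the highest count (empty string for an empty list).
proc bestCandidate {pairs} {
    set best ""
    set max -1
    foreach {query nhits} $pairs {
        if {$nhits > $max} {
            set max $nhits
            set best $query
        }
    }
    return $best
}

# Hypothetical page fragment; the real result layout is an assumption.
set sample "Results\n 1 - 10 of 3580 for bellaria rn"
puts [parseHits $sample]

# Hit counts taken from the example output on this page.
puts [bestCandidate {
    "bellaria 47814" 1130
    "bellaria 47014" 30
}]
```

Keeping the parsing in its own proc means that a layout change only requires adjusting the regexp and re-running it against a saved page.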