[Richard Suchenwirth] 2007-06-27 - http://www.fallingrain.com/world/ (Copyright 1996-2004 by Falling Rain Genomics, Inc.) provides a very large, publicly accessible gazetteer of the world's cities and airports - there must be millions of entries, all available in [HTML] format. To keep individual pages from getting too big, the site uses a URL tree that is very deep in places. For instance, the URL for my city, Konstanz, is
http://www.fallingrain.com/world/a/K/o/n/s/t/a/n/z/
In other cases, short prefixes are sufficient, e.g. all 131 airports whose code starts with ED (plus some others) are delivered by the URL
http://www.fallingrain.com/world/a/E/D/
So to search for a place one has to iterate the URL, appending letter after letter (or the character's decimal Unicode value if it is outside the ASCII letters) and fetching the page at each step. Here's a proc that does this - called with a place name, it returns a list of hits, where each hit is a list of
name type region country lat lon elevation(ft) population(est)
----
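#!/usr/bin/env tclsh
package require http
# Return a list of hits for placename; each hit is a list of
# name type region country lat lon elevation(ft) population(est)
proc geo'get'rain placename {
    set url http://www.fallingrain.com/world/a/
    set res {}
    foreach c [split $placename ""] {
        # characters with code below 65 or above 127 go into the URL as decimal codes
        set i [scan $c %c]
        if {$i < 65 || $i > 127} {set c $i}
        append url $c/
        set token [http::geturl $url]
        set page [http::data $token]
        http::cleanup $token
        foreach line [split $page \n] {
            # NOTE: the HTML tag patterns in the next three commands were lost
            # from this page; <tr><td>, </td><td>, </td></tr> and the <a> wrapper
            # are assumed reconstructions, to be checked against the live pages.
            if {[string match <tr><td>* $line]} {
                set fields [split [string map {</td><td> \x80 </td></tr> ""} $line] \x80]
                # skip rows whose second cell holds no link text
                if {![regexp {<a [^>]*>(.+)</a>} [lindex $fields 1] -> name]} continue
                if {[string match $placename* $name]} {
                    lappend res [linsert [lrange $fields 2 end] 0 $name]
                }
            }
        }
    }
    set res
}
#-- If this script is called as toplevel, the function is called, and results displayed:
if {[file tail [info script]] eq [file tail $argv0]} {
    puts [join [geo'get'rain $argv] \n]
}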
----
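The URL walk can be tried out without touching the network. The following is only a sketch, not part of the original script: the proc name geo'prefix'urls is made up for illustration, and it assumes the same base URL and the same substitution rule as geo'get'rain (characters with a code below 65 or above 127 are replaced by their decimal code).

```tcl
# Hypothetical helper (not in the original script): list the URLs that
# would be requested for a place name, one per prefix.
proc geo'prefix'urls {placename {base http://www.fallingrain.com/world/a/}} {
    set url $base
    set urls {}
    foreach c [split $placename ""] {
        set i [scan $c %c]
        # same rule as in geo'get'rain: codes below 65 or above 127
        # are written as decimal numbers
        if {$i < 65 || $i > 127} {set c $i}
        append url $c/
        lappend urls $url
    }
    return $urls
}
puts [join [geo'prefix'urls Konstanz] \n]
```

For Konstanz this prints eight URLs, the last of which is the Konstanz URL quoted at the top of this page. Each of them costs one HTTP request in geo'get'rain, which is why longer names take longer to look up.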
Testing:
/_Ricci> geo_rain.tcl Stockel
Stockel city {Province de )) (( Brabant} Belgium 50.8333333 4.45 262 309844
Stockelanda city {(( Alvsborgs Lan ))} Sweden 58.65 12 354 935
Stockels city {Land Hessen} Germany 50.5666667 9.7333333 1049 14658
Stockelsberg city {Land Bayern} Germany 49.3833333 11.2666667 1312 22990
Stockelsdorf city {Land Schleswig-Holstein} Germany 53.9 10.65 49 49614
Stockelweingarten city {Bundesland Karnten} Austria 46.6694444 13.9377778 1558 9777
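Since every hit is a plain Tcl list in the field order given above, it can be taken apart with lassign. This is a small sketch assuming Tcl 8.5 or later (for lassign); the variable names are chosen here for illustration, and the hit is copied from the output above rather than fetched.

```tcl
# One hit, copied from the test output above
set hit {Stockelsdorf city {Land Schleswig-Holstein} Germany 53.9 10.65 49 49614}
# Field order as documented: name type region country lat lon elevation(ft) population(est)
lassign $hit name type region country lat lon elevation_ft population
# Convert the elevation from feet to metres (1 ft = 0.3048 m)
set elevation_m [expr {$elevation_ft * 0.3048}]
puts [format "%s (%s, %s): lat %s, lon %s, %.0f m, pop. about %d" \
    $name $region $country $lat $lon $elevation_m $population]
```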
The repeated queries take a while, but the result is worth waiting for :^) The population figures are sometimes a bit high, because they are reported to cover a 7 km radius around the point.
Also, the "region" field contains nonsense for some countries, e.g. the UK (almost always Aberdeen), France (usually Alsace) and Liechtenstein (always Balzers) - it looks as if the alphabetically first region is returned where the data is missing. For Germany, the US, etc. things look better.
[DKF]: Note that for large cities, the population returned can also be too small.
----
[Category Geography]