Version 1 of Qualified geographic names

Updated 2002-03-13 09:16:25

Richard Suchenwirth - In Tclworld I want to have plenty of detailed geographic data, both for display on the map and for browsing additional facts. In the process I noticed that not all geographic names are unique, which they would have to be for use as array indices in the database. A city named Hamilton exists for instance in England, but also in Australia, Canada, New Zealand (colonialists sometimes have limited imagination ;-), as well as in the US states of Alabama, Montana, and Ohio. In the US case, the typical solution is to append the state name, often as the two-letter USPS abbreviation. For the countries of the world (and dependent territories), there are also two-letter codes in ISO 3166, but these are not disjoint from the US state codes (e.g. CA is both California and Canada - see language/country name servers). However, one could introduce a hierarchical scheme, e.g. from specific to generic, like Web domains:

 Hamilton,GB
 Hamilton,OH.US

or, to make the tree structure clearer (e.g. in sorted lists), from generic to specific:

 GB;Hamilton
 US.OH;Hamilton

in which the semicolon marks the place where the "display name" (to be put on the map, or database browser) begins - which is easily extracted with the regexp

 regexp {(.+;)?(.+)} $qualifiedname -> - displayname
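The two notations carry the same information, so converting between them is mechanical. A minimal sketch, assuming the comma form always has the shape name,codes with the codes dot-separated (the proc names toSorted and displayName are made up for this illustration, and lreverse needs Tcl 8.5 or later):

```tcl
# Hypothetical helper: turn "Hamilton,OH.US" (specific to generic)
# into "US.OH;Hamilton" (generic to specific, display name last).
proc toSorted {name} {
    # Split at the first comma: display name before, codes after
    set pos [string first , $name]
    if {$pos < 0} {return $name}  ;# unqualified name: nothing to reorder
    set display [string range $name 0 [expr {$pos - 1}]]
    set codes   [string range $name [expr {$pos + 1}] end]
    # Reverse the dotted codes: OH.US -> US.OH
    set path [join [lreverse [split $codes .]] .]
    return "$path;$display"
}

# Display name: everything after the last semicolon, using the regexp above
proc displayName {qualifiedname} {
    regexp {(.+;)?(.+)} $qualifiedname -> - displayname
    return $displayname
}

puts [toSorted Hamilton,OH.US]        ;# US.OH;Hamilton
puts [displayName "US.OH;Hamilton"]   ;# Hamilton
```

A name without any qualification passes through both procs unchanged, so plain and qualified names can be mixed in one list.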

Two-letter codes for administrative subdivisions are also usual in Canada, Switzerland, Italy, and defined but rarely used in China, Germany, etc. (Others: please add!) For France, one might use the two-digit département codes as used on number plates (e.g. 13: Bouches-du-Rhône (Marseille); 75: Paris).

Resolution of such "fully qualified paths" could be done with database entries like

 + GB = {Great Britain}
 + US = {United States of America}
 + US.OH = Ohio

(+ is an alias for the database, to make these lines valid Tcl commands - see A simple database API). Here's how to resolve such geocodes to human-readable form:

 proc explain {db code} {
    set res ""; set region ""; set name ""
    # Accept CC, CC.region, CC;name and CC.region;name
    regexp {^(..)([.]([^;]+))?(;(.+))?$} $code -> country - region - name
    if {$name ne ""} {append res "$name in "}
    # Replace codes by their database entries where there are any
    if {$region ne "" && [set dbRegion [$db $country.$region =]] ne ""} {
        set region $dbRegion
    }
    if {$region ne ""} {append res "$region, "}
    if {[set dbCountry [$db $country =]] ne ""} {
        set country $dbCountry
    }
    append res $country
    return $res
 }
 % explain + GB
 Great Britain
 % explain + US.OH
 Ohio, United States of America
 % explain + "US.OH;Hamilton"
 Hamilton in Ohio, United States of America
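To try the session above without the full database from A simple database API, a stand-in for + is enough. The following is only a sketch under the assumption that + is called as "+ key attribute ?value?" and returns an empty string for unknown entries; it is not the implementation from that page, and the array name geodb is made up:

```tcl
# Hypothetical stand-in for the database command "+":
#   + key attr value   stores a value
#   + key attr         returns the stored value, or "" if there is none
proc + {key attr args} {
    global geodb
    if {[llength $args] > 0} {
        set geodb($key,$attr) [lindex $args 0]
    } elseif {[info exists geodb($key,$attr)]} {
        return $geodb($key,$attr)
    } else {
        return ""
    }
}

+ GB = {Great Britain}
+ US = {United States of America}
+ US.OH = Ohio

puts [+ US.OH =]   ;# Ohio
puts [+ XX =]      ;# empty line: XX is not in the database
```

Returning an empty string for unknown keys is what lets explain fall back to the raw code when a country or region has no entry.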

One might consider a level above countries, which would of course be the continent. One letter is too short, as the majority of continent names start with A, so three-letter codes could be used, for instance

 + AFR = Africa
 + AMN = {North America}
 + AMS = {South America}
 + ANT = Antarctica
 + ASI = Asia
 + AOC = {Australia & Oceania}
 + EUR = Europe

and a pedant might even construct "fully qualified names" like this:

 Terra.AMN.US.CO.Denver

to be prepared for interstellar extensions - but we had better keep the data compact, and using the country code as "top-level domain" should be sufficient for the foreseeable future.
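Should deeper paths ever be wanted, resolution need not be tied to a fixed number of levels. A rough sketch, assuming a plain dict as lookup table instead of the database (explainPath and the table layout are made up for illustration, and the Colorado entry does not appear above; dict needs Tcl 8.5 or later):

```tcl
# Assumed lookup table for this sketch only, keyed by dotted path prefix
set names {
    AMN       {North America}
    AMN.US    {United States of America}
    AMN.US.CO Colorado
}

# Hypothetical helper: resolve every prefix of a dotted path,
# most specific element first
proc explainPath {table path} {
    set parts [split $path .]
    set res {}
    for {set i [expr {[llength $parts] - 1}]} {$i >= 0} {incr i -1} {
        set prefix [join [lrange $parts 0 $i] .]
        if {[dict exists $table $prefix]} {
            lappend res [dict get $table $prefix]
        } else {
            # Unknown prefix: fall back to the raw path element
            lappend res [lindex $parts $i]
        }
    }
    return [join $res ", "]
}

puts [explainPath $names AMN.US.CO.Denver]
# prints: Denver, Colorado, United States of America, North America
```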


Geographic mapping the Tcl way - Arts and crafts of Tcl-Tk programming