 proc init {} {
     variable map
     variable alphanumeric a-zA-Z0-9
     # One map entry per non-alphanumeric byte value 0..255
     for {set i 0} {$i < 256} {incr i} {
         set c [format %c $i]
         if {![string match \[$alphanumeric\] $c]} {
             set map($c) %[format %.2x $i]
         }
     }
     # These are handled specially
     array set map { " " + \n %0d%0a }
 }
 init

 proc url-encode {string} {
     variable map
     variable alphanumeric

     # The spec says: "non-alphanumeric characters are replaced by '%HH'"
     # 1 Leave alphanumeric characters alone
     # 2 Convert every other character to an array lookup
     # 3 Escape constructs that are "special" to the tcl parser
     # 4 "subst" the result, doing all the array substitutions

     regsub -all \[^$alphanumeric\] $string {$map(&)} string
     # This quotes cases like $map([) or $map($) => $map(\[) ...
     regsub -all {[][{})\\]\)} $string {\\&} string
     return [subst -nocommand $string]
 }

 proc url-decode str {
     # rewrite "+" back to space
     # protect \ from quoting another '\'
     set str [string map [list + { } "\\" "\\\\"] $str]

     # prepare to process all %-escapes
     regsub -all -- {%([A-Fa-f0-9][A-Fa-f0-9])} $str {\\u00\1} str

     # process \u unicode mapped chars
     return [subst -novar -nocommand $str]
 }

This is almost exactly the source taken from the implementations of [http] (except that http has moved to a C implementation? for good?) and [ncgi] (and should ncgi re-use http's command?).

[Lars H]: This encodes a string of bytes using printable [ASCII], but what can or should be done if one wants to use arbitrary Unicode characters in a string? I suppose that question mostly boils down to "which [encoding] is used for URLs?" Are there any RFCs or the like that specify that?

[[Yes, so that's one job: find the correct name of this translation ("x-url-encoding"?) and its official specification.]]

[Lars H]: The person who speaks in [[brackets]] here seems to have misunderstood my point.
As far as I can tell, x-url-encoding is what is implemented on this page (although the fact that almost every occurrence of ''x-url-encoding'' that turns up in Google is a Tcl manpage speaks against this being an official name). My point was rather how to go beyond anglocentric URLs. If I want to use the string "хлеб" or "борщ" in a web address, how should I encode it?

----

18may05 [jcw] - With 8.4, this ought to do the same:

 proc ue_init {} {
     # "+" decodes to a space; listed first so it takes precedence in the decode map
     lappend d + { }
     for {set i 0} {$i < 256} {incr i} {
         set c [format %c $i]
         set x %[format %02x $i]
         if {![string match {[a-zA-Z0-9]} $c]} {
             lappend e $c $x
             lappend d $x $c
         }
     }
     set ::ue_map $e
     set ::ud_map $d
 }
 ue_init

 proc ue {s} { string map $::ue_map $s }
 proc ud {s} { string map $::ud_map $s }

 puts [ue "wiki.tcl.tk/is fun!"]
 puts [ud [ue "wiki.tcl.tk/is fun!"]]
 puts [ue "a space and a \n new line :)"]
 puts [ud [ue "a space and a \n new line :)"]]
 puts [ud "1+1=2"]

[[Certain? [[ue]] appears to me to map ' '->'%20', while [[url-encode]] sends ' '->'+'.]]

[[Let me add, though, that I very much appreciate the elegance and flexibility of these recodings.]]
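
----

A minimal sketch, not taken from [http] or [ncgi], that tries to address both open points above: the question about strings such as "хлеб", and the space handling that differs between [[ue]] and [[url-encode]]. It assumes the string is first converted to UTF-8 and that each resulting byte is then %-escaped, which is what RFC 3986 recommends for new URI components; older servers may well expect a different encoding. The names ue8_init, ue8, ud8, ::ue8_map and ::ud8_map are hypothetical. As far as I know, the form-style translation with ' '->'+' is officially called application/x-www-form-urlencoded in the HTML specifications.

```tcl
# Hypothetical helpers (ue8_init, ue8, ud8); a sketch, not the http/ncgi code.
proc ue8_init {} {
    # Special cases first, as in url-encode: space -> "+", newline -> CR LF.
    # string map uses the first matching key, so these win over the
    # generic %20 and %0a entries appended in the loop below.
    set e [list { } + \n %0d%0a]
    # "+" decodes back to a space.
    set d [list + { }]
    for {set i 0} {$i < 256} {incr i} {
        set c [format %c $i]
        set x %[format %02x $i]
        if {![string match {[a-zA-Z0-9]} $c]} {
            lappend e $c $x
            # Accept lower-case and upper-case hex digits when decoding
            # (mixed case such as %aB is not handled by this sketch).
            lappend d $x $c [string toupper $x] $c
        }
    }
    set ::ue8_map $e
    set ::ud8_map $d
}
ue8_init

# Assumption: UTF-8 is the wanted byte encoding for non-ASCII characters.
proc ue8 {s} {
    # Unicode string -> UTF-8 bytes -> one escape per non-alphanumeric byte
    string map $::ue8_map [encoding convertto utf-8 $s]
}
proc ud8 {s} {
    # escapes -> bytes -> Unicode string
    encoding convertfrom utf-8 [string map $::ud8_map $s]
}

# \u escapes spell "хлеб", so the script does not depend on the file encoding.
puts [ue8 "\u0445\u043b\u0435\u0431"]
puts [ue8 "a space and a \n new line :)"]
puts [ud8 [ue8 "1+1=2"]]
```

With these maps, "хлеб" comes out as %d1%85%d0%bb%d0%b5%d0%b1, and a space is written as "+" as in [[url-encode]]. Decoding turns %0d%0a into CR LF rather than a bare newline, which is also what [[url-decode]] does.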