Version 15 of url-encoding

Updated 2005-05-19 15:12:18

proc init {} {

        variable map
        variable alphanumeric a-zA-Z0-9
        for {set i 0} {$i <= 256} {incr i} { 
            set c [format %c $i]
            if {![string match \[$alphanumeric\] $c]} {
                set map($c) %[format %.2x $i]
            }
        }
        # These are handled specially
        array set map { " " + \n %0d%0a }
    }
    init


  proc url-encode {string} {
    variable map
    variable alphanumeric

    # The spec says: "non-alphanumeric characters are replaced by '%HH'"
    # 1 leave alphanumerics characters alone
    # 2 Convert every other character to an array lookup
    # 3 Escape constructs that are "special" to the tcl parser
    # 4 "subst" the result, doing all the array substitutions

    regsub -all \[^$alphanumeric\] $string {$map(&)} string
    # This quotes cases like $map([) or $map($) => $map(\[) ...
    regsub -all {[][{})\\]\)} $string {\\&} string
    return [subst -nocommand $string]
  }



  proc url-decode str {
    # rewrite "+" back to space
    # protect \ from quoting another '\'
    set str [string map [list + { } "\\" "\\\\"] $str]

    # prepare to process all %-escapes
    regsub -all -- {%([A-Fa-f0-9][A-Fa-f0-9])} $str {\\u00\1} str

    # process \u unicode mapped chars
    return [subst -novar -nocommand $str]
  }

This is almost exactly source taken from the implementations of http and ncgi.

Lars H: This encodes a string of bytes using printable ASCII, but what can/should be done if one wants to use arbitrary Unicode characters in a string? I suppose that question mostly boils down to "which encoding is used for URLs?" Are there any RFCs or the like that specifies that? [Yes, so that's one job: find the correct name of this translation ("x-url-encoding") and its official specification.]


18may05 jcw - With 8.4, this ought to do the same:

  proc ue_init {} {
    lappend d + { }
    for {set i 0} {$i < 256} {incr i} {
      set c [format %c $i]
      set x %[format %02x $i]
      if {![string match {[a-zA-Z0-9]} $c]} {
        lappend e $c $x
        lappend d $x $c
      }
    }
    set ::ue_map $e
    set ::ud_map $d
  }

  ue_init

  proc ue {s} { string map $::ue_map $s }
  proc ud {s} { string map $::ud_map $s }

  puts [ue "wiki.tcl.tk/is fun!"]
  puts [ud [ue "wiki.tcl.tk/is fun!"]]
  puts [ue "a space and a \n new line :)"]
  puts [ud [ue "a space and a \n new line :)"]]
  puts [ud "1+1=2"]

[Is http now coding this command in C? Is that what it'll do in the future?]