Version 8 of http::formatQuery

Updated 2010-03-08 04:11:59 by AMG


Does anyone know whether CGI parameter encoding may use either upper- or lowercase hex digits? E.g., are the following two results equivalent?

 % http::formatQuery äöü

 % http::formatQuery2 äöü

I read a few lines of RFC 3875 but did not fully understand everything yet...

MJ - The URI escaping is described in RFC 2396 (which is referenced by the CGI RFC 3875). There it states:

   An escaped octet is encoded as a character triplet, consisting of the
   percent character "%" followed by the two hexadecimal digits
   representing the octet code. For example, "%20" is the escaped
   encoding for the US-ASCII space character.

      escaped     = "%" hex hex
      hex         = digit | "A" | "B" | "C" | "D" | "E" | "F" |
                            "a" | "b" | "c" | "d" | "e" | "f"

So yes, they are equivalent.
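To illustrate the grammar quoted above, here is a minimal decoder sketch that accepts both upper- and lowercase hex digits. The proc name percentDecode is hypothetical (it is not part of the http package), and the sketch deliberately ignores the "+" convention for spaces in form data.

```tcl
# Hypothetical helper, not part of the http package: turn %XX escapes
# back into raw octets, accepting "A"-"F" and "a"-"f" alike.
proc percentDecode {str} {
    set out ""
    set i 0
    set len [string length $str]
    while {$i < $len} {
        set ch [string index $str $i]
        set hex [string range $str $i+1 $i+2]
        if {$ch eq "%" && [regexp {^[0-9A-Fa-f]{2}$} $hex]} {
            # binary format H2 accepts hex digits in either case
            append out [binary format H2 $hex]
            incr i 3
        } else {
            append out $ch
            incr i
        }
    }
    return $out
}

# Both spellings decode to the same octets:
puts [expr {[percentDecode "%c3%a4"] eq [percentDecode "%C3%A4"]}]
```

The result is a string of octets; interpreting them as text still needs an explicit step such as encoding convertfrom utf-8.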

Mat (2009-10-27) Though they should be equivalent, many implementations accept only one of the two variants.

As it turns out, even Amazon's Product Advertising API requires query strings to use uppercase character triplets. This proc will convert them:

proc formatQuery2 {val} {
    # Wrap every %xx triplet in [string toupper ...] and let subst evaluate it.
    return [subst [regsub -all {(%.{2})} [::http::formatQuery $val] {[string toupper "\1"]}]]
}
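The subst trick above relies on the output of http::formatQuery containing no characters that subst would interpret, which as far as I can tell holds because they are all escaped. For strings of unknown origin, a variant without subst may be preferable. The following is only a sketch; the proc name upperEscapes is made up for illustration.

```tcl
# Hypothetical helper: uppercase the hex digits of every %xx triplet in an
# already-encoded string, without handing the string to subst.
proc upperEscapes {encoded} {
    set out ""
    set pos 0
    # -indices yields one {first last} pair per matched triplet
    foreach idx [regexp -all -inline -indices {%[0-9A-Fa-f]{2}} $encoded] {
        lassign $idx first last
        # copy the text between the previous triplet and this one unchanged
        append out [string range $encoded $pos [expr {$first - 1}]]
        append out [string toupper [string range $encoded $first $last]]
        set pos [expr {$last + 1}]
    }
    # copy whatever follows the last triplet
    append out [string range $encoded $pos end]
    return $out
}

puts [upperEscapes "%c3%a4+x%2fy"]   ;# prints %C3%A4+x%2Fy
```

It could be applied to the result of the stock command, e.g. upperEscapes [::http::formatQuery $val].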

MHo 2010-03-06: Confusion:

% info pa
% package require http
% http::formatQuery umlaut1 ä umlaut2 ö umlaut3 Ü
% package require ncgi
% ncgi::encode äöÜ

Does http::formatQuery produce the right encoding for German umlauts here?

Lars H, 2010-03-07: Looks like http::formatQuery is using utf-8, whereas ncgi::encode is using iso8859-1 (a.k.a. binary). AFAIK, an HTTP URL is just an octet-sequence, so there is no "default encoding" — providing encoding information requires a higher level mechanism, so it probably depends on who you're talking to. I notice Google search queries tend to contain substrings &ie=UTF-8&, so that is probably how they do it. Others might do it differently. [L1 ] seems to indicate that only ASCII is valid for HTML form data.

A guess would be that ncgi::encode expects its caller to handle encoding conversions (try it with a non-iso8859-1 character such as ğ), whereas http::formatQuery has utf-8 built in.
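To make the difference concrete, here is a small sketch that performs the character encoding step explicitly before percent-escaping. The proc percentEncode is hypothetical and is not how either package is implemented; it escapes everything except ASCII letters and digits, which is stricter than necessary.

```tcl
# Hypothetical helper: convert text to octets in the given encoding, then
# percent-escape every octet that is not an ASCII letter or digit.
proc percentEncode {text {enc utf-8}} {
    set out ""
    foreach byte [split [encoding convertto $enc $text] ""] {
        if {[string match {[A-Za-z0-9]} $byte]} {
            append out $byte
        } else {
            # scan %c yields the octet's numeric value
            append out [format %%%02X [scan $byte %c]]
        }
    }
    return $out
}

# \u00e4 is "ä"; written as an escape to be independent of the script's encoding
puts [percentEncode "\u00e4"]             ;# prints %C3%A4
puts [percentEncode "\u00e4" iso8859-1]   ;# prints %E4
```

The same character thus yields two different query strings, and the receiving side cannot tell which encoding was meant unless it is told by some other means.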

AMG: I discovered during the development of Wibble that HTTP 1.1 uses the ISO8859-1 encoding unless explicitly overridden with Content-Type: headers [L2 ].

Category Command that is part of the http package of Tcl