'''[http://www.tcl.tk/man/tcl/TclCmd/http.htm#M33%|%::http::formatQuery]''' is a command provided by the [http] package.

** Inverse Operation **

[ncgi] provides the inverse operation:

======
ncgi::input $query
ncgi::nvlist
======

** CGI Parameter Encoding Character Case **

Does anyone know whether both upper-case and lower-case hex digits are allowed in CGI parameter encoding? E.g., are the following two results equivalent?

======none
% http::formatQuery äöü
%c3%a4%c3%b6%c3%bc
%
% http::formatQuery2 äöü
%C3%A4%C3%B6%C3%BC
%
======

I read a few lines of [RFC] [http://www.ietf.org/rfc/rfc3875.txt%|%3875], but did not fully understand everything yet...

[MJ]: The URI escaping is described in [RFC] [http://www.ietf.org/rfc/rfc2396.txt%|%2396], which is referenced by the CGI RFC 3875. There it states:

   An escaped octet is encoded as a character triplet, consisting of the
   percent character "%" followed by the two hexadecimal digits
   representing the octet code. For example, "%20" is the escaped
   encoding for the US-ASCII space character.

      escaped = "%" hex hex
      hex     = digit | "A" | "B" | "C" | "D" | "E" | "F" |
                        "a" | "b" | "c" | "d" | "e" | "f"

So yes, they are equivalent.

[Mat] 2009-10-27: Though they should be equivalent, many implementations accept only one of the two variants. As it turns out, even Amazon's Product Advertising API requires query strings to use upper-case character triplets. This proc will convert them:

======
proc formatQuery2 val {
    return [subst [regsub -all {(%.{2})} [::http::formatQuery $val] {[string toupper \1]}]]
}
======

[PYK]: If `$val` can contain `[[` or `$`, `formatQuery2` would be vulnerable to [Injection Attack%|%injection attacks].
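One way to sidestep the `subst` concern raised above is to upper-case the triplets with plain string operations, so that nothing in the query is ever evaluated. The following is only a sketch under assumptions: the proc names `upcaseEscapes` and `formatQueryUpper` are made up for this illustration and are not part of the [http] package, and `{*}` requires Tcl 8.5 or later.

======
package require http

# Hypothetical helper: upper-case the hex digits of every %xx triplet in str.
# No subst or eval is involved, so brackets and dollar signs stay inert.
proc upcaseEscapes {str} {
    set result {}
    set last 0
    # Each element of the list is a {start end} index pair of one %xx match.
    foreach idx [regexp -all -inline -indices {%[0-9A-Fa-f]{2}} $str] {
        lassign $idx start end
        # Copy the text before the triplet unchanged ...
        append result [string range $str $last [expr {$start - 1}]]
        # ... then the triplet itself, upper-cased.
        append result [string toupper [string range $str $start $end]]
        set last [expr {$end + 1}]
    }
    # Copy whatever follows the last triplet.
    append result [string range $str $last end]
    return $result
}

# Hypothetical wrapper: same arguments as ::http::formatQuery.
proc formatQueryUpper {args} {
    return [upcaseEscapes [::http::formatQuery {*}$args]]
}
======

With the default `-urlencoding` of utf-8, `formatQueryUpper äöü` should give `%C3%A4%C3%B6%C3%BC`. Only the triplets are touched; keys and values that survive encoding as plain letters keep their case.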
** Default URL encoding **

[MHo] 2010-03-06: Confusion:

======
% info pa
8.5.8
% package require http
2.7.5
% http::formatQuery umlaut1 ä umlaut2 ö umlaut3 Ü
umlaut1=%c3%a4&umlaut2=%c3%b6&umlaut3=%c3%9c
% package require ncgi
1.3.2
% ncgi::encode äöÜ
%E4%F6%DC
======

Does http::formatQuery produce the right encoding for German umlauts here?

[Lars H], 2010-03-07: Looks like http::formatQuery is using utf-8, whereas ncgi::encode is using iso8859-1 (a.k.a. binary). AFAIK, an HTTP URL is just an octet sequence, so there is no "default encoding". Providing encoding information requires a higher-level mechanism, so it probably depends on who you're talking to. I notice Google search queries tend to contain substrings `&ie=UTF-8&`, so that is probably how they do it. Others might do it differently. [http://www.w3.org/TR/html4/interact/forms.html#h-17.13%|%HTML 4%|%] seems to indicate that only ASCII is valid for HTML form data.

A guess would be that ncgi::encode expects its caller to handle encoding conversions (try it with a non-iso8859-1 character such as ğ), whereas http::formatQuery has the assumption of utf-8 built in.

[AMG]: I discovered during the development of [Wibble] that HTTP 1.1 uses the ISO8859-1 encoding unless explicitly overridden with Content-Type: headers [http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.7.1%|%rfc 2616%|%].

[Lars H]: That concerns the ''body'' content, though. http::formatQuery is more about constructing the URL, which (I believe) cannot rely on headers for its interpretation, can it?

<<categories>> Command