* See: http://www.tcl.tk/man/tcl8.5/TclCmd/http.htm

'''QUESTION:''' Does anyone know whether upper- or lower-case hex digits are allowed in CGI parameter encoding? For example, are the following two results equivalent?

 % http::formatQuery äöü
 %c3%a4%c3%b6%c3%bc
 % http::formatQuery2 äöü
 %C3%A4%C3%B6%C3%BC

I read a few lines of RFC 3875 but have not fully understood everything yet...

[MJ] - URI escaping is described in RFC 2396, which is referenced by the CGI spec, RFC 3875. It states:

An escaped octet is encoded as a character triplet, consisting of the percent character "%" followed by the two hexadecimal digits representing the octet code. For example, "%20" is the escaped encoding for the US-ASCII space character.

 escaped = "%" hex hex
 hex     = digit | "A" | "B" | "C" | "D" | "E" | "F" |
                   "a" | "b" | "c" | "d" | "e" | "f"

So yes, they are equivalent.

[Mat] (2009-10-27) Though they should be equivalent, many implementations accept only one of the two variants. Amazon's Product Advertising API, for example, requires query strings to use upper-case hex digits in the escape triplets. This proc converts them:

======
package require http
# Upper-cases each %xx triplet; formatQuery already escapes [ $ \ so subst is safe here.
proc formatQuery2 {args} {
    set query [::http::formatQuery {*}$args]
    return [subst [regsub -all {(%.{2})} $query {[string toupper "\1"]}]]
}
======

[MHo] 2010-03-06: Confusion:

======
% info pa
8.5.8
% package require http
2.7.5
% http::formatQuery umlaut1 ä umlaut2 ö umlaut3 Ü
umlaut1=%c3%a4&umlaut2=%c3%b6&umlaut3=%c3%9c
% package require ncgi
1.3.2
% ncgi::encode äöÜ
%E4%F6%DC
======

Does http::formatQuery produce the right encoding for German umlauts here?

[Lars H], 2010-03-07: It looks like http::formatQuery is using UTF-8, whereas ncgi::encode is using ISO 8859-1 (a.k.a. binary). AFAIK, an HTTP URL is just an octet sequence, so there is no "default encoding"; conveying the encoding requires a higher-level mechanism, so it probably depends on whom you are talking to. I notice Google search queries tend to contain the substring `&ie=UTF-8&`, so that is probably how they do it.
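Both answers on this page become concrete on the receiving side. Below is a minimal decoding sketch, assuming Tcl 8.5 or later; `percentDecode` is a hypothetical helper, not part of [http] or [ncgi]. It accepts hex digits in either case, as the RFC 2396 grammar quoted above allows. RFC 3986, which obsoletes RFC 2396, also treats the two cases as equivalent but recommends upper case for producers. The helper also has to be told which charset the decoded octets are in, because the URL itself does not say. On the sending side, http::formatQuery takes that charset from `http::config -urlencoding`, which defaults to utf-8.

======
# Hypothetical helper (not part of the http or ncgi packages):
# decode %xx triplets, with hex digits in either case, into octets,
# then interpret the octets in the given charset (utf-8 by default).
# A "+" is left as-is; form data may use it to mean a space.
proc percentDecode {query {charset utf-8}} {
    # Escape backslash, [ and $ so that subst only runs our own format calls.
    set safe [string map {\\ \\\\ [ \\[ $ \\$} $query]
    # Turn each %xx triplet into a format %c call that yields that octet.
    set octets [subst [regsub -all {%([0-9a-fA-F]{2})} $safe {[format %c 0x\1]}]]
    # The octets do not say which charset they are in; the caller decides.
    return [encoding convertfrom $charset $octets]
}

# Both spellings from the question decode to the same string: prints 1
puts [expr {[percentDecode %c3%a4%c3%b6%c3%bc] eq [percentDecode %C3%A4%C3%B6%C3%BC]}]
# The same octets read as iso8859-1 give a different string: prints 0
puts [expr {[percentDecode %c3%a4 iso8859-1] eq [percentDecode %c3%a4]}]
======

The `charset` argument carries exactly the information the URL lacks: `percentDecode %E4%F6%DC iso8859-1` reads the [ncgi] output above back as äöÜ.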
Others might do it differently. [http://www.w3.org/TR/html4/interact/forms.html#h-17.13] seems to indicate that only ASCII is valid for HTML form data. A guess would be that ncgi::encode expects its caller to handle encoding conversions (try it with a non-ISO-8859-1 character such as ğ), whereas http::formatQuery has the UTF-8 assumption built in.

----
!!!!!!
%| [Category Command] that is part of the [http] package of [Tcl] |%
!!!!!!