http, a package bundled with Tcl as part of the official library of extensions, is a client-side implementation of the HTTP/1.1 protocol.
Version 2.8.5, with full HTTP/1.1 support, is distributed with Tcl 8.6.
Version 2.7, with partial HTTP/1.1 support, is distributed with Tcl 8.5.2.
Version 2.5.3 is distributed with Tcl 8.4.18.
An early proposal for http 2.5 (with some http 1.1 support) occurred in the TclSOAP project - see proposed version 2.5 . The tclvfs project extended that to a proposed version 2.6 (with the -method option to allow callers to use WebDAV methods). This change was merged into the TclSOAP project version (2003-06-23); the -method option was added to the official http before version 2.7. Both TclSOAP and tclvfs now use the official http package bundled with Tcl, although "http 2.6" remains as unused code in the tclvfs source tree.
::http::size is the number of bytes of the body that geturl has returned. geturl -validate 1 retrieves only the metadata about the page, and since no body has been retrieved, ::http::size returns 0. In that case $state(totalsize) can be used.
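For example (a minimal sketch; the URL is just a placeholder), the header-reported size can be read from the state array while ::http::size stays at 0:

package require http
set tok [::http::geturl http://example.com/file.bin -validate 1]
upvar #0 $tok state
puts "size from headers: $state(totalsize) bytes"    ;# taken from Content-Length
puts "size of body data: [::http::size $tok] bytes"  ;# 0, nothing was fetched
::http::cleanup $tok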
One nice feature of the http package is the support of different http transport protocols via the command:
::http::register proto port command
The initial setting for http itself is as if the following command were issued:
::http::register http 80 ::socket
This can be expanded for HTTPS with the tls package:
package require tls
::http::register https 443 ::tls::socket
For websites that have disabled support for SSL (including SSL version 3) and only accept TLS, the following should work:
::http::register https 443 [list ::tls::socket -tls1 1]
It is also possible to overwrite the normal http transport protocol. For example, to get support for multiple internet/ethernet interfaces in a server that has more than one network card or uses aliased IP addresses ([L3 ]), register another version of http:
set myIP 192.168.10.1
::http::register http 80 [list ::socket -myaddr $myIP]
TR: which just expands the initial behaviour.
Silas: Here is probably the easiest example about how to POST HTTP data using the http package:
package require http
set url <your url comes here>
::http::geturl $url -query [::http::formatQuery field1 value1 field2 value2 field3 value3]
David Welton gives examples of POSTing HTTP data (that is, use of -query) in [L4 ], comp.lang.tcl, 2002-01-18
RS: Minimal downloader to stdout:
package require http
puts [http::data [http::geturl [lindex $argv 0]]]
Bruce Hartweg offers this (slightly paraphrased) minimal to-file version:
package require http
http::geturl $theURL -channel [open $theFile w]
along with observations that a more robust version will check for redirects, close channels, http::cleanup, ...
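A slightly more careful sketch along those lines (the URL and file name are whatever you pass in; redirect handling is still left to code like the redirect-following example further down this page):

package require http

proc fetchToFile {url file} {
    set chan [open $file w]
    fconfigure $chan -translation binary
    set tok [::http::geturl $url -channel $chan -binary 1]
    close $chan
    set status [::http::status $tok]
    set ncode  [::http::ncode $tok]
    ::http::cleanup $tok
    if {$status ne "ok" || $ncode != 200} {
        file delete -force $file
        error "download of $url failed: $status / $ncode"
    }
}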
DKF: To get the title of a webpage, use this:
package require http
set token [http::geturl $theURL]
regexp {(?i)<title>([^<>]+)} [http::data $token] -> title
http::cleanup $token
puts "Title was \"$title\""
If you're doing more than getting the title, use tdom and not [regexp] for the parsing...
package require http
package require tdom
set token [http::geturl $theURL]
set doc [dom parse [http::data $token]]
set title [[$doc selectNodes {/html/head/title}] asText]
$doc delete
http::cleanup $token
puts "Title was \"$title\""
A sample of catching an error when attempting to get a WWW page:
proc t url {
    if {[catch {set tok [::http::geturl $url]} msg]} {
        puts "oops: $msg"
    } else {
        return $tok
    }
    puts leaving
}
DGP: It's a simple thing, but I've found use of Tcl's http package the simplest way to discover what Content-Type an HTTP server is sending back with the resource.
RS: me too, when playing HTTP
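A minimal sketch of that (the URL is a placeholder); the Content-Type ends up in $state(type) and in the metadata:

package require http
set tok [::http::geturl http://example.com/ -validate 1]
upvar #0 $tok state
puts "Content-Type: $state(type)"
# the raw headers are also available as a key/value list (key case as sent by the server):
puts [::http::meta $tok]
::http::cleanup $tok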
tonytraductor: I've used http to build a crossposting blog client (see http://tonyb.us/xpost ) that posts to wordpress, livejournal, tumblr, friendica, and others.
An example, send a post to tumblr:
# where .txt.txt is a text widget,
# tags, title and other parameters set with tk::entry widgets in the gui
############################################
# post to tumblr
proc tbpost {} {
    set ptext [.txt.txt get 1.0 {end -1c}]
    set login [::http::formatQuery mode login user $::email password $::tpswd]
    set log [http::geturl http://www.tumblr.com/api/authenticate -query $login]
    set post [http::formatQuery mode postevent auth_method clear \
        email $::email password $::tpswd type regular generator Xpostulate \
        tags $::tags title $::subject body $ptext]
    set dopost [http::geturl http://www.tumblr.com/api/write -query $post]
    set mymeta [http::meta $dopost]
    set mystat [http::status $dopost]
    set length [http::size $dopost]
    toplevel .rsp
    wm title .rsp "Post Status"
    grid [tk::label .rsp.lbl -text "Tumblr says: $mystat\nPost length: $length"]
    grid [tk::button .rsp.view -text "View Journal" -command {
        set turl "http://$::tname.tumblr.com"
        exec $::brow $turl &
    }] \
        [tk::button .rsp.ok -text "DONE" -command {destroy .rsp}]
}
Today I'm trying to get it working with posterous, however, and having difficulty.
LES: is not superstitious and asks a question on 2004-08-13: What if the download is too large? How is it possible to... er... "cache" the download, i.e. save part of the stream and free up memory?
schlenk: The http geturl method has various options for this special case. Either you give a channel, so the data is written directly to a file for example, or you register a special progress callback to deal with the situation.
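A hedged sketch of both options combined (URL and file name are placeholders): the body is streamed into a file instead of being held in memory, and a -progress callback watches the transfer:

package require http

# -progress is called with the token, the expected total size
# (from Content-Length) and the number of bytes received so far
proc showProgress {tok total current} {
    puts "received $current of $total bytes"
}
set chan [open big.iso w]
fconfigure $chan -translation binary
set tok [::http::geturl http://example.com/big.iso \
        -channel $chan -binary 1 -progress showProgress]
close $chan
::http::cleanup $tok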
Peter Newman 2004-03-08 : Resuming? Does anyone know if it's possible to resume (MP3 downloads) with [http]. And if so, how? And if it's not possible to resume with http, could you let me know that too. (So I don't have to waste time on a lost cause.) Thanks.
schlenk: It is possible if the http server supports range requests and you know the length of the file from the content length headers. You just need to add the appropriate HTTP header fields when doing the request, see the RFC 2616 3.12 [L5 ].
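A hedged sketch of that (URL and file name are placeholders; it assumes the server honours Range requests and answers with 206 Partial Content):

package require http

set url  http://example.com/song.mp3
set file song.mp3
set have [file size $file]            ;# bytes already downloaded

set chan [open $file a]               ;# append to the partial file
fconfigure $chan -translation binary
set tok [::http::geturl $url -channel $chan -binary 1 \
        -headers [list Range bytes=$have-]]
close $chan
if {[::http::ncode $tok] != 206} {
    puts "server ignored the Range request (code [::http::ncode $tok])"
}
::http::cleanup $tok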
Identification and handling of proxies can be a pain when using the http package so I'm trying to write a package to handle as much of this as possible - see autoproxy
Complaint: http blocks while resolving a non-existent or disconnected server.
DGP: This complaint maps to the complaint that [socket] blocks in the form [socket $host $port] when $host does not exist/respond, even when the -async option is used. This basically further maps into a complaint that gethostbyname() blocks. Other C programs apparently have non-blocking solutions for this. We should discover what those solutions are and see if the [socket] implementation can make use of them. Andreas Kupries' memory is that we collectively decided the best solution is "to have the core spawn a helper thread to process (and wait for) gethostbyname() while the rest of the core goes on crunching."
Darren New observes that gethostbyname() can't be trusted to be thread-safe ...
NOTE: (hint) I don't see this problem with socket reported as a bug at SF.
TV: It's been a while for me, but isn't that inet function in fact opening a (maybe udp) socket to a DNS, which could be select()-able on decent systems?
nl: Until this is fixed you can use the tcllib dns package to do the DNS lookup and then use http with the IP address (note that this has some implications, such as assuming that your DNS host responds, and that the http library then sends a wrong Host header, but it is usually better than having your application hang on a bad DNS entry).
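A rough sketch of that workaround with tcllib's dns package (host name, path and nameserver are placeholders; as noted, the Host header then carries the IP, which breaks name-based virtual hosting on some servers):

package require dns
package require http

# ::dns::configure -nameserver 192.168.1.1  ;# point at a reachable resolver if needed
set host example.com
set dtok [::dns::resolve $host]
::dns::wait $dtok                            ;# only the DNS query can stall here
set ip [lindex [::dns::address $dtok] 0]
::dns::cleanup $dtok

set htok [::http::geturl http://$ip/index.html]
puts [::http::code $htok]
::http::cleanup $htok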
PT 2003-06-23: This all assumes that DNS is what is being used. However, there are various ways to resolve hostnames and the local C library resolver knows how to handle them according to local configuration. Maybe we are using a hosts file, maybe we have NIS. It is unfortunately not as simple as this appears - otherwise we'd have fixed it. Ultimately using an external process to do the lookups ala netscape's resolver proxy is likely the only way to avoid this delay.
TV: A separate process would still leave you with the delay whenever the answer to the query isn't readily available and you have to wait for a better alternative, a correction, or for your connections to the informing party to stop being cluttered or broken, but at least you could do something else in the meanwhile. That is a major, ordinary reason for having processes or threads in the context of pacing communication.
DKF: A separate process would let you do other things while the delay was happening. You could even keep a pool of helper processes around and use them round-robin fashion.
How can you catch an error in a callback? e.g., if I call
http::geturl $url -command somecommand
any errors raised in somecommand just vanish instead of being passed to bgerror as I expect.
PYK 2016-04-03: Yes, http::Finish does swallow errors in the -command if an error is already being propagated. The whole http module needs a little redesigning. In the meantime, the callback command can be "liberated" with something like this:
http::geturl $url -command [list after idle some_command]
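Or make the callback responsible for its own errors, so there is nothing left for http::Finish to swallow; a small sketch (some_command is the callback from above):

proc safe_callback {tok} {
    if {[catch {some_command $tok} err]} {
        puts stderr "callback failed: $err"   ;# or hand it to bgerror / your logger
    }
}
http::geturl $url -command safe_callback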
HolgerJ 2016-04-23: Could the error I am encountering be related to this?
::http::cleanup $token
::thread::release

Error from thread tid0x7f80f6893700
can not find channel named "sock7f80f0178c50"
    while executing
"eof $sock"
As soon as I put an after 500 between the two statements, the error doesn't show. Could it be that the cleanup is being overtaken by the release, so that the cleanup cannot find the $sock anymore?
pyk 2016-04-23: Do these lines appear in a -command script?
Here is some code recently mentioned on news:comp.lang.tcl for querying whether a site is alive.
if {$argc == 0} {
    set site http://purl.org/thecliff/tcl/wiki/
} else {
    set site [lindex $argv 0]
}
package require http 2.3

# this proc contributed by [Donal Fellows]
proc geturl_followRedirects {url args} {
    while 1 {
        set token [eval [list http::geturl $url] $args]
        switch -glob [http::ncode $token] {
            30[1237] {### redirect - see below ###}
            default {return $token}
        }
        upvar #0 $token state
        array set meta [set ${token}(meta)]
        if {![info exists meta(Location)]} {
            return $token
        }
        set url $meta(Location)
        unset meta
    }
}

set token [geturl_followRedirects $site -validate 1]
if {[regexp -nocase ok [::http::code $token]]} {
    puts "$site is alive"
} else {
    puts "$site is dead: [::http::code $token]"
}
::http::cleanup $token
HaO 2013-05-02: IMHO it would be more secure to limit the redirections to 5.
Tcl/Tk 8.5.2 Release Candidates Options (new behaviour with http -handler) , comp.lang.tcl, 2008-03-28: discusses a problem with http version 2.7.
TV 2003-04-24:
I just found behaviour I didn't get:
(Tcl) 68 % info vars http::*
::http::urlTypes ::http::http ::http::1 ::http::alphanumeric ::http::encodings ::http::formMap ::http::defaultCharset
(Tcl) 68 % unset ::http::1
can't unset "::http::1": no such variable
(Tcl) 69 % info vars http::*
::http::urlTypes ::http::http ::http::alphanumeric ::http::encodings ::http::formMap ::http::defaultCharset
It's wish 8.4.1, and it runs bwise, a webserver (tclhttpd with some alterations), and this is clearly from the http package used to fetch a webpage. Maybe the manual gives a neat answer; I just found it noteworthy that an unset which reports an error still seems to do its unsetting.
RS: ..or that the variable was removed by the web server between the first two commands? What happens if you just call the first command repeatedly?
TV: It would seem to be stable. It's the array variable holding the page content, URL info and so on, which apparently sticks around until deleted; that's the whole reason I was looking for some garbage collection or delayed freeing. There could be an event linked with some element, I don't know; I didn't write the (at any rate handy) http package...
HaO 2013-05-10: Here is my http(s) link (URL) verification code, inspired by the example above from Kevin Kenny. This code follows at most 5 redirects and requires Tcl 8.6 due to the tailcall:
proc ::linkCheck {urlIn {timeout 10000} {recursionLimit 5}} {
    if {[catch {
        set requestHandle [::http::geturl $urlIn -validate 1 -timeout $timeout]
    } err]} {
        return -code error [mc "Unknown host '%s'" $urlIn]
    }
    set fError 1
    if {[::http::status $requestHandle] ne {ok}} {
        set errMsg [::http::status $requestHandle]
    } else {
        switch -glob -- [::http::ncode $requestHandle] {
            2* {set fError 0}
            30[12378] {
                # redirect
                if {0 < $recursionLimit
                        && [info exists ${requestHandle}(meta)]
                        && [dict exists [set ${requestHandle}(meta)] Location]
                } {
                    incr recursionLimit -1
                    set url [dict get [set ${requestHandle}(meta)] Location]
                    ::http::cleanup $requestHandle
                    tailcall ::linkCheck $url $timeout $recursionLimit
                }
            }
        }
        set errMsg [::http::code $requestHandle]
    }
    ::http::cleanup $requestHandle
    if {$fError} {
        return -code error [mc "Error '%s' accessing url '%s'" $errMsg $urlIn]
    }
    return
}
oehhar - 2017-08-31 12:19:29
HaO 2017-08-31: When getting a ncode of 100, you are probably hitting bug 2a94652e in the http package shipped with tcl 8.6.0 - 8.6.7 (http package 2.8.0 - 2.8.11).
Use version 2.8.12, which is shipped with Tcl 8.6.8.
HaO 2020-09-02: The http package has an internal routine to convert a charset parameter from a content-type header to a Tcl encoding name.
It may also be used to convert an IANA charset [L6 ] to a Tcl encoding.
Of course, this is not an official API, and thus it may change.
The command is:
http::CharsetToEncoding $Charset
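A quick usage sketch (hedged: since this is an internal command, double-check the mapping against the http version you actually ship):

package require http
# map IANA charset names onto Tcl encoding names
puts [http::CharsetToEncoding "ISO-8859-1"]   ;# expected: iso8859-1
puts [http::CharsetToEncoding "UTF-8"]        ;# expected: utf-8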
Snippets from http itself using it:
From Line 2676 of http 2.9.1, the charset is extracted from the content-type header:
if {[regexp -nocase \
        {charset\s*=\s*\"((?:[^""]|\\\")*)\"} \
        $state(type) -> cs]} {
    set state(charset) [string map {{\"} \"} $cs]
} else {
    regexp -nocase {charset\s*=\s*(\S+?);?} \
        $state(type) -> state(charset)
}
From Line 3210 of http 2.9.1, the charset value is used to recode the data:
set enc [CharsetToEncoding $state(charset)]
if {$enc ne "binary"} {
    set state(body) [encoding convertfrom $enc $state(body)]
}
HaO 2020-09-02: Here is an example how to post utf-8 encoded data:
set data "ABCDÄÖÜ\u2022"
set h [http::geturl http://sample.org/postutf8 \
        -query [encoding convertto utf-8 $data] \
        -type "text/plain;charset=utf-8"]
The IANA name of the encoding is passed with the -type parameter.
The same scheme applies for any other encoding. Note that ISO Latin-1 is the default encoding:
set data "ABCDÄÖÜ"
set h [http::geturl http://sample.org/postutf8 \
        -query [encoding convertto iso8859-1 $data] \
        -type "text/plain"]