[Keith Vetter] 2004-06-18: One feature lacking in the http package is the ability to automatically handle redirects. A redirect occurs when an http server returns a code in the 301-307 range and indicates in the metadata the new location (url) to download. At least three different wiki pages have routines to handle redirects ([Http], [Simple Tkhtml web page displayer] and [grabchat]), including a nice, succinct version by [Donal Fellows].

However, a web-scraping program of mine that used this routine recently started failing. It turns out the url being redirected to contained no HOST information. (NB. the server in question is Yahoo, and I wouldn't be surprised if they did this to discourage scraping their site.) So here's an updated version of [Donal Fellows]'s routine '''geturl_followRedirects''' that handles this weird case.
----
 package require http
 package require uri

 proc geturl_followRedirects {url args} {
     array set URI [::uri::split $url]   ;# Need host info from here
     while {1} {
         set token [eval [list http::geturl $url] $args]
         if {![string match {30[1237]} [::http::ncode $token]]} {
             return $token
         }
         array set meta [set ${token}(meta)]
         if {![info exists meta(Location)]} {
             return $token
         }
         array set uri [::uri::split $meta(Location)]
         unset meta
         if {$uri(host) eq ""} {
             set uri(host) $URI(host)
         }
         # problem w/ relative versus absolute paths
         set url [eval ::uri::join [array get uri]]
     }
 }
----
[MAK] notes that the function is not safe against infinitely looping redirects (as might happen, for example, if a server is set up with an ErrorDocument page but is misconfigured such that it is forbidden as well).
----
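To guard against the looping case [MAK] describes, here is a minimal sketch of a bounded variant. The proc name '''geturl_followRedirects2''' and the cap of 10 hops are illustrative choices, not anything defined by the http package; the sketch also calls ::http::cleanup on the intermediate tokens so abandoned responses don't pile up.

 package require http
 package require uri

 proc geturl_followRedirects2 {url args} {
     set limit 10                       ;# arbitrary cap on redirect hops
     array set URI [::uri::split $url]  ;# remember the original host
     for {set hop 0} {$hop < $limit} {incr hop} {
         set token [eval [list http::geturl $url] $args]
         if {![string match {30[1237]} [::http::ncode $token]]} {
             return $token              ;# not a redirect: hand back the token
         }
         array set meta [set ${token}(meta)]
         if {![info exists meta(Location)]} {
             return $token              ;# redirect without a target: give up
         }
         set location $meta(Location)
         unset meta
         ::http::cleanup $token         ;# done with this intermediate response
         array unset uri                ;# don't carry uri parts across hops
         array set uri [::uri::split $location]
         if {$uri(host) eq ""} {
             set uri(host) $URI(host)   ;# same missing-host fix as above
         }
         set url [eval ::uri::join [array get uri]]
     }
     error "too many redirects starting from $url"
 }

It is used the same way as the original, e.g.:

 set token [geturl_followRedirects2 http://www.tcl.tk/]
 puts [::http::code $token]
 ::http::cleanup $token
----
[Category Internet]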