[Keith Vetter] 2004-06-18: One feature lacking in the http package is the ability to automatically handle redirects. A redirect occurs when an http server returns a code in the 301-307 range and indicates in the metadata the new location (url) to download. At least three different wiki pages have routines to handle redirects ([Http], [Simple Tkhtml web page displayer] and [grabchat]), including a nice, succinct version by [Donal Fellows].

However, a web-scraping program of mine that used this routine recently started failing. It turns out the url being redirected to contained no HOST information. (NB. the server in question is Yahoo, and I wouldn't be surprised if they did this to discourage scraping their site.) So here's an updated version of [Donal Fellows]'s routine '''geturl_followRedirects''' that handles this weird case.
----
 package require http
 package require uri

 proc geturl_followRedirects {url args} {
     array set URI [::uri::split $url]   ;# Need host info from here
     while {1} {
         set token [eval [list http::geturl $url] $args]
         if {![string match {30[1237]} [::http::ncode $token]]} {
             return $token
         }
         array set meta [set ${token}(meta)]
         if {![info exists meta(Location)]} {
             return $token
         }
         array set uri [::uri::split $meta(Location)]
         unset meta
         if {$uri(host) eq ""} {
             set uri(host) $URI(host)
         }
         # problem w/ relative versus absolute paths
         set url [eval ::uri::join [array get uri]]
     }
 }
----
[MAK] notes that the function is not safe against infinitely looping redirects (as might happen, for example, if a server is set up with an ErrorDocument page but is misconfigured such that it is forbidden as well).
----
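To guard against the looping case [MAK] describes, here is a minimal sketch of a bounded variant. The proc name '''geturl_followRedirects2''' and the cap of 10 hops are illustrative choices, not anything defined by the http package; the sketch also calls ::http::cleanup on the intermediate tokens so abandoned responses don't pile up.

 package require http
 package require uri

 proc geturl_followRedirects2 {url args} {
     set limit 10                       ;# arbitrary cap on redirect hops
     array set URI [::uri::split $url]  ;# remember the original host
     for {set hop 0} {$hop < $limit} {incr hop} {
         set token [eval [list http::geturl $url] $args]
         if {![string match {30[1237]} [::http::ncode $token]]} {
             return $token              ;# not a redirect: hand back the token
         }
         array set meta [set ${token}(meta)]
         if {![info exists meta(Location)]} {
             return $token              ;# redirect without a target: give up
         }
         set location $meta(Location)
         unset meta
         ::http::cleanup $token         ;# done with this intermediate response
         array unset uri                ;# don't carry uri parts across hops
         array set uri [::uri::split $location]
         if {$uri(host) eq ""} {
             set uri(host) $URI(host)   ;# same missing-host fix as above
         }
         set url [eval ::uri::join [array get uri]]
     }
     error "too many redirects starting from $url"
 }

It is used the same way as the original, e.g.:

 set token [geturl_followRedirects2 http://www.tcl.tk/]
 puts [::http::code $token]
 ::http::cleanup $token
----
[Category Internet]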