The DNS blocking problem

The problem is that the standard resolver library blocks while looking up names. This can cause a Tcl application - and more obviously a Tk application - to hang while the name is resolved. Typically system resolvers use DNS to convert a name to an address, but the system may also use files, NIS, LDAP and other systems.

One solution is to force the use of DNS. Tcllib has a dns module that performs DNS queries in pure Tcl over TCP, and over UDP if the tcludp package is available.
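As a rough illustration of the tcllib route, the sketch below starts a query with a completion callback so the event loop keeps running while the answer is pending. The names onResolved, startLookup and ::lookupResult are made up for this example, and the 5 second timeout is an arbitrary choice; only the dns:: commands come from tcllib.

```tcl
package require dns   ;# from tcllib

# Hypothetical callback: tcllib passes the query token as the only argument.
proc onResolved {token} {
    if {[dns::status $token] eq "ok"} {
        # dns::address returns the list of addresses found in the reply
        set ::lookupResult [dns::address $token]
    } else {
        set ::lookupResult {}
        puts stderr "lookup failed: [dns::error $token]"
    }
    dns::cleanup $token
}

# Hypothetical wrapper: start a query and return immediately.
# The name server must be supplied by the caller.
proc startLookup {name server} {
    return [dns::resolve $name -nameserver $server \
                -timeout 5000 -command onResolved]
}

# Example use (needs a reachable name server):
#   startLookup www.tcl.tk [lindex [dns::nameservers] 0]
#   vwait ::lookupResult
```

In a Tk application the vwait is unnecessary because the event loop is already running. Note that this only consults DNS, so names defined in hosts files, NIS or LDAP will not be found.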

Another solution is to fire up a slave process to perform name resolution and use non-blocking communications with the slave. This is the approach used in Netscape and BrowseX. The advantage of this method is that the resolver process will use normal resolver library calls and block as normal, while the parent can continue processing events while waiting for the answer to arrive.

I (PT) have a loadable package that uses this approach to perform non-blocking name resolution on Windows (though it could be extended to other platforms). At the moment the package just creates a resolve command that talks to the slave. In testing, a Tk app continues to process events and update while waiting for slow responses (i.e. DNS queries for nonexistent hostnames). See [L1 ] for the files.

There is only one function in the Tcl core that actually uses gethostbyname on Windows: the CreateSocketAddress function in tclWinSock.c. The Unix build of Tcl has a similar function, but it also uses gethostbyname to obtain the node name. To make non-blocking name resolution a seamless option, it will be necessary to provide a way to register an alternative implementation of this function via a loaded package.

Comments.....


RT 11 Nov 2004, Here is a little combo that works for me.

1. Put this code in a file: sockcheck.tcl

proc sockcheck {host port} {
    # Always exit with status 0: a non-zero exit status would make
    # close raise an error in the calling process
    if {[catch {set sock [socket $host $port]}]} {
        puts 0
        exit 0
    }
    close $sock
    puts 1
    exit 0
}
sockcheck [lindex $argv 0] [lindex $argv 1]

2. Use the following proc in the code that should not be blocked. If you want to be fancier you could use readable callbacks (fileevent) to avoid any blocking at all. This scheme is practical enough for my needs.

proc runSockCheck {ip port seconds} {
    # Use another process to check for connection - we don't
    # get blocked here for any longer than we choose to
    set ch [open "|tclsh84.exe sockcheck.tcl $ip $port" r]
    fconfigure $ch -blocking 0
    set ms [expr {$seconds*1000}]
    for {set ms2 0} {$ms2 < $ms} {incr ms2 200} {
        set try [gets $ch]
        if {$try eq ""} {
            update
            after 200
            continue
        }
        close $ch
        # The child prints 1 on success and 0 on failure; returning
        # here also avoids calling gets on the closed channel again
        return [expr {$try eq "1"}]
    }
    # Timed out
    # Can't use plain close here because it blocks until child
    # exits which is exactly what we're trying to avoid.
    # Close it 5 minutes later
    after [expr {1000*300}] "catch {close $ch}"
    return 0
}

3. Call runSockCheck on any ip/port combination and only call socket if the check is successful.
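The "fancier" readable-callback variant mentioned in step 2 might look something like the sketch below. It is not RT's code: asyncSockCheck, sockCheckRead and sockCheckDone are hypothetical names, it assumes sockcheck.tcl is in the current directory, and it uses [info nameofexecutable] instead of a hard-coded tclsh84.exe. It also assumes that close does not wait for the child when the pipeline channel is non-blocking, which is what the close manual page implies.

```tcl
# Start the check and return at once; callback later receives 1 or 0.
proc asyncSockCheck {ip port seconds callback} {
    set ch [open |[list [info nameofexecutable] sockcheck.tcl $ip $port] r]
    fconfigure $ch -blocking 0
    # Give up after the requested time and report failure
    set timer [after [expr {$seconds * 1000}] \
                   [list sockCheckDone $ch "" $callback]]
    fileevent $ch readable [list sockCheckRead $ch $timer $callback]
    return
}

# Runs from the event loop whenever the child has written something.
proc sockCheckRead {ch timer callback} {
    if {[gets $ch line] < 0} {
        if {![eof $ch]} {
            return          ;# no complete line yet, wait for more
        }
        set line 0          ;# child exited without answering
    }
    after cancel $timer
    sockCheckDone $ch $line $callback
}

# Clean up and hand the result (1 = connected, 0 = not) to the caller.
proc sockCheckDone {ch line callback} {
    fileevent $ch readable {}
    catch {close $ch}
    uplevel #0 $callback [list [expr {$line eq "1"}]]
}
```

A caller would pass a command prefix, for example asyncSockCheck $ip $port 5 {set ::reachable}, and carry on; in a plain tclsh script a vwait ::reachable is needed to run the event loop.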


[DJB implementation: http://cr.yp.to/ ]


DKF - The problem is that DNS is not the only way of resolving host names to IP addresses. I've also seen NIS+ and LDAP used for this purpose, and there's also hosts files to think about. Plus it is a really good idea to follow the local policy on this matter, as that is sometimes set for technical reasons. To cut a long story short, there's a great deal more complexity here than you might naively expect.

So what? Well, the problem is that the library call that interfaces to all this (gethostbyname()) is synchronous, and way too complex for us to safely make assumptions about it (I know that the configuration file for this on Solaris is not actually a config file, but a Shared Library to load for the purpose.) This sucks. This sucks a lot. The only ways to do asynchronous name resolution on UNIX are to use a separate thread or a separate process.

NEM - I believe BrowseX takes this route. IIRC, it has a separate executable to do the DNS stuff which it execs in the background.


Stu - Calls to gethostbyname() can be skipped in Ceptcl with the -noresolve option.


DKF 26-Apr-2005: Joe English suggested in the Tcl Chatroom that using getaddrinfo() instead of gethostbyname() would make farming out DNS lookups across threads much easier (since it is reentrant).

See also: gethostbyname as Windows DLL


APN 21-Jun-2006: TWAPI V0.9 can do non-blocking name resolution using the hostname_to_address [L2 ] and address_to_hostname [L3 ] -async options.


KaKaRoTo 14-Oct-2008: The aMSN team recently wrote an async resolver for DNS which seems to work fine and removes the DNS blocking problem. Have a look at it here: https://amsn.svn.sourceforge.net/svnroot/amsn/trunk/amsn/utils/asyncresolver/

Lars H: Despite the aMSN team's reputation of sometimes achieving miracles in pure Tcl, this is a compiled extension, together with a Tcl wrapper that patches socket. The idea for the compiled extension seems to be to create a new thread that does the call to resolve the address, thus avoiding blocking the thread your Tcl interp lives in.

CMcC: it just occurred to me that passing in a dotted quad will allow socket to proceed without blocking on name resolution. Since it's completely possible to do DNS in pure Tcl, it is possible to avoid blocking on opening a socket (except of course one would probably block on opening the socket to the DNS server) at the cost of some system dependencies in determining the DNS server for a given installation. Still, it seems that it's possible to avoid the blockage, in principle.
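A minimal sketch of the dotted-quad idea, assuming (as described above) that socket performs no name lookup when handed a numeric IPv4 address. The procs isDottedQuad and openResolved are hypothetical helpers invented for this illustration; how the address is obtained beforehand (a pure-Tcl DNS query, a helper process, a cache) is left to the caller.

```tcl
# Return 1 if host looks like a dotted-quad IPv4 address, else 0.
proc isDottedQuad {host} {
    if {![regexp {^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$} \
              $host -> a b c d]} {
        return 0
    }
    foreach octet [list $a $b $c $d] {
        # scan reads the digits as decimal, so leading zeros are harmless
        if {[scan $octet %d] > 255} {
            return 0
        }
    }
    return 1
}

# Refuse anything that would need a lookup, then connect in the
# background so that the connect itself does not block either.
proc openResolved {host port} {
    if {![isDottedQuad $host]} {
        error "\"$host\" is not a dotted quad; resolve it first"
    }
    return [socket -async $host $port]
}
```

With -async the socket becomes writable once the connection attempt has finished, so a fileevent writable handler can be used to find out when it is ready.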

KaKaRoTo Yes Lars, it's a compiled extension with a wrapper to socket, we just didn't see how we could do it otherwise. Of course CMcC's solution of doing pure Tcl DNS resolution would work, but as pointed out before, there's the NIS+, LDAP, and hosts file problems and I prefer to stay as much as possible compatible with what the system does. Anyways, I just wanted to point out a solution if anyone is interested in a compiled extension+wrapper (would be nice if Tcl did it all for us when you specify -async).