Version 32 of Random Musings on Tcl vs Perl Network Programming

Updated 2014-04-03 20:38:18 by pooryorick

Perl has access to socket programming primitives. You can essentially do everything you can do in C (with often the same function names).

I had to do some hardcore (200+ 6KB UDP messages/sec across 6 sockets) networking. It had to run 24x7 and there was no room for memory leaks. Naturally I prototyped it in Tcl (plus TclX and the TclUDP extension).

But Tcl is slow. To satisfy a caching requirement, I queued the messages in a Tcl list and used TclX's lvarpush and lvarpop to store/fetch the messages. That was an order of magnitude slower than the same algorithm in Perl (using push and shift on arrays).
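For readers without TclX, the same first-in, first-out queue can be sketched with core list commands. This is only an illustration, not the code measured above; `queue_push`, `queue_pop` and `pending` are hypothetical names, and no claim is made about how it compares in speed.

```tcl
# Hypothetical helpers: a plain-Tcl FIFO built on a list variable.
proc queue_push {queueVar msg} {
    upvar 1 $queueVar queue
    lappend queue $msg          ;# add at the tail
    return
}

proc queue_pop {queueVar} {
    upvar 1 $queueVar queue
    if {![info exists queue] || [llength $queue] == 0} {
        error "queue is empty"
    }
    set msg [lindex $queue 0]           ;# oldest message
    set queue [lrange $queue 1 end]     ;# drop it from the head
    return $msg
}

set pending {}
queue_push pending "msg one"
queue_push pending "msg two"
set first [queue_pop pending]
```

After this runs, `first` holds the oldest message and `pending` still contains the newer one.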

Still, the Perl app wasn't any faster. The network is the bottleneck. But, more interestingly, I had to implement some common Tcl idioms (involving fileevent, after and non-blocking/non-buffering puts) in Perl. And, that is where I learned how well-thought-out the Tcl I/O system was.

As a side note, I added a very slim udp_read C function to my app to compare it with going through the channel based Tcl read proc (as TclUDP does). Not much performance improvement was gained by doing that.

In the end, Perl did allow me to tweak a few more things (but I could easily do that with a couple of Tcl extensions). Perl networking is much richer (without having to go to C). However, after struggling with partial writes (using syswrite) and mucking with IO::Select to get it to act like the Tcl event loop, I've come to appreciate the things Tcl does for me.
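The idioms in question look roughly like the loopback sketch below: fileevent for readiness callbacks, after for a timeout, and puts on a non-blocking channel. It is not Todd's application; the proc names and the use of TCP on 127.0.0.1 with an OS-assigned port are assumptions made so the example is self-contained.

```tcl
# Server side: accept a connection and echo each line back.
proc ev_accept {chan addr port} {
    fconfigure $chan -blocking 0 -buffering line
    fileevent $chan readable [list ev_echo $chan]
}

proc ev_echo {chan} {
    if {[gets $chan line] >= 0} {
        # On a non-blocking channel, puts queues whatever the OS will not
        # take right away and finishes the write from the event loop.
        puts $chan "echo: $line"
    } elseif {[eof $chan]} {
        close $chan
    }
}

# Client side: store the reply, which wakes up vwait below.
proc ev_reply {chan} {
    if {[gets $chan line] >= 0} {
        set ::reply $line
    } elseif {[eof $chan]} {
        set ::reply "<eof>"
    }
}

# Port 0 asks the OS for a free port; read it back from -sockname.
set listener [socket -server ev_accept -myaddr 127.0.0.1 0]
set port [lindex [fconfigure $listener -sockname] 2]

set client [socket 127.0.0.1 $port]
fconfigure $client -blocking 0 -buffering line
fileevent $client readable [list ev_reply $client]
puts $client "hello"

# Timer event: give up if nothing arrives within two seconds.
set timer [after 2000 {set ::reply "<timeout>"}]
vwait ::reply
after cancel $timer
close $client
close $listener
```

There is no explicit select loop and no partial-write bookkeeping here; the channel layer and the event loop take care of both.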

The lesson learned: Perl has better low level networking access, but the Tcl code contains a lot of built-in idiomatic functionality that you'll find yourself hand-coding to duplicate in Perl (and to less effect). See POE.

-- Todd Coram Once again, it looks like my Tcl prototype will make it into production without the often obligatory C rewrite!

jcw - Imagine having the best of both worlds: a 1-to-1 mapping of C system calls to Tcl, with the event system coded in Tcl... It could tie in nicely with a critical mindset about policy, and make such a Tcl system considerably more modular (package require sockets?). It also means someone who needs UDP could add it in Tcl, and instantly have it work on all platforms with the same system call interface. It would not rule out a more efficient C-coded implementation, it would just provide more choices and portability.

DKF: I think that's one of your less well-baked suggestions jcw. :^) The good thing about Tcl is that it hides all the crap. Expose it, and many people will insist on using it (because they're mindlessly reading off how to do it in some out-of-date networking bible) no matter how many better APIs are available. And the main problem with UDP is that it doesn't fit with Tcl's channel abstraction; AKU's been looking into this. (As always, external packages can do whatever they want.)

Todd Coram: Sometimes I need access to the crap. What I really want is a backdoor (without relying on the crutch of dropping down to coding C extensions every time). Maybe a package require posix or package require tcp? I think Tcl is too slow to efficiently manipulate a low level API. This isn't a criticism though. Most of the time I just need Tcl's high level of I/O abstraction. What I (probably) really need is fconfigure on steroids. Sometimes I need to tweak socket/TCP parameters... The problem with fitting UDP into the channel abstraction is that it isn't a natural fit. It's a forced fit. Just like Unix making read(2) deal with UDP. UDP isn't a streaming protocol. The one unifying abstraction tries to force the one true way of abstracting a family of similar facilities (which usually leaves the edge behaviors/needs unsatisfied).

jcw (in response to Donal): Point taken, and agree 100% on potential for abuse. But I really would like to lobby for a more "systems scripting" kind of mindset. We currently avoid the problem by using some obscure language with an even more obscure single-letter name to prevent mainstream application developers from messing with deep stuff. Can't we get rid of this detour, have a core exposed to "inner Tcl", have a few systems-programmer type people design and build the next layer, and then call the result Tcl or Tcl NG or outer Tcl? The current approach comes from the time when C was the center of the universe and Tcl a neat idea built on top. Isn't it time to make C revolve around Tcl instead? And optional? Today, we have a pretty unforgiving brick wall - if you want anything more than Tcl caters for, then you must dig C first and dig the Tcl C core second! Yeah right, come back in a year when you're done... :)

There's also a different argument to be made: natural selection. If a lower level core is exposed as Tcl building blocks, then a few more developers would be able to experiment and come up with say a UDP system (you have to be pretty experienced at this stuff to actually pull it off, so I think we won't be seeing dozens of competing solutions any day soon). One of them is bound to hit the sweet spot between the channel model and events. And that one could then be picked up and given a more prominent status, as *THE* udp package for Tcl. Or re-implemented in C, if you insist.

DKF (responding to Todd): I'd love to see more fconfigure options for sockets, just as I would love to see support for UDP with a suitable messaging paradigm. (And support for Unix-domain sockets too, while I'm at it!) I just haven't got time to do the work to make it happen as my pay-work is pretty heavy going at the moment. :^(

DKF (responding to jcw): Sounds really cool to me (like the vast majority of your ideas.) It'd have to be a Tcl9 thing since you'd be requiring alterations to existing scripts that make use of sockets, but that's not too big a deal IMHO.

TV: About the jcw remark, agreeing with the idea of thinking about programming and what that means: an 'inner core' of Tcl sounds like a great idea of some kind, but what does that mean? A library of functions which can be called at the Lisp level? All interactions via an interpreter? A Tcl version which can be used to organize all system ongoings? Get real, the low level system 'things' would never run at a speed even comparable to what we are used to. And C isn't so much the only thing at system level, it's the system itself which comes first, unless you maybe want to build another dedicated Smalltalk machine, maybe even parallel, which is a fine and interesting project, worthy of science funding, but not exactly a mainstream solution to fuzzy/hard/hidden/secret/unknown/undocumented/motherf*ed all over the place system level routines.

And unless you make your own, let's say, Unix or Apple machines, what you have to deal with, even there, would be a ton of drivers for all kinds of cards and optional devices and machine configurations, and you'd start in driver hell.

The soft spot or something, let's see, 'the sweet spot between the channel model and events', that would be the select() call, I'm sure, which is the lowest level application level C interface, which mainly assigns a bit (how atomic can you get) to each channel opened and signifies whether the corresponding channel can be read from, written to, or contains an error. That is where, more or less, I'm sure the event handlers hook into, and events are spawned, except that under maybe Windows or the Mac there are other programming interfaces to deal with e.g. window events, I don't know that. But the essence is that I wouldn't dream of using a general interpreted language to let the system stuff be handled even at that level, at the millisecond level or even lower, let alone let the disc interrupts and the network packet handler first go through an interpreter layer; that simply doesn't make sense or isn't practically interesting.

Also, I wonder what the interest in UDP essentially is about. That would be about routing sort of stuff, wouldn't it? I mean the efficiency of using streams is fine enough, so why take the risk of going over an insecure layer like UDP's when there is no need? It makes sense for undirected packets, like finding out network servers and routing as part of a network connecting node, but for normal programming the purpose would probably be limited to a 'broadcast' type of idea, and possibly organizing an efficient (and sensible) multicast over a local network.

I repeatedly see various interesting Tcl/Tk packages/extensions which would allow access at Tcl level to do sensible things; a good range of those and maybe some de facto standards is valuable.

Someone write a Tcl rpm and kernel compile and tweak package with a nice Tk interface to tune a realtime Linux kernel to sub-millisecond audio device interaction time, preferably networked?...

DKF: Computers are so fast these days that scripting languages like Tcl can quite easily work with things at the millisecond level. :^)

On the matter of UDP, I'll first note that UDP is not really inherently any more or less secure than TCP. However some protocols are primarily defined in terms of it (e.g. if you really want to work with NTP or DNS, you probably ought to handle UDP as that's what those protocols normally use for their transport layers) and if you're doing something like streaming multimedia you need UDP because you'd rather drop parts of the stream than wait for TCP to retransmit them.

TV The millisecond level doesn't leave you with all too much space to really do something associatively much in my experience, how many empty loops with just a simple operation could one do? Not that exceedingly many per millisecond, maybe a few million? That's not much: a few hundred operations and your machine is only doing Tcl interpreting per channel! In comparison: a C program with good optimisation would give you a few hundred and more effective operations (for low level stuff) per second: quite a difference.

UDP just means undirected, isn't it?

DKF: The U stands for user actually. :^) It's the D (for datagram) that says you're working with a non-stream protocol.

Todd Coram Regarding DKF's comment: ..can quite easily work with things at the millisecond level... , er, yes and no. Under Linux, BSD and Solaris, you get about a 10-17 millisecond granularity on OS timer events (i.e. after 5 dosomething will not fire in 5 milliseconds; it'll fire 10 milliseconds later). I do stuff in Tcl at the millisecond granularity by chewing up clock cycles (and ignoring the event loop), sampling time with clock clicks -milliseconds. (ugh).
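The two approaches can be compared with a small measurement sketch. It uses `clock milliseconds` (Tcl 8.5 and later), which serves the same purpose here as `clock clicks -milliseconds`; `spin_until` is a hypothetical helper, and the numbers printed depend entirely on the OS and hardware, so none are quoted.

```tcl
# Ask for a 5 ms timer and measure how long the event loop really took.
set start [clock milliseconds]
after 5 {set ::fired [clock milliseconds]}
vwait ::fired
set timerDelay [expr {$fired - $start}]

# Busy-wait alternative: burn CPU until the deadline, bypassing the event loop.
proc spin_until {deadline} {
    while {[clock milliseconds] < $deadline} {}
}
set start [clock milliseconds]
spin_until [expr {$start + 5}]
set spinDelay [expr {[clock milliseconds] - $start}]

puts "after 5 fired after $timerDelay ms; busy-wait took $spinDelay ms"
```

The busy-wait is more precise but keeps a core fully occupied and starves any fileevent handlers while it spins.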

Regarding the UDP comments: I do lots of UDP for reasons I can't divulge... but TCP connection build-up and tear-down are quite expensive. When you are just firing off a few packets without much concern about 100% delivery, UDP is still useful.

Todd Coram 2003-09-12 09:00: More Perl vs Tcl musings this morning (btw, I am rewriting one of my Tcl apps in Perl to see if there are any performance enhancements -- why not just go pure C? Ugh. I've got to get this done by the end of next week ;-).

Using Perl's Sys::Syslog causes me to lose UDP packets (I can't read the UDP port fast enough). I had no such problem in Tcl. Why? A lot of Perl's standard packages are written in pure Perl. Sys::Syslog is in pure Perl. The Tcl extension I downloaded is in plain C (so is Tnm's implementation of syslog). Sometimes it is better to go with C extensions... ;-)

Todd Coram 2003-09-12 15:55: After about 1 hour of high-volume data rates, Perl starts to hemorrhage memory. I wish I could figure out why. The Tcl code has been running steady for days now. It just isn't as fast as the Perl app...

DKF 2003-09-15: I thought syslog was done through Unix-domain sockets. (I'd love to add them to socket; so many enhancements, so little time!)

Todd Coram 2003-09-15: Oh. I meant that I was dropping my other channel's UDP packets (the one with the incoming sender/client data) whenever I used Perl's Sys::Syslog. Although you can use UDP for Syslog (check out your /etc/services), my logging was done through Unix domain sockets. It's just that the Perl implementation is slow enough to allow the OS UDP buffer to overflow (it is filled by the sending apps in a matter of milliseconds). BTW, my Perl app died (simply exited) over the weekend. Can't blame Perl completely. I am sure I am missing something. But I am following exemplary Perl networking idioms (Cookbook et al). Back to Tcl and C...

Todd Coram 2003-09-15 09:27: The Perl app leaks memory! Slowly but surely. Hmmm... I really don't want to debug this. I don't have objects. The only active datastructure is a queue (push,shift,unshift) and that's getting emptied out correctly. (Perl lurkers: I am a fan of Perl. I use Perl daily. I am not a Perl guru. I've written fairly large/complex apps in Perl. But Perl may not be the best choice for this type of work: High volume; soft-realtime networking.)

JMN 2006-11-19: Tcl has no equivalent of the perl socket shutdown mechanism. This means Tcl sockets can't be used for protocols which rely on half-close of a TCP connection to trigger a final response. Once you call close on a Tcl socket, you can't read the response.

PYK 2014-04-03: As of Tcl 8.6, channels can now be half-closed.
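A minimal loopback sketch of the 8.6 half-close, assuming a toy protocol in which the server answers only once it sees end-of-file from the client. The proc names and the protocol are made up for illustration; the relevant call is `close $client write`.

```tcl
# Server: collect everything until the client signals EOF, then answer.
proc hc_accept {chan addr port} {
    fconfigure $chan -blocking 0
    set ::request($chan) ""
    fileevent $chan readable [list hc_collect $chan]
}

proc hc_collect {chan} {
    append ::request($chan) [read $chan]
    if {[eof $chan]} {
        # The client closed its write side; the final answer can go out now.
        puts -nonewline $chan "received [string length $::request($chan)] bytes"
        close $chan
    }
}

# Client: accumulate the answer until the server closes the connection.
proc hc_response {chan} {
    append ::response [read $chan]
    if {[eof $chan]} {
        set ::done 1
    }
}

set listener [socket -server hc_accept -myaddr 127.0.0.1 0]
set port [lindex [fconfigure $listener -sockname] 2]

set client [socket 127.0.0.1 $port]
puts -nonewline $client "final request"
# Tcl 8.6+: flush and close only the write direction; reading still works.
close $client write

set response ""
fconfigure $client -blocking 0
fileevent $client readable [list hc_response $client]
set timer [after 2000 {set ::done timeout}]
vwait ::done
after cancel $timer
close $client
close $listener
```

On Tcl 8.5 and earlier the `close $client write` line raises an error, which is the limitation JMN describes.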

JMN 2006-12-01: Tcl can't retrieve the serverside IP address of a listening socket without also doing a DNS lookup. This is IMO an unfortunate dependency. i.e if you have a socket listening on all interfaces by doing something like:

socket -server ::handle_connection -myaddr 0.0.0.0 $port

How can '::handle_connection' determine what IP address the client connected to without doing a potentially time-consuming DNS lookup? fconfigure $clientsock -sockname returns IP address & *hostname* & port. That is, it automatically looks up the hostname of the server interface the client connects to. If DNS resolution isn't working for some reason, there is a multi-second delay before fconfigure returns with the hostname set to the IP address.

Perl can determine the server-interface without a DNS operation as it doesn't bundle the two operations. e.g:

($port, $myaddr) = sockaddr_in(getsockname($sock));

and the corresponding hostname can be retrieved separately using:

gethostbyaddr($myaddr, AF_INET);

Now sure.. for most applications doing that DNS lookup is no big deal. But it's still a nuisance in some environments.
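For comparison, the Tcl side can be sketched as follows. In the accept callback, `-sockname` describes the local end of the accepted connection as a three-element list, and it is the hostname element that can involve a lookup. The proc and variable names are hypothetical, and the loopback address is used only to keep the sketch self-contained.

```tcl
proc sn_accept {chan clientAddr clientPort} {
    # -sockname returns {address hostname port} for the local end.
    # The hostname element is the part that may trigger a reverse lookup.
    lassign [fconfigure $chan -sockname] ::localAddr ::localHost ::localPort
    close $chan
    set ::accepted 1
}

# Port 0 asks the OS for a free port; read it back from the listener.
set listener [socket -server sn_accept -myaddr 127.0.0.1 0]
set port [lindex [fconfigure $listener -sockname] 2]

set client [socket 127.0.0.1 $port]
set timer [after 2000 {set ::accepted timeout}]
vwait ::accepted
after cancel $timer
close $client
close $listener

puts "client connected to $localAddr port $localPort (hostname: $localHost)"
```

With a listener bound to 0.0.0.0, `localAddr` would be whichever interface address the client actually reached.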


JMN 2006-12-01:

In perl, you can create a server socket then close the listening socket, whilst maintaining existing client connections. Tcl can also do that.. BUT - then you can't start up another instance of Tcl and bind to the same IP and port to accept new connections. You'll get an address already in use error. Perl can do this by setting the SO_REUSEADDR socket option. (This is a problem for a mail application I have where I want to restart an active server but not kill the existing connections.)

In summary.. I think Tcl sockets are ok, but outclassed (and on windows at least, outperformed) by other languages which give access to the lower-level functionality. It's a pity, because the excellent channel & fileevent & threading systems of Tcl otherwise make it a good choice for networking work.

Given the obvious importance of networking today - it's a shame that Tcl doesn't seem to have kept up here - especially as it was a pioneer in network scriptability with Expect. To be competitive in this area now, I think it'd help to have UDP & IPv6 support in the core + more access to underlying socket functionality.

NEM: Hmm... I don't see any such problem with Tcl's sockets. After closing the server socket, I can then launch a new tclsh and start a new server on the same IP/port. What platform are you on? Here are the test scripts I used:

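# process 1: echo server that stops listening after one minute
proc accept {sock addr port} {
    fconfigure $sock -blocking 0 -buffering line
    fileevent $sock readable [list echo $sock]
}
proc echo sock {
    # gets returns -1 on EOF or when no complete line is available yet
    if {[gets $sock line] >= 0} {
        puts $sock $line
    } elseif {[eof $sock]} {
        close $sock
    }
}
set serv [socket -server accept 8080]
# close only the listening socket; accepted connections stay open
after 60000 [list close $serv]
vwait forever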

Run that and connect via telnet. After 1 minute, the server socket is closed, at which point I launched a new tclsh and just did:

% socket -server foo 8080
sock5

Seems to work fine. Also, regarding Windows socket performance, are you aware of IOCPSock?

JMN: interesting.. testing on an NT machine it gives the error "couldn't open socket: address already in use". I just tested on a win2000 and a FreeBSD6.1 box, and it works on those ok! Hmm.. I guess NT is pretty obsolete now, so it's not as big a deal as I thought, though it's interesting that perl doesn't have this trouble on NT.

Thanks for the script and for prompting me to retest.

I've tried IOCPSock, but haven't had a chance to compare performance. Actually I seem to be able to handle a few Mb/s using the standard socket anyway - which for my purposes is currently fine, so I shouldn't whine about that anyway :P