Version 6 of Socket Performance Analysis

by Theo Verelst

When I first used tcl/tk to make a user interface for a suite of radiosity / ray-tracing graphics and simulation programs, it didn't have sockets built in yet.

So I made a little stub on Hewlett Packard Unix: a small C program that I'd hook into a piped exec, which would pass its stdin and stdout to a tcl file descriptor. The stub would connect to either another stub or another program via what I then called the 'connection server' (the code of which might even be lost due to dreadful university goings-on), which would join sockets (the actual ones, by passing file descriptors) based on name-based rendezvous.

The other program could be a C program or an existing application, such as one that allows external control over a socket (pipe) based connection (like AVS; that was about a decade ago).

There are two main performance measures I normally come up with in this area:

1. ease of applying, streaming control, and absence of unpleasant boundary conditions
2. the normal performance association: the raw bytes per second, and how high that can be

The first has to do with setting up the socket (not trivial for many C programmers), dealing with flow control (in tcl, using fileevent to handle incoming and outgoing data streams), and the buffering, new-line and end-of-file behaviour.

At the operating system level, this is also important because buffering and other things are influenced by the OS.

Normally, the performance of a car would be measured mainly by its speed rather than by its colour or even its steering behaviour, and speed is of course important for sockets, too.

The reason the other performance measure is brought forward is that in practice the actual speed is often so low that the other issues prevail. And at times they are related, which matters in a time of many sorts of web applications and things like COM and various IPC methods, because it seems to me that many parties take really incredibly slow and lame performance for granted without even being prompted to wonder why things work a certain way.

First a short note on the tcl sockets in terms of their usage properties.

      process A (on some.host)                    process B
      ---------                                   ---------
  set s [socket -server server_accept_proc port]
                                               set s [socket some.host port]
  fileevent $s readable ....
                                               puts $s "data"
  ...                                          ...
                                               close $s
  close $s

In principle, A and B can be the same process, but this rarely makes sense.

The dots are normally filled in with at least more fileevent statements, and with the main thing that bugs me a bit among the usage performance issues: a test for end-of-file.

Once the other end closes the connection, the fileevent readable script keeps being executed indefinitely unless the programmer explicitly tests for eof; i.e. it doesn't stop once the socket closed by the other side has been read till the end.
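The end-of-file test could be sketched as below. This is only a minimal illustration, not the code used for the measurements on this page: the names on_readable, accept, ::total and ::done are made up for the example, and both ends run in one process purely to keep it short.

```tcl
# Hypothetical names (on_readable, accept, ::total, ::done) are for illustration only.
proc on_readable {chan} {
    # The channel is non-blocking, so read returns whatever is available right now.
    set data [read $chan]
    incr ::total [string length $data]
    if {[eof $chan]} {
        # The peer closed its end: close ours, which also removes the handler,
        # so it stops firing.
        close $chan
        set ::done 1
    }
}

proc accept {chan addr port} {
    fconfigure $chan -blocking 0 -translation binary
    fileevent $chan readable [list on_readable $chan]
}

set ::total 0
# Port 0 asks the operating system for any free port.
set server [socket -server accept -myaddr 127.0.0.1 0]
set port [lindex [fconfigure $server -sockname] 2]

# Client side, in the same process for the sake of a short demo.
set client [socket 127.0.0.1 $port]
fconfigure $client -translation binary
puts -nonewline $client [string repeat x 1000]
close $client

vwait ::done
close $server
puts "received $::total bytes"
```

Without the eof test the handler would keep being called after the client's close, because a channel at end-of-file counts as readable.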

An important issue in file/socket access is the new-line behaviour: is buffering/flushing tied to newline-terminated writes, and is it reasonably possible to retrieve all actual ends of line at the receiving end without errors in lines longer than any of the buffers?

And does process control make the receiving process active after a reasonable number of characters have been sent, and in any case after a while, even when no newline or only a single character is sent?

Let's first see what is sufficient on a modern Windows XP (and I'll try Linux, too) to get an operating connection, first on the same machine (i.e. just another process).

I started two wish-es, one I call server, the other client, though that terminology traditionally applies to programming methods less symmetrical than what we do here.
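One way to probe the new-line question is a sketch like the following, which sends one line much longer than the default 4096 byte channel buffer, followed by a short one, and collects them with gets on a non-blocking channel. The names on_line, accept, ::lines and ::done are hypothetical, the line lengths are arbitrary, and running both ends in one process is just a convenience for the demo.

```tcl
# Hypothetical names (on_line, accept, ::lines, ::done) chosen for this demo only.
proc on_line {chan} {
    # In non-blocking mode gets returns -1 while no complete line has arrived,
    # so a partial line stays in the channel buffer until its newline shows up.
    while {[gets $chan line] >= 0} {
        lappend ::lines $line
    }
    if {[eof $chan]} {
        close $chan
        set ::done 1
    }
}

proc accept {chan addr port} {
    fconfigure $chan -blocking 0
    fileevent $chan readable [list on_line $chan]
}

set ::lines {}
set server [socket -server accept -myaddr 127.0.0.1 0]
set port [lindex [fconfigure $server -sockname] 2]

set client [socket 127.0.0.1 $port]
# Line buffering flushes on every newline; non-blocking mode lets Tcl queue
# the output and write it from the event loop, so one process can play both roles.
fconfigure $client -buffering line -blocking 0
puts $client [string repeat a 100000]   ;# far longer than the default buffer
puts $client "short"
close $client

vwait ::done
close $server
foreach line $::lines {
    puts "line of [string length $line] characters"
}
```

The receiving side never sees a truncated line here, since gets only hands over a line once its terminator has arrived (or at end-of-file).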

 ###### server
 proc serv {ss ip port} {
   global cons ;
   lappend cons $ss;
   fileevent $ss readable " puts \"received data length: \[string length \[read $ss\]\]\" ; if \[eof $ss\] {close $ss}"
 }
 set s [socket -server serv 7777]


 ###### client
 set p "0123456789" ; for {set i 0} {$i < 13} {incr i} {append p $p}
 set s [socket localhost 7777]
 puts [time {for {set i 0} {$i < 1000} {incr i} {puts $s $p}}] ; close $s

This 81921000 byte transfer took about 19 seconds in this case on a recent 2.6 GHz Pentium 4 machine, with a movie player running, though.

That makes a little over 4 megabytes per second, a seriously low figure.
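For reference, the arithmetic behind these figures can be reproduced as follows. The variable names are made up for the example; the byte count follows from the client code above (ten characters doubled 13 times, plus the newline that puts appends, sent 1000 times), and 19 seconds is the rough timing quoted.

```tcl
# Payload size: "0123456789" doubled 13 times, plus the newline added by puts.
set perPuts [expr {10 * (1 << 13) + 1}]
set bytes   [expr {$perPuts * 1000}]
set seconds 19.0   ;# approximate figure from the text

set mbPerSec  [expr {$bytes / $seconds / 1e6}]              ;# decimal megabytes
set mibPerSec [expr {$bytes / $seconds / (1024.0 * 1024)}]  ;# binary megabytes

puts "bytes sent: $bytes"
puts [format "%.2f MB/s (10^6 bytes), %.2f MB/s (2^20 bytes)" $mbPerSec $mibPerSec]
```

Either way of counting a megabyte gives a bit more than 4 per second.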

KBK: Indeed, socket performance can be bad on Windows. We're working on it. Expect that the Core will at some point incorporate ideas from David Gravereaux's fine iocpsock package that will address some of these concerns.

TV The page wasn't finished... It's good to hear work is being done in this area, though.

On HP-UX, at the time on a 50 MHz PA-RISC, this transfer, with the right buffering and done in C, would run at about 25 MB/s, and I seem to remember I got much better performance before on other machines, and that should be possible in tcl.

Mind that this example at least makes read read about 80 megabytes all at once; though the core memory is physically 512 MB, that is a lot more than any normal receive buffer is configured for. So the first reasonable next step would be to reduce the per-read size until the point where the process switch overhead is small compared to the actual transfer and send/receive function time, and buffers are used well and not required to grow beyond their normal size.

I guess it can also make a difference whether Tk is loaded. Finally, I want to see what a cygwin (and maybe, though I didn't try yet, mingw) compiled tcl does. Linux should be faster, but maybe I overestimate the speed at which lists can be fed into the read command. In C, I know that memcpy can work quite well when implementing sockets without system calls; video processing also at least suggests that soaring data transfer rates between processes are reasonably possible.
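A reduced per-read size might look like the sketch below, which reads at most a fixed number of bytes per readable event on a non-blocking channel. The 65536 byte chunk is an assumed starting value, not a measured optimum, and the names on_chunk, accept, ::chunk, ::total, ::reads and ::done are hypothetical; no timing is claimed for it.

```tcl
set ::chunk 65536   ;# assumed starting value; tune it by measurement
set ::total 0
set ::reads 0

proc on_chunk {chan} {
    # Read at most ::chunk bytes per event instead of everything up to eof.
    set data [read $chan $::chunk]
    incr ::total [string length $data]
    incr ::reads
    if {[eof $chan]} {
        close $chan
        set ::done 1
    }
}

proc accept {chan addr port} {
    # Binary translation avoids newline and encoding conversion on the receiving side.
    fconfigure $chan -blocking 0 -translation binary -buffersize $::chunk
    fileevent $chan readable [list on_chunk $chan]
}

set server [socket -server accept -myaddr 127.0.0.1 0]
set port [lindex [fconfigure $server -sockname] 2]

# Same-process client for the demo; non-blocking so its output is queued
# and written from the event loop while the server side reads.
set client [socket 127.0.0.1 $port]
fconfigure $client -blocking 0 -translation binary
puts -nonewline $client [string repeat x 1000000]
close $client

vwait ::done
close $server
puts "received $::total bytes in $::reads reads"
```

Wrapping the vwait in time and varying ::chunk would be one way to look for the point where per-read overhead stops dominating.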

NEM Results on SuSE Linux 8.2, running on a 1.8GHz Athlon (2200+), with 256MB DDR mem. Tcl/Tk 8.4.2 (shipped with SuSE):

 % puts [time {for {set i 0} {$i < 1000} {incr i} {puts $s $p}}] ; close $s
 3218768 microseconds per iteration
 % set mb [expr {81921000.0 / (1024 * 1024)}]
 78.1259536743
 % set sec [expr 3218768.0 / 1000000.0]
 3.218768
 % expr {78.1259536743/3.218768}
 24.2720052126

So, it looks like Tcl manages about 24 MB/s on Linux (although with very little else running).