PocketPC socket/fileevent strangeness

MNO - Latest News:- The problem described here goes away for me when using the eTcl distribution. Similarly, another problem I was seeing in previous Tcl distributions for PocketPC (the "tcl: select: 10022" error dialog) also doesn't occur in eTcl.


Older News:- I have now narrowed the problem down further: the 8.4a2 port by Reiner Keuchel does not appear to have this problem, and fileevent seems to work properly in that version. The bug only seems to appear in the 8.4.3 port. (Of course, having the fileevent part of my application working has led to my finding a number of other major bugs/annoyances which will need to be fixed before release ;-)


Original Discussion and Comments:-

MNO - Whilst knocking up a PocketPC application (PocketICS) today, I noticed some weirdness with fileevent.

I can minimally demonstrate this with the following code snippet (adapted from the telnet page), which tries to connect to an internet chess server:-

 #! /bin/sh
 # THIS LINE MUST BE HERE \
     exec tclsh "$0" ${1+"$@"}
 package require Tk
 
 proc telnet {{server localhost} {port 23}} {
     global sock
     set sock [socket $server $port]
     fconfigure $sock -buffering none -blocking 0
     fileevent $sock readable [list fromServer $sock]
     global closed
     vwait closed($sock)
     unset closed($sock)
 }
 proc toServer {} {
     global comm sock
     puts $sock $comm
     set comm ""
     update
 }
 proc fromServer {sock} {
     set data x
     while {[string length $data]} {
         set data [read $sock 4096]
         if {[eof $sock]} {
             disconnect $sock
             return
         }
         if {[string length $data]} {
             .t insert end $data
             .t see end
         }
     }
 }
 proc disconnect {sock} {
     global closed
     close $sock
     set closed($sock) 1
 }
 
 pack [text .t -font {Tahoma 7} -height 20 -width 40] [entry .e -textvariable comm]
 bind .e <Return> toServer
 telnet chessclub.com 5000

Running this piece of code on either a Linux system (Tcl/Tk version 8.4b2) or a Windows 2000 system (ActiveTcl 8.4.0) works exactly as expected: I get to the login prompt and can enter my handle in the entry box; this gets sent to the server, which then prompts me for the password.

If, however, I run this on my PocketPC (using the 8.4.3 Tcl/Tk from the Windows/CE page), I get the login prompt (proving that the fileevent has fired once) and can send my handle, but I never see the password prompt in the text box (i.e., the fileevent doesn't seem to fire again).

The strange thing is that if I snoop the connection (my PocketPC internet connection is via my Linux machine) using tcpdump, I can see the handle being sent, and the response including the password prompt being sent back by the server. I can then even enter the password in the entry box, and see it go to the server and the login session continues fine. The fileevent never seems to fire again though, so all that I ever seem to get in my text box is the first set of output up to and including the login prompt.

Any ideas? Is this a bug in the Windows/CE Tcl/Tk version?

Could this be related to Tcl bug #719790 (fcopy -command hangs in Win NT4)? It looks like that bug only affects Tclkit, not tclsh - but somehow it sounds similar... -jcw

MNO - Could be. I have modified my code to keep count of cumulative bytes received just to see if the magic value of 4096 appears anywhere. I think not though - I'm pretty sure that there can't be more than about 500 bytes received before it stops triggering fileevent. (Actually, IIRC, I only ever see the first 119 bytes via fileevent).
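The modified code is not shown on this page. A minimal sketch of such a counter might look like the following, assuming a hypothetical helper countingRead and a global array nbytes (neither name comes from the original application). Note that string length counts characters, which matches the byte count only for plain ASCII data or on a channel configured as binary.

```tcl
# Hypothetical helper: read from a channel and keep a cumulative count
# of what has been received on it, logging each call to stderr.
proc countingRead {sock} {
    global nbytes
    # Tcl 8.4's incr needs the variable to exist already.
    if {![info exists nbytes($sock)]} {
        set nbytes($sock) 0
    }
    set data [read $sock 4096]
    set n [string length $data]
    incr nbytes($sock) $n
    puts stderr "readable on $sock: $n this call, $nbytes($sock) total"
    return $data
}
```

To use it in the mini-telnet code, the line "set data [read $sock 4096]" in fromServer would become "set data [countingRead $sock]"; the stderr output then shows how often the handler runs and how much data had arrived when it stopped.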

escargo - Could you use sockspy to get another view of what is happening? (I don't know if it would be more useful than tcpdump.)

MNO - I don't think running sockspy on the Linux node would help much since, as you note, I've got tcpdump running there. I wondered about running it on the PocketPC end but, of course, it is pure Tcl and relies on fileevent itself. I may try it there anyway tonight, in case it behaves differently, but if there is a bug then sockspy is probably going to be affected too... There looks to be one difference in the way sockspy uses fconfigure (it sets -translation binary, which I don't). I don't know if this will make a difference or not. I did try the above code with -encoding binary set, but this didn't seem to change the behaviour at all.
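For reference, the two options are not equivalent: -translation binary turns off end-of-line translation and the end-of-file character as well as selecting the binary encoding, whereas -encoding binary changes only the encoding. A small sketch for trying the sockspy-style setting and recording what the channel ends up with is given below; configureBinary is a hypothetical helper name, not something taken from sockspy.

```tcl
# Hypothetical helper: configure a channel like the mini-telnet code,
# but with -translation binary added, then report the resulting options
# so that the values can be compared between platforms.
proc configureBinary {chan} {
    fconfigure $chan -buffering none -blocking 0 -translation binary
    set report {}
    foreach opt {-translation -encoding -eofchar} {
        lappend report $opt [fconfigure $chan $opt]
    }
    return $report
}
```

In the telnet proc this would replace the existing fconfigure line, for example "puts stderr [configureBinary $sock]". Whether it changes the fileevent behaviour on the PocketPC is not known.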

MNO - I have performed a test within tkcon on the PocketPC just using socket, fconfigure, read and puts (i.e. no fileevent) and this seems to work fine - I can conduct a session via the following type of exchange:

 % set s [socket chessclub.com 5000]
 % fconfigure $s -buffering none -blocking 0
 % read $s
 "
 .
 (server output)
 .
 login: "
 % puts $s myname
 % read $s
 " (more server output)
 password: "
 % puts $s mypassword
 % read $s
 " (more server output including welcome message and command prompt)
 aics% "
 % puts $s icscommand
 % read $s
 "(output of icscommand)"
 %
 etc. etc.

This does seem to point to the problem being in the fileevent-related parts of the channel handling code.
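If plain non-blocking reads keep working as in the session above, one conceivable stopgap is to poll the socket from a timer instead of relying on fileevent. The following is an untested sketch under that assumption: pollChannel, its callback argument and the 200 ms interval are all hypothetical, and it has not been tried on the PocketPC port.

```tcl
# Hypothetical workaround: poll a non-blocking channel with "after"
# rather than waiting for a readable fileevent.
# callback is a command prefix that receives each chunk of data.
proc pollChannel {chan callback {interval 200}} {
    global closed
    set data [read $chan]
    if {[string length $data]} {
        # linsert keeps this usable on Tcl 8.4, which has no {*} syntax.
        uplevel #0 [linsert $callback end $data]
    }
    if {[eof $chan]} {
        close $chan
        set closed($chan) 1
        return
    }
    # Not at end of file yet: schedule the next poll.
    after $interval [list pollChannel $chan $callback $interval]
}
```

In the mini-telnet code the fileevent line would be replaced by a call such as "pollChannel $sock showData", where showData is a one-argument proc that inserts its argument into the text widget. Polling wastes some cycles and adds latency, so it is only a way of sidestepping the symptom, not a fix.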

Another hunch... could it be related to the combo of sockets and encodings? -jcw

MNO - Dunno. I did also try the "mini-telnet" code at the top of the page with a "-encoding binary" option on the fconfigure, and it behaved the same.