Random hangs under high CPU load

From time to time I face problems in several long-running Tcl programs on one of our Windows production systems, which I'm unfortunately not able to track down.

The source code is too complicated to post here...

All programs make heavy use of after events and file events, e.g. to periodically update a flag file on disk or to read stdout from called programs, and they exec many external executables, either in an endless loop or triggered by twapi filesystem monitoring. If everything works fine, the programs run for many days (or weeks) without problems, calling tens of thousands of other programs along the way.
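For readers who want a concrete picture of the pattern described above, here is a minimal sketch of reading a child's stdout through a non-blocking file event. It is not the actual production code; the names runAsync, onReadable and output are made up for illustration, and it assumes Tcl 8.6.

```tcl
# Hypothetical helper: start a command pipeline and collect its stdout
# from the event loop. cmd is a list: program followed by its arguments.
proc runAsync {cmd onDone} {
    set chan [open |$cmd r]
    # Non-blocking mode, so a slow child can never stall the handler.
    chan configure $chan -blocking 0
    chan event $chan readable [list onReadable $chan $onDone]
    return $chan
}

proc onReadable {chan onDone} {
    global output
    # In non-blocking mode gets returns -1 when no complete line is available.
    while {[chan gets $chan line] >= 0} {
        lappend output($chan) $line
    }
    if {[chan eof $chan]} {
        set lines {}
        if {[info exists output($chan)]} {
            set lines $output($chan)
        }
        unset -nocomplain output($chan)
        # Back to blocking mode so that close waits for the child and
        # reports a non-zero exit status as an error.
        chan configure $chan -blocking 1
        set status [catch {chan close $chan} msg]
        {*}$onDone $lines $status
    }
}

# Usage (the program name is illustrative only):
#   runAsync [list someprogram arg1] myCallback
#   vwait forever
```

The callback receives the collected lines and the result of catch around close (0 if the child exited cleanly). Several such channels can be serviced in parallel, because each handler returns as soon as no more data is available.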

What I saw yesterday is this:

  • We started another program on the machine, which put the CPU under heavy load and constantly allocated more and more memory, so the machine slowed down.
  • We killed that program.
  • Although the machine then returned to a normal load, my 3 Tcl programs on that machine did not respond any more; that is, the after events did not fire anymore. No errors were generated; it simply looks like the programs hang.
  • Other programs on that machine continued to run normally, so there was no general Windows error condition etc.

As I'm only experienced at the Tcl/Tk script level, I don't know how to track such errors down any further. They aren't reproducible and happen from time to time. The machines are Windows VMs.
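One script-level way to at least record when the event loop stalls is a heartbeat timer that measures how late it fires. The following is only a sketch under assumptions: the heartbeat namespace, its variable names and the interval and tolerance values are invented for illustration. It can only report stalls from which the event loop recovers; after a permanent hang, the timestamp of the last beat is all that remains.

```tcl
# Hypothetical helper: record every occasion on which the event loop
# serviced the heartbeat timer noticeably later than scheduled.
namespace eval heartbeat {
    variable interval 1000   ;# ms between beats (assumed value)
    variable tolerance 500   ;# ms of lateness counted as a stall (assumed value)
    variable last 0          ;# time of the previous beat in ms
    variable stalls {}       ;# list of {time lateness} pairs
}

proc heartbeat::start {} {
    variable last [clock milliseconds]
    variable interval
    after $interval [namespace code beat]
}

proc heartbeat::beat {} {
    variable last
    variable interval
    variable tolerance
    variable stalls
    set now [clock milliseconds]
    # How much later than planned did this beat run?
    set late [expr {$now - $last - $interval}]
    if {$late > $tolerance} {
        lappend stalls [list $now $late]
        puts stderr "[clock format [expr {$now / 1000}]]: event loop was $late ms late"
    }
    set last $now
    after $interval [namespace code beat]
}

# Usage: call heartbeat::start once before entering the event loop.
```

Writing the messages to a log file instead of stderr would make it possible to correlate the stalls with the load on the machine afterwards.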

My questions are:

  • Under what circumstances is it theoretically possible that the Tcl event loop stops working?
  • Would it help to save the (few) pieces of information that Sysinternals' Process Explorer shows about active threads, thread state etc.? I'm not able to interpret such information, I fear...
  • Is it possible that a call to exec ... & / open |proc never returns, blocking the whole program?
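As a general illustration of the first question (not a diagnosis of the hang described here): timers and file events are only serviced while the interpreter is inside the event loop, so any synchronous call that does not return delays them for as long as it runs. The sketch below shows this with a synchronous exec (no trailing &); the child script and the variable names are assumptions made up for the demonstration.

```tcl
# Create a tiny child script that simply sleeps for 500 ms (illustrative only).
set fh [file tempfile childScript]
puts $fh {after 500}
close $fh

set t0 [clock milliseconds]
# This timer is due after 100 ms, but it can only fire from inside the event loop.
after 100 {set ::firedAt [expr {[clock milliseconds] - $::t0}]}

# Synchronous exec: control does not return to the event loop until the child exits.
exec [info nameofexecutable] $childScript

vwait ::firedAt
file delete $childScript
puts "timer due at 100 ms actually fired after $firedAt ms"
```

If the child never exited, the timer would never fire at all. The same applies to a blocking read or gets on a pipe channel inside a file event handler.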

jdc Might be related to http://core.tcl.tk/tcl/tktview?name=8bd13f07bde6fb06

MHo Many thanks! Yes, this could be it... I had already searched the bugs, but did not find this entry. So I have to wait for 8.6.7 (I didn't mention above that I'm using 8.6.6)...

jdc Maybe try applying the patch to your 8.6.6 and see if it helps?

MHo Hm, I'm using the tclkits from https://sourceforge.net/projects/twapi/files/Tcl%20binaries/Tclkits%20with%20TWAPI/ ; no source code there....

MHo 2018-05-17: Late, late addition: The 8.6.7 update solved the long-standing problem. Never saw the freezes again! Thank you all!!!

MHo 2019-05-16: Ported some programs (meanwhile updated to runtime 8.6.9) to Windows Server 2016; the problem is back again, or at least a similar one :-( Programs which service multiple |open'ed processes hang from time to time... or it seems that the event loop in another one stops for, say, 10 minutes and afterwards resumes normal operation without intervention... very mysterious. The second program, too, reads the I/O from many "opened" programs in parallel via fileevent...