Random hangs on high CPU load

From time to time I run into problems with several long-running Tcl programs on one of our Windows production systems, which I'm unfortunately not able to track down.

The source codes are too complicated to post here...

All programs make heavy use of [after] events and [fileevent]s, e.g. to periodically update a flag file on disk or to read stdout from called programs, and they [exec] many external executables, either in an endless loop or triggered by [twapi] filesystem monitoring. When everything works, the programs run for many days (or weeks) without problems, calling tens of thousands of other programs.

What I saw yesterday was this:

   * We started another program on the machine, which put the CPU under heavy load and constantly allocated more and more memory, so the machine slowed down
   * We killed that program
   * Although the machine then returned to a normal load, my 3 Tcl programs on that machine did not respond any more, that is, the [after] events no longer fired. No errors were generated; it simply looks as if the programs hang.
   * Other programs on that machine continued to run normally, so there was no general Windows error condition etc.

As I'm only experienced at the Tcl/Tk script level, I don't know how to track such errors down any further. They aren't reproducible and happen only from time to time. The machines are Windows VMs.

My questions are:

   * Under what circumstances is it theoretically possible that the Tcl event loop stops working?
   * Would it help to save the (little) information that Sysinternals Process Explorer shows about active threads, thread state etc.? I fear I'm not able to interpret it...
   * Is it possible that a call to [exec] ... & or [open] |proc never returns, blocking the whole program?
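
One way to find out whether the event loop was blocked for a while, and for how long, is a heartbeat timer that compares the time it actually fires with the time it was supposed to fire. The following is only a minimal sketch: the names heartbeat and ::stalls and the interval and threshold values are made up for illustration and are not taken from the programs described above. It cannot explain a stall and it reports one only after the loop resumes, but a log of such reports may help to correlate stalls with other activity on the machine.

```tcl
# Hypothetical names (heartbeat, ::stalls); interval and threshold are arbitrary.
set ::stalls {}

proc heartbeat {intervalMs thresholdMs expected} {
    set now [clock milliseconds]
    # How much later than planned did this timer fire?
    # Caveat: this is wall-clock time, so a clock adjustment also shows up as lag.
    set lag [expr {$now - $expected}]
    if {$lag > $thresholdMs} {
        # Something kept the event loop from servicing timers for about $lag ms
        lappend ::stalls [list $now $lag]
        puts stderr "[clock format [expr {$now / 1000}]]: event loop was blocked for about $lag ms"
    }
    # Re-arm the timer and remember when it is supposed to fire
    after $intervalMs [list heartbeat $intervalMs $thresholdMs [expr {$now + $intervalMs}]]
}

# Check once per second and report delays of more than 5 seconds
after 1000 [list heartbeat 1000 5000 [expr {[clock milliseconds] + 1000}]]
```

The timer only runs while the program is inside the event loop (e.g. in [vwait] or under wish), which a long-running program like the ones described here already is.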

[jdc] Might be related to http://core.tcl.tk/tcl/tktview?name=8bd13f07bde6fb06

[MHo] Many thanks! Yes, this could be it.... I had already searched the bugs but did not find this entry.... So I have to wait for 8.6.7 (I didn't mention above that I'm using 8.6.6)...

[jdc] Maybe try applying the patch to your 8.6.6 and see if it helps?

[MHo] Hm, I'm using the tclkits from https://sourceforge.net/projects/twapi/files/Tcl%20binaries/Tclkits%20with%20TWAPI/ ; no source code there....

[MHo] 2018-05-17: Late, late addition: The 8.6.7 update solved the long-standing problem. Never saw the freezes again! Thank you all!!!

[MHo] 2019-05-16: Ported some programs (meanwhile updated to runtime 8.6.9) to Windows Server 2016; the problem, or at least a similar one, is back :-( Programs which service multiple processes opened via [open] | hang from time to time.... or the event loop in another one seems to stop for, say, 10 minutes and afterwards resumes normal operation without intervention... very mysterious. That second program, too, reads the I/O of many "opened" programs in parallel via [fileevent]....
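
For reference, the usual pattern for servicing many [open] | pipelines in parallel is to put each channel into non-blocking mode and read it from a [fileevent] handler, so that a slow or silent child cannot block the event loop in a read. The sketch below only illustrates that pattern; it is not the code of the programs discussed here, and the names startJob, onReadable, ::lines and ::finished are hypothetical.

```tcl
# Hypothetical helper names (startJob, onReadable, ::lines, ::finished).
proc startJob {cmd} {
    # 2>@1 merges the child's stderr into the pipe so that it is not lost
    set chan [open |[concat $cmd [list 2>@1]] r]
    # Non-blocking: gets returns at once even if only part of a line has arrived
    fconfigure $chan -blocking 0
    fileevent $chan readable [list onReadable $chan]
    return $chan
}

proc onReadable {chan} {
    if {[gets $chan line] >= 0} {
        # A complete line was read
        lappend ::lines($chan) $line
    } elseif {[eof $chan]} {
        # Back to blocking mode so that close reports the child's exit status.
        # Note that close then waits until the child process has really exited.
        fconfigure $chan -blocking 1
        if {[catch {close $chan} err]} {
            lappend ::lines($chan) "child failed: $err"
        }
        set ::finished($chan) 1
    }
    # Otherwise only an incomplete line is available; wait for the next event
}
```

A caller would start a job with startJob, passing the command as a list, and then either return to the event loop or [vwait] on the ::finished entry of the returned channel. Even with this pattern, the final [close] can still block if the child has closed its stdout but keeps running.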