Version 36 of Update considered harmful

Updated 2016-02-23 01:26:40 by pooryorick

KBK writes in comp.lang.tcl (12 February 2001, slightly edited by the author) [but was KBK responsible for the title ?]:


Virtually any update is ill-considered!

The issue is that update has virtually unconstrained side effects. Code that calls update doesn't know what any random event handler might have done to its data while the update was in progress. This fact is the source of insidious bugs.

The problem is pretty pervasive, once an application reaches a certain level of complexity. One example that I had was:

  • Messages arrive on sockets from various places. Certain messages describe urgent conditions that caused a dialog to be posted. The code that positions the dialog on the screen uses update.
  • Messages arriving on the sockets also allow for urgent conditions to be dismissed (say, because a piece of equipment starts responding again).

Now consider the following sequence of events.

  • A sender detects an urgent condition, and then immediately detects its resolution. The two messages (creating and dismissing the dialog) wind up back-to-back on a socket.
  • The 'readable' handler for the socket is entered, and reads the first message. As part of processing the message, a dialog is posted, and the code that posts it enters update.
  • The event loop is now free to process events, and re-enters the 'readable' handler for the socket. Now the message that dismisses the dialog gets processed and destroys the window.
  • The update eventually returns and tries to do the winfo width and winfo height to begin its geometry calculation. But the window no longer exists -- step 3 destroyed it -- and so the geometry calculation throws an error.
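The sequence above can be reproduced without Tk or sockets. The following is only a sketch under stated assumptions: the global flag dialogExists stands in for the dialog window, and the procs postDialog and dismissDialog are hypothetical stand-ins for the two message handlers. The check after update is there to record what happened during the nested event loop.

```tcl
# Hypothetical stand-in: a global flag plays the role of the dialog window.
set dialogExists 0
set log {}

proc postDialog {} {
    global dialogExists log
    set dialogExists 1
    lappend log "dialog posted"
    update   ;# nested event loop: any pending handler may run here
    if {$dialogExists} {
        lappend log "dialog positioned"
    } else {
        lappend log "dialog vanished during update"
    }
}

proc dismissDialog {} {
    global dialogExists log
    set dialogExists 0
    lappend log "dialog dismissed"
}

# Queue the two "messages" back to back, then run the event loop once.
after 0 postDialog
after 0 dismissDialog
update
puts [join $log \n]
```

The log shows dismissDialog running in the middle of postDialog, which is exactly the interleaving that makes the real geometry calculation fail.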

Mind you, this is an easy example. Imagine trying to track down the problem if you have custom C code making callbacks with Tcl_Eval and its friends. One of the callback scripts does an update, and the event handler winds up deleting or making radical modifications to data structures being maintained in the C code. On return from the update, you get a pointer smash with absolutely nothing to go on.

Yes, you can avoid these problems. You can play games with the event loop (unbinding events that could cause trouble). You can re-check the world after an update. You can twiddle bits in the event mask. You can do smart things with Tcl_Preserve. You can have your file-events trigger idle callbacks so that you know that all data have been collected from the socket before the callbacks fire. (And then you get the shaft from an unexpected update idletasks!) I have better things to do with my time!

It's usually much cleaner and easier to debug if you structure the code with chains of event bindings. For instance, you can use a <Configure> binding to trigger positioning a window (as in Centering a window). If you do, there's no update confusing things; the process simply returns to the event loop. If a subsequent event deletes the window before it is configured, no problem, the binding goes away and the <Configure> handler never fires. You can think of the window as a state machine, and each event as a state transition.

The use of update to Keep a GUI alive during a long calculation can also be avoided (a simpler development of the principle appears in Countdown program). Moreover, except for the very simplest applications, the resulting code is cleaner and easier to integrate and maintain (although slightly more verbose).
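As a rough illustration of the idea (not the code from the pages mentioned above), a long calculation can be cut into slices that reschedule themselves, so control returns to the event loop between slices and no update is needed. The proc name sumChunk, the slice size, and the result variable are assumptions made for this sketch.

```tcl
# Hypothetical long-running job: sum the integers 1..$n in slices of $chunk.
proc sumChunk {i n chunk total doneVar} {
    set stop [expr {min($i + $chunk - 1, $n)}]
    for {} {$i <= $stop} {incr i} {
        incr total $i
    }
    if {$i <= $n} {
        # More work left: return to the event loop and resume from there.
        after 0 [list sumChunk $i $n $chunk $total $doneVar]
    } else {
        # Finished: publish the result, which wakes up the vwait below.
        set ::$doneVar $total
    }
}

after 0 [list sumChunk 1 100000 1000 0 result]
vwait result   ;# the only vwait: it launches the event loop
puts "sum = $result"
```

All of the job's state travels in the arguments of the rescheduled call, so other event handlers get a chance to run between slices and nothing is left half-finished on the stack.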

I'd go as far as to say that I've never seen code where update, a single-argument after, vwait or tkwait is really needed, except for a vwait that initially launches the event loop. And I've seen lots of timing issues like the one I described above. Personally, were it not for the code it would break [AMG adds: the code is broken to begin with], I wouldn't cry if update were removed from the language altogether.

NEM 2010-07-28: I would add some notes to this. Firstly, it is not always obvious that you are calling one of these difficult commands. For example, tk_messageBox calls vwait as part of its workings. I have been caught by this before, in a similar manner to KBK (popping up dialog boxes in response to socket messages), and it can be hellish to debug. Secondly, be aware that coroutines can also lead to these kinds of problems if not controlled: a yield to the event loop is equivalent to a non-nesting vwait. Concurrency is a cruel mistress.

EG: I have learned not to call any code that shows a modal dialog from inside an event handler, because the handler will not finish until the dialog is dismissed. When I need to do some user interaction, for example choosing a filename to save partial results, I usually use after 0 mySaveProc. This allows the current event handler to finish, and then the event loop will call the save procedure.
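A minimal sketch of that pattern, runnable in plain tclsh. Here mySaveProc only records that it ran (in a real program it would show the dialog), and onMessage is a hypothetical event handler.

```tcl
set log {}

# Hypothetical stand-in for a procedure that would show a modal dialog.
proc mySaveProc {} {
    lappend ::log "mySaveProc runs"
    set ::done 1
}

# Hypothetical event handler that needs user interaction.
proc onMessage {} {
    lappend ::log "handler starts"
    after 0 mySaveProc   ;# defer; do not call mySaveProc directly
    lappend ::log "handler returns"
}

after 0 onMessage
vwait done
puts [join $log \n]
```

The handler has already returned by the time mySaveProc runs, so a dialog shown there does not keep the handler on the stack.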


AMG: The problem documented on this page is real, but it's not specific to update, nor to the Tcl event loop. It's more fundamental than that. Rather than unfairly tar update and cause the true problem to go unexplored, I will attempt to get to the bottom of it.

A critical section is a code segment that assumes it has exclusive access to a shared resource. The trouble happens when this assumption is violated.

The best-known culprit is multithreading, in which case the fix is to correctly advertise to the scheduler which segments of code cannot overlap. From when a critical section starts until it finishes, the scheduler must not start another critical section that stomps on any of the same resources as the first. Of course, the scheduler can only do this properly when the code obsessively complies with the resource locking regime. This is a problem on both single- and multi-core systems; it doesn't matter if the contentious critical sections take turns or run in parallel.

Threads aren't necessary for this problem, since all that's needed is for two critical sections to overlap, so that they stomp on each other. Without threads, it's still quite possible for one critical section to (accidentally) invoke another. This can happen by calling update or vwait or tkwait in the middle of a critical section, if there's another event handler that also contains a critical section that collides with the first. But this can also happen by calling yield to return to the event loop, in the same circumstance.

Many Tk commands use vwait and similar. Have a look at the implementation of tk_dialog; it uses vwait, tkwait, and update idletasks. In particular, notice the catch surrounding the bind command at the end; this safeguards against the possibility that vwait called an event handler that deleted the window.

The trick is to break up critical sections such that they never span an update or similar. This serves the same purpose as locking in the multithread case; while the code is running, everything's effectively locked, then when it enters the event loop, everything's effectively unlocked.

One way to do this is to revalidate shared resources after returning from update. Performing this validation means that the code no longer assumes it has exclusive access, therefore the critical section has ended. (The defining characteristic of a critical section is that it assumes exclusive access.) This is the approach taken by tk_dialog.

Another way is to not call update in the middle of straight-line code but to instead schedule the continuation as an event handler. Most types of event handlers are automatically canceled when their resource goes away, e.g. when a channel is closed, all its chan event handlers go away too. Obviously, after idle and after $time handlers don't get automatically deleted, since they don't have an associated resource, so you will need to explicitly delete them in any code that invalidates the critical resource.
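A small sketch of the explicit cleanup, assuming a global array named resource as the shared resource and hypothetical procs useResource (the scheduled continuation) and invalidateResource (the code that destroys the resource):

```tcl
# Hypothetical shared resource: an entry in a global array.
array set resource {value 42}
set log {}

# The continuation: it assumes the resource still exists.
proc useResource {} {
    lappend ::log "value is $::resource(value)"
}

proc invalidateResource {} {
    unset ::resource
    # No resource is tied to the timer, so cancel it by hand.
    after cancel $::pending
    lappend ::log "resource gone, continuation cancelled"
}

set pending [after 100 useResource]
after 10 invalidateResource
after 200 {set done 1}
vwait done
puts [join $log \n]
```

Without the after cancel, useResource would still fire at 100 ms and fail on the missing array element.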


AMG: There is another problem with update unrelated to the unlimited side effect issue described above. Nested invocations of update, vwait, etc. can potentially block the parent code from continuing. Nested invocations can easily happen by accident, simply by calling update, etc. within an event handler. Here's a simple example:

proc a {} {
    puts "a: waiting half a second"
    after 500 {set a 1}
    puts "a: [time {vwait a}]"
}
proc b {} {
    puts "b: waiting five seconds"
    after 5000 {set b 1}
    puts "b: [time {vwait b}]"
}
proc test {} {
    after 0 a
    after 0 b
    update
}
test

On my computer, this prints:

a: waiting half a second
b: waiting five seconds
b: 5000277 microseconds per iteration
a: 5001788 microseconds per iteration

Things happen in this order:

  • vwait a enters the event loop and waits for $a to be modified
  • vwait b enters the event loop again and waits for $b to be modified
  • $a is modified
  • $b is modified
  • vwait b returns
  • vwait a returns

This problem is fixed by not recursively entering the event loop. One alternative is continuation passing, the other is coroutines.

With the continuation passing technique, a proc enters the event loop by returning to the top level, since the event loop is at the top of the stack. For example, return -level [info level]. Before returning to the event loop, the proc must schedule itself to be resumed. This means storing all its state (both variables and execution position, a.k.a. continuation) somewhere that it can get at them later. Global variables work, as do arguments embedded in the scheduled event handler. Also consider TclOO object member variables. This has to be done not only with the proc but with all procs that call it; obviously, you'll want to avoid having this happen deep in the stack.

With coroutines, the shiny new NRE does all this work for you. All you have to do is run your proc in an alternate stack created by the coroutine command, then call yield to return to the event loop. Of course, you still need to schedule your code to be resumed, but all the continuation information is saved without any effort on your part. Simply schedule for the return value of info coroutine to be called. One restriction to mind is that not all commands are NRE-enabled. You can use these commands if you wish, but you can't call yield inside a proc that is invoked by a non-NRE command. Non-NRE commands are mostly found in extensions.

It would be nice if someone created and maintained a list of non-NRE core commands...

Both these techniques are discussed in Keep a GUI alive during a long calculation. Also see Firework Display.

Here's an example of continuation passing. It's simple in this case, but it can get quite hairy. The only tricky part is measuring time. The continuation is formatted as a step number followed by a key-value list of extra state variables, and the continuation is stored in the event queue.

proc a {{step 0} args} {
    dict with args {}
    switch $step {
    0 {
        puts "a: waiting half a second"
        after 500 [list a 1 start [clock microseconds]]
    } 1 {
        puts "a: [expr {[clock microseconds] - $start}] microseconds"
    }}
}
proc b {{step 0} args} {
    dict with args {}
    switch $step {
    0 {
        puts "b: waiting five seconds"
        after 5000 [list b 1 start [clock microseconds]]
    } 1 {
        puts "b: [expr {[clock microseconds] - $start}] microseconds"
    }}
}
proc test {} {
    after 0 a
    after 0 b
    vwait forever
}
test

Result:

a: waiting half a second
b: waiting five seconds
a: 500765 microseconds
b: 5000548 microseconds

a and b are now interleaved properly.

Here's an example using yield. It would be nearly identical to the vwait example if not for the fact that time is (currently) not NRE-enabled. I know this because I got the error "cannot yield: C stack busy" when I tried yielding inside time.

proc a {} {
    puts "a: waiting half a second"
    after 500 [list [info coroutine]]
    set start [clock microseconds]
    yield
    puts "a: [expr {[clock microseconds] - $start}] microseconds"
}
proc b {} {
    puts "b: waiting five seconds"
    after 5000 [list [info coroutine]]
    set start [clock microseconds]
    yield
    puts "b: [expr {[clock microseconds] - $start}] microseconds"
}
proc test {} {
    after 0 {coroutine coro1 a}
    after 0 {coroutine coro2 b}
    vwait forever
}
test

Result:

a: waiting half a second
b: waiting five seconds
a: 497929 microseconds
b: 5000187 microseconds

Again, a and b are correctly interleaved.


AJJ - 2014-09-25 00:26:17

I understand the issues here, but I have a problem I have not been able to figure out without "update". When I create a toplevel and fill it with widgets, I can't get the geometry until after an update:

proc maketop {args} {
  catch {destroy .t}
  toplevel .t
  button .t.b -text PUSHME
  pack .t.b
  eval $args
  puts [winfo geometry .t]
  puts [wm geometry .t]
}

% maketop
1x1+0+0
1x1+0+0
% maketop update idletasks
200x200+0+0
200x200+0+0
% maketop update
132x26+166+190
132x26+166+190

I'm trying to create toplevels, withdraw them while adding widgets, possibly set a specific geometry, then deiconify, so that the window is displayed more neatly. But this doesn't work without update. An interesting thing I also found was:

% maketop update idletasks; puts [wm geometry .t]
76x26+0+0
76x26+0+0
76x26+0+0
% maketop update idletasks; after 0 {puts [wm geometry .t]}
76x26+0+0
76x26+0+0
after#8
% 132x26+193+217

Can someone explain this behavior to me and suggest a better way?

RLE (2014-09-24): You are seeing documented behavior:

man n update:

  ... However, there are some kinds of updates that only happen in 
  response to events, such as those triggered  by window size
  changes; these updates will not occur in update idletasks. ...

You are looking for things "triggered by window size changes", which is why a simple update idletasks is insufficient.

If you do want to do something in response to window size changes, binding to the window's <Configure> event is a "non update" method to do things.

APN It has been my experience that the circumstances where update or update idletasks is required depend on the widgets as well as the platform. What RLE suggested is, I think, the recommended way to do things, but I've found it can cause flashes as the window is drawn if it is not a simple window. One alternative Joe English suggested was to directly send the window a <Configure> event instead, to have it update its geometry. I've found this works with some, but not all, widgets (at least on Windows). A workaround I've used in the past is to draw the window off screen, say at +20000+20000, and then move it back when done. That way the flashing is not visible.

SeS Preventing & counteracting flashing of windows/forms... it seems I am not the only one coping with this. Using the offset to push the toplevel-under-construction temporarily off screen is the simplest method and works neatly; I have used it extensively in tG² over the past years, as well as in the scripts generated for creating the GUIs. As a side note: I should use bigger offsets than +5000+5000, since screen resolutions are getting bigger and bigger... Also: when the -alpha option is used for a toplevel and it is set to 1.0, and further alpha operations are visibly performed, I have also noticed flashing. I read somewhere on the wiki a long time ago (sorry, I don't know where exactly) that setting it to 0.99 helps to get rid of that problem too.


AJJ - 2014-09-26 04:51:40

I tried sending a <Configure> event but it doesn't appear to do anything regardless of whether the window is withdrawn or not. I'm not sure this is what you meant, but I do see the binding getting called. I see my requested geometry only after the update idletasks, but the geometry including the WM borders still doesn't show without an update. This works the same if I put "event generate" commands throughout. I'm running on Linux BTW, using an Xming display on Windows. But this has been my experience regardless of the window manager I'm using.

proc makewin {win} {
  destroy $win
  toplevel $win
  wm withdraw $win
  bind $win <Configure> {+puts configGeo=%wx%h+%x+%y}
  wm geometry $win =300x300+250+250
  button $win.b -text TEST
  pack $win.b
  update idletasks
  puts geo1=[wm geometry $win]
  event generate $win <Configure>
  puts geo2=[wm geometry $win]
  update idletasks
  puts geo3=[wm geometry $win]
  wm deiconify $win
  puts geo4=[wm geometry $win]
  update idletasks
  puts geo5=[wm geometry $win]
  update
  puts geo6=[wm geometry $win]
}
% makewin .t
configGeo=1x1+0+0
geo1=1x1+250+250
configGeo=0x0+0+0
geo2=1x1+250+250
geo3=1x1+250+250
configGeo=300x300+250+250
geo4=300x300+250+250
configGeo=59x28+120+1
geo5=300x300+250+250
configGeo=300x300+250+250
configGeo=300x300+254+278
geo6=300x300+254+278

RLE (2014-09-26): Configure events fire when the event loop runs. The event loop does not run while code in a proc is executing (unless you call update, which asks for the event loop to be run). The reason why your code above did not work is you never exited the proc to actually allow the event loop to run.

Reformat your code this way:

proc cfg {win w h x y} {
  puts stderr "cfg: win $win w $w h $h x $x y $y"
  puts stderr "cfg: $win geometry [ winfo geometry $win ]"
}

proc makewin {win} {
  destroy $win
  toplevel $win
#  wm withdraw $win
  bind $win <Configure> [ list cfg %W %w %h %x %y ]
  wm geometry $win =300x300+250+250
  button $win.b -text TEST
  pack $win.b
}

makewin .t

and when you run it like so: wish test-example.tcl

you get this:

cfg: win .t w 300 h 300 x 250 y 250
cfg: .t geometry 300x300+250+250
cfg: win .t w 300 h 300 x 258 y 284
cfg: .t geometry 300x300+258+284
cfg: win .t w 300 h 300 x 258 y 284
cfg: .t geometry 300x300+258+284
cfg: win .t.b w 58 h 28 x 121 y 0
cfg: .t.b geometry 58x28+121+0

And both the % escapes, as well as winfo geometry show the correct, assigned, sizes.

The wm withdraw is commented out because withdrawn windows do not calculate their geometries. The geometry calculation only happens when the window is mapped, and withdrawn means not mapped.

APN To further clarify (or confuse!) if you remove the wm withdraw, for more complex windows, the flash while the window rearranges itself is visible to the user. The suggestion to move the window offscreen was to get around this. So essentially, create the window, move it off screen, populate the widgets, then return to the event loop (by rescheduling code using after, not update). Then get the geometry if you want (and of course move the window back to the visible screen). If there is a simpler way that always works across all widgets and platforms, I don't know it. But then again, I'm no Tk expert.


AJJ - 2014-09-27 05:25:44

Ah, didn't understand when the event loop ran. But your example actually illustrated my real problem. I'm trying to get the geometry of a window after it's created, and also when it's moved, save the geometry to a file, and recreate the window at the last known location each time.

Moving the window off screen when it's drawn instead of withdraw/deiconify is much nicer! But I still have the same problem with <Configure> trying to set the geometry:

proc makewin {win geo} {
  catch {destroy $win}
  toplevel $win
  wm geometry $win $geo
  bind $win <Configure> {set GEO "+%x+%y"}
}
% makewin .t +50+50
% makewin .t $GEO
% makewin .t $GEO
% makewin .t $GEO
% wm withdraw .t; wm deiconify .t
% wm withdraw .t; wm deiconify .t
% wm withdraw .t; wm deiconify .t

In every case the window moves down and to the right, obviously because <Configure> reports the position of the toplevel, which gets shifted when the WM border is added. And it seems "deiconify" doesn't handle this well either. Any other suggestions?

APN What platform? On Windows 8 the above code works fine. The window gets created at exactly the same position every time.

RLE (2014-09-27): Likely Linux or maybe OSX. The coordinates reported by %x and %y in bindings tell you where, on the physical screen, the top left corner of the area that Tk manages appears. The geometry data tells the window manager where to place the top left corner of the window decoration. So the top left corner of the Tk area is offset down and to the right by the size of the added window manager decorations.

Change your script to this, and it will stay put on Linux (untested on OSX):

proc makewin {win geo} {
  catch {destroy $win}
  toplevel $win
  wm geometry $win $geo
  bind $win <Configure> {set GEO [ wm geometry %W ]}
}
% makewin .t +50+50
% makewin .t $GEO
% makewin .t $GEO
% makewin .t $GEO

Further Reading

Incorrect result from a UNION with an ORDER BY , 2016-02-09
A SQLite bug in which concurrent routines accessed and modified shared memory, stepping on each other's toes and corrupting query results in the process.