Techniques for 'driving' Windows applications

[ testing, automation, ...]

[Technologies: few of these have a Tcl connection, and several operate cross-platform. Still, it's convenient to aggregate them all here, because, at least, they can be useful to Tcl.

  • ActiveX
  • Android (when combined with VNC or rdesktop[L1 ])
  • AutoIt [L2 ]
  • AutoHotkey [L3 ]
  • AutoPy (the former PyDroid) has fans.
  • COM
  • cwind - find windows, inject keystrokes (free)
  • DDE
  • Eagle, Garuda
  • Eventcorder [L4 ]
  • Expect for Windows
  • GUI4Cli
  • Macro Scheduler [L5 ]
  • Perl's Win32::GuiTest [L6 ]
  • PowerPro
  • "T-Plan Robot (formerly known as VNCRobot) ..." [L7 ] claims to "automate ... all major systems, such as Windows, Linux, Unix, Solaris and certain mobile platforms."
  • TextCatch [L8 ] exposes COM services
  • TWAPI includes COM, window management and input injection (mouse and keyboard)
  • Python-coded and -extensible open-source Pamie [L9 ] (very specialized, but useful) and the widely-appreciated Watsup [L10 ]
  • WSH
  • winbatch
  • Win32-GuiText-X
  • Python-coded winGuiAuto [L11 ]
  • (other) commercial testing applications,
  • Girder [L12 ] (including Python client [L13 ])
  • win32api's PostMessage provides for delivery of keystrokes, button presses, and such, to external processes; while this [L14 ] discussion, as well as Simon Brunning's commendably detailed "Driving Win32GUIs with Python" [L15 ], are about Python coding, the same functionality is available to Tcl through TWAPI
  • wintclsend - similar to cwind, includes mouse moves (license)
  • Record and replay system for Tcl/Tk
  • ...]

Forget Cwind, WinTclSend And The Like...

Cwind, WinTclSend and Perl's Win32::GuiTest are programs that rely primarily on three standard Windows API functions:

  • FindWindow - which, given a window's title (the text string that goes in the title bar), returns its HWND (a unique identifier that Windows assigns to a window);
  • SetForegroundWindow - which makes the window whose HWND is supplied to it the foreground window, to which all subsequent key-strokes and mouse-clicks are sent; and
  • SendKeys - which sends the specified keystrokes to the current foreground window.
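
The three steps above can be sketched in Python with ctypes. This is only an illustration under stated assumptions, not how Cwind, WinTclSend or Win32::GuiTest are actually implemented: user32.dll does not export a function literally named SendKeys, so keybd_event stands in for it here; the function name send_text_to_window and its user32 parameter are made up for this example; and the shift state reported by VkKeyScanW is ignored, so only unshifted keys would type correctly.

```python
import ctypes
import sys


def send_text_to_window(title, text, user32=None):
    """Find a top-level window by title, bring it forward, type into it.

    Returns False if the window is not found or cannot be made the
    foreground window, True once all key events have been queued.
    `user32` is a hypothetical hook so the logic can be tried off-Windows.
    """
    if user32 is None:
        if sys.platform != "win32":
            raise OSError("user32.dll is only available on Windows")
        user32 = ctypes.windll.user32
        # Handles are pointer-sized; declare that explicitly for 64-bit.
        user32.FindWindowW.restype = ctypes.c_void_p
        user32.SetForegroundWindow.argtypes = [ctypes.c_void_p]

    # Step 1: title -> HWND (None means "any window class").
    hwnd = user32.FindWindowW(None, title)
    if not hwnd:
        return False

    # Step 2: ask Windows to make it the foreground window.
    if not user32.SetForegroundWindow(hwnd):
        return False

    # Step 3: inject key down/up pairs; they go to whatever window is
    # in the foreground at the moment each event is processed.
    KEYEVENTF_KEYUP = 0x0002
    for ch in text:
        vk = user32.VkKeyScanW(ord(ch)) & 0xFF  # low byte = virtual key
        user32.keybd_event(vk, 0, 0, 0)
        user32.keybd_event(vk, 0, KEYEVENTF_KEYUP, 0)
    return True
```

Nothing in step 3 names the target window. The keystrokes are addressed to "the foreground", whatever that happens to be by then.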

The documentation for these programs says or implies that you can control applications like Notepad, Word, Excel, etc. from your Tcl script, just as a user could, by using the package functions to simulate user keyboard/mouse input. This technique is hopelessly inadequate and unreliable, except in certain very restricted situations. There are three problems:

  1. The Disappearing Foreground Window. The foreground window is global - shared by all running processes and applications. So although you can call SetForegroundWindow on the application you want to send key-strokes and/or mouse-clicks to, Windows or any other application can do the same thing. In particular, every time the user clicks on something, that something becomes the new foreground window. If the user is trying to use their web browser (or whatever) at the same time as your script is trying to send keystrokes to (say) Excel, it's going to be a complete disaster. And even if no user is using the computer while your script is running, that's still no guarantee your key-strokes will reach their intended destination: other processes can pop up message boxes, etc., and steal the foreground window away from the window you set it to, at any time.
  2. No Feedback. In general, when a user clicks the mouse or types keys, they usually get visual feedback that those key-strokes and mouse-clicks are being processed: the key-strokes are echoed into the input box, windows open and close, etc. But your script can't see any of this, and has virtually no way of knowing whether or not the requested action has been taken.
  3. Time Delays. The SendKeys function returns immediately, but that doesn't mean the requested task has been completed. Say you ask Internet Explorer to open some URL - it could be 1 second or 10 seconds or 100 seconds or never before the specified HTML is actually downloaded and displayed (assuming, of course, that it is downloaded and displayed). Similarly, asking Notepad to print a file, or Excel to re-calculate a spreadsheet, could take similarly wildly varying times. And just because it takes (say) 10 seconds on your lightly loaded 1 GHz machine, that doesn't mean it will take 10 seconds on some other user's 200 MHz machine on which they're simultaneously playing a CD and a 3D shoot-em-up.

Taking the above factors into account, it should be obvious that you can't generally use the SetForegroundWindow/SendKeys technique to control any old Windows application. Possibly, if no-one is using the computer whilst your script is running, you might get some mileage out of it. But if you want to drive Excel, Notepad, Word, Internet Explorer, your MP3 player, or the like, whilst you or other users are using the computer, forget it. The other solutions mentioned above are much better alternatives. (Peter Newman - 29 Feb 2004.)

However, there are exceptions. If you're "testing" the application, the script should be interacting with the program as the user would (catching and correcting for switched foreground windows, etc.). Also, in newer versions of Win32::GuiTest, fixed time delays aren't necessary, as there are functions such as WaitWindow(). And when a script clicks on a button there is usually visual feedback, which the script can confirm using FindWindow, etc., or simply by checking function return values.
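
The idea behind a WaitWindow()-style call, polling for a condition up to a deadline instead of sleeping for a guessed interval, can be sketched in a few lines of Python. This is a generic helper written for illustration, not the Win32::GuiTest implementation; the name wait_until and its parameters are assumptions of this sketch.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.1):
    """Poll predicate() until it returns a truthy value or time runs out.

    Returns the truthy value on success, or None on timeout, so the
    caller always learns whether the expected feedback ever appeared.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

On Windows the predicate could be something like lambda: user32.FindWindowW(None, "Print"), with user32 loaded through ctypes, so the script proceeds as soon as the dialog exists and fails cleanly if it never shows up. That covers the delay and feedback problems for a test script. It does nothing about another process stealing the foreground window.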


davidw Monday morning humor regarding "driving" windows applications (feel free to erase it:-):

http://www.canadianheritage.org/images/large/10045.jpg