Word Reaper

if 0 {Richard Suchenwirth 2003-06-03 - The following proc was written to extract the text content from a (Microsoft) Word document, via dde and the clipboard. It's still not pretty or very stable, but after hours of wrestling with fuzzy documentation, I now wikify it as a first shot.

The procedure is called with the filename of a Word document and returns the reaped text after a while (I could not think of any synchronization other than simply waiting...). A known issue is that some characters (apostrophe, "...") get specially encoded, and the output contains a question mark in these positions. Tk is needed so we can use the clipboard.

Hints on how to do this better are greatly appreciated! }

 proc doc2txt fn {
    package require dde
    package require Tk; # because we need [selection]
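    # The empty "" is the window title; without it, start would take a
    # quoted file name (e.g. one containing spaces) as the title instead
    eval exec [auto_execok start] [list "" $fn] &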

    #Loop to wait until Word is really there, and ready to talk
    set word ""
    while {$word==""} {
        set word [dde services Winword System]
        after 200
    }
    after 1000 ;# wait for the window to load...
    dde execute Winword System {[EditSelectAll]}
    dde execute Winword System {[EditCopy]}
    set res [selection get -selection CLIPBOARD]
    dde execute Winword System {[FileExit 2]}
    set res
 }
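
The wait loop in doc2txt never gives up if Word fails to start. A minimal sketch of a polling helper with a timeout, which could stand in for that loop: waitFor and its parameters are hypothetical names (not part of dde or Tk), and [clock milliseconds] assumes Tcl 8.5 or later.

```tcl
# Evaluate a script repeatedly in the caller's scope until it returns a
# non-empty result, or raise an error once the timeout has expired.
# waitFor, timeoutMs and intervalMs are illustrative names only.
proc waitFor {script {timeoutMs 30000} {intervalMs 200}} {
    set deadline [expr {[clock milliseconds] + $timeoutMs}]
    while 1 {
        set result [uplevel 1 $script]
        if {$result ne ""} {
            return $result
        }
        if {[clock milliseconds] >= $deadline} {
            error "timed out after $timeoutMs ms waiting for: $script"
        }
        after $intervalMs
    }
}
```

Inside doc2txt the call would look like `set word [waitFor {dde services Winword System}]`. This only covers the wait for the DDE service to appear; the fixed one-second pause for the window to load is a separate guess and stays as it is.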

LES: I can't do it with DDE. DDE is so crude. But COM and optcl can produce a very good result. First, create and save this Word macro in your Normal.dot:

 Sub wordreaper()

 Dim myRange As Range

 ' Nothing to extract if no document window is open
 If Word.Application.Windows.Count = 0 Then Exit Sub
 Set myRange = ActiveDocument.Content

 ' Append the document text to the output file
 Open "c:\windows\desktop\extract.txt" For Append As #1
 Print #1, myRange.Text
 Close #1

 End Sub

Then, run this Tcl code:

 set  _docpath  {c:\windows\desktop\some.doc}

 package require optcl
 set  ::hWORD  [ optcl::new  word.application ]
 set  ::hDOC  [ $::hWORD  -with documents  open  $_docpath ]
 $::hWORD  run  wordreaper
 $::hDOC Close
 $::hWORD Quit

The macro in Normal.dot is often necessary because translating the whole macro into calls that optcl can send on its own is very difficult, if possible at all. I also have no idea how one could pass the path to "extract.txt" to the macro instead of hard-coding it.
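
Since the macro opens extract.txt For Append, text from successive runs accumulates in that file, and the Tcl code that runs the macro never reads the result back. A minimal sketch of that last step, assuming the macro has finished writing by the time the run call returns and that the file is in the system encoding; readReaped is a hypothetical helper name, and the path must match the one hard-coded in the macro.

```tcl
# Return the text the macro wrote, then delete the file so that the next
# run (which appends) starts from an empty file.
# readReaped is an illustrative name, not part of optcl.
proc readReaped {path} {
    if {![file exists $path]} {
        error "no output file at $path (did the macro run?)"
    }
    set f [open $path r]
    # Make sure the channel is closed even if reading fails
    set failed [catch {read $f} text]
    close $f
    if {$failed} {
        error $text
    }
    file delete $path
    return $text
}
```

It would be called after the run step and before quitting Word, for example `set text [readReaped {c:\windows\desktop\extract.txt}]`.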


Arts and crafts of Tcl-Tk programming