[Richard Suchenwirth] 2004-05-15 - In this weekend fun project I wrote a toy sketch of a translation tool - not [playing machine translation] this time, however. You just have two scrollable [text] windows for the source and target text, and can manually translate from source to target, by typing into the target window. [WikiDbImage xlatool.jpg] The idea was to make this process easier, by having words in the source window look like, and act as, "links", so clicking on one brings up a [tk_popup] menu with: * the word itself (in case you want to leave it verbatim, as with proper names) * zero or more translation proposals from a "dictionary" * an "Add..." option to extend the dictionary. In this current version, only words that are in the dictionary are marked as links (so read that "one or more") - before, I had marked up all words. Not sure which way is better. ====== proc main argv { text .src -height 5 -wrap word -yscrollcommand ".srcy set" .src tag config link -foreground blue -underline 1 .src tag config seen -foreground magenta4 -underline 0 scrollbar .srcy -command {.src yview} text .trg -height 5 -wrap word -yscrollcommand ".trgy set" scrollbar .trgy -command {.trg yview} grid .src .srcy -sticky news grid .src -sticky news grid .trg .trgy -sticky ns grid .trg -sticky news grid rowconfig . {0 1} -weight 1 grid columnconfig . 0 -weight 1 markup .src .trg "This is an example - let's see how it goes... and whether newlines come through." } ====== ''markup'' splits the source string (which in a practical app might be read from a file) into "words" and "nonwords" (the regular expression may well have to be refined...) and inserts these into the source window, marking up the "words" as links, and assigning to each a unique tag that triggers the lookup. ====== proc markup {src trg text} { $src delete 1.0 end set n 0 foreach {- word nword} [regexp -inline -all {(\w+)(\W*)} $text] { if [info exists ::Dic([string tolower $word])] { set tags [list link t[incr n]] $src tag bind t$n <1> "lookup $src t$n $trg [list $word];break" } else {set tags ""} $src insert end $word $tags $src insert end $nword } } ====== ''lookup'' creates a pop-up menu and fills it with the translations provided by the dictionary. When one of those is selected, it is directly inserted into the target window. Also, the look of the link is changed to "seen" - magenta and not underlined, so you have a visual cue of what you've translated so far. ====== proc lookup {w tag trg word} { destroy .m menu .m -tearoff 0 .m add command -label $word -command [list $trg insert insert "$word "] .m add separator set n 0 foreach i [translations $word] { .m add command -label $i -command [list $trg insert insert "$i "] incr n } if $n {.m add separator} .m add command -label Add... -command [list add $word] tk_popup .m [winfo pointerx $w] [winfo pointery $w] eval $w tag add seen [$w tag ranges $tag] } # A non-complaining dictionary retriever: proc translations word { if [info exists ::Dic([string tolower $word])] { set ::Dic([string tolower $word]) } ;# else returns {} } ====== The ''add'' function creates a little dialog where you can enter a new translation, which is added to the dictionary when you hit or click OK. The source word is also editable, in case you want to change that. ====== proc add word { toplevel .t wm title .t "Add translation" grid [entry .t.src -relief flat -width 40] - - -sticky ew .t.src insert end $word grid [entry .t.trg -width 40] - - -sticky ew grid [button .t.ok -text OK -width 6 -command { lappend Dic([.t.src get]) [.t.trg get]; destroy .t }] \ [button .t.cancel -text Cancel -command {destroy .t}] focus .t.trg bind .t.trg {.t.ok invoke} bind .t {.t.cancel invoke} } ====== For a serious tool, the dictionary would of course be read in from a text file, but in this simple toy it is just "hard-wired": ====== array set Dic { this dies example Beispiel see sehen how wie goes geht newlines Zeilenvorschübe } #-- ...and off we go... main $argv ====== A variation of this could be a visual spell-checker, where words ''not'' in a mono-lingual word list would be marked up, and the menu would offer alternatives based on string [similarity] with the word in question. Then you'd have only one text widget, and the popup action would be to replace the word with its correction (possibly in all places where it occurs). But the weekend is coming to an end, and Tcl/Tk is a bit like a hydra - implement one idea, and get two new ones... :) ---- [LES] This is a translation memory or CAT ([computer-assisted translation]) tool that you're up to. Something a lot more serious and useful than machine translation. There are several commercial solutions for Windows that I mention in [Computer-assisted translation]. But I only know one application for Linux/Unix. It's called '''OmegaT''' and can be found here: [http://www.omegat.org]. I don't set my hopes too high about it, though, because it's written in [Java]. It can barely handle accented characters, the poor thing. And it's ugly, and bloated etc. All the Java-related problems. (Sentence inserted 2012) ''Well, as of 2012 one can say that OmegaT is a quite mature product. And... nice detail: on [http://www.omegat.org/en/involved.html] they encourage non-programmers, to try first steps with Tcl/Tk!!! (and Tcl/Tk scripts could be used with the OmegaT framework)'' I would like to give it a stab with Tcl/Tk and I am absolutely sure that the result would be a lot better. But I would like a lot of things. I am pretty sure I have seen another CAT program for Linux, not in Java, but can't remember what it was. I remember it was '''very''' crude. ---- [gold] I have some routines and modified word volcabulary for Chinese pinyin and Japanese Romanji in the [Refrigerator_Pinyin_Poetry]. From looking at the commercial Google and Altavista translator windows, the only particular TCL wrinkle is the commercial translators do not show the romanized characters (eg, as in pinyin and romanji).If someone could help me prepare better/longer pinyin dictionaries, the poetry generator could get a little further along. Another little wrinkle is that some of the Chinese volcabulary I used for the poetry is 7th century Tang dynasty. In some cases, the meaning or usage of the Chinese volcabulary has changed in 12 centuries. But translating the Tang poetry with TCl was still my ideal use, but the TCL routines could be adapted to modern languages. In fact, using the previous Suchenworth procedures (subroutines), the [Refrigerator_Pinyin_Poetry] is effectively a word to word translator and I have some (tricky) versions of the code that accept pasted in text (from the internet) and translate. I about typed my fingers to stubs preparing the pinyin & romanji dictionaries and the volcabulary dictionaries may be a little too long to store on a wiki page. <> Arts and crafts of Tcl-Tk programming