Richard Suchenwirth 2004-05-15 - In this weekend fun project I wrote a toy sketch of a translation tool - not playing machine translation this time, however. You just have two scrollable text windows for the source and target text, and can manually translate from source to target, by typing into the target window.
The idea was to make this process easier, by having words in the source window look like, and act as, "links", so clicking on one brings up a tk_popup menu with:
In this current version, only words that are in the dictionary are marked as links (so read that "one or more") - before, I had marked up all words. Not sure which way is better.
proc main argv { text .src -height 5 -wrap word -yscrollcommand ".srcy set" .src tag config link -foreground blue -underline 1 .src tag config seen -foreground magenta4 -underline 0 scrollbar .srcy -command {.src yview} text .trg -height 5 -wrap word -yscrollcommand ".trgy set" scrollbar .trgy -command {.trg yview} grid .src .srcy -sticky news grid .src -sticky news grid .trg .trgy -sticky ns grid .trg -sticky news grid rowconfig . {0 1} -weight 1 grid columnconfig . 0 -weight 1 markup .src .trg "This is an example - let's see how it goes... and whether newlines come through." }
markup splits the source string (which in a practical app might be read from a file) into "words" and "nonwords" (the regular expression may well have to be refined...) and inserts these into the source window, marking up the "words" as links, and assigning to each a unique tag that triggers the lookup.
proc markup {src trg text} { $src delete 1.0 end set n 0 foreach {- word nword} [regexp -inline -all {(\w+)(\W*)} $text] { if [info exists ::Dic([string tolower $word])] { set tags [list link t[incr n]] $src tag bind t$n <1> "lookup $src t$n $trg [list $word];break" } else {set tags ""} $src insert end $word $tags $src insert end $nword } }
lookup creates a pop-up menu and fills it with the translations provided by the dictionary. When one of those is selected, it is directly inserted into the target window. Also, the look of the link is changed to "seen" - magenta and not underlined, so you have a visual cue of what you've translated so far.
proc lookup {w tag trg word} { destroy .m menu .m -tearoff 0 .m add command -label $word -command [list $trg insert insert "$word "] .m add separator set n 0 foreach i [translations $word] { .m add command -label $i -command [list $trg insert insert "$i "] incr n } if $n {.m add separator} .m add command -label Add... -command [list add $word] tk_popup .m [winfo pointerx $w] [winfo pointery $w] eval $w tag add seen [$w tag ranges $tag] } # A non-complaining dictionary retriever: proc translations word { if [info exists ::Dic([string tolower $word])] { set ::Dic([string tolower $word]) } ;# else returns {} }
The add function creates a little dialog where you can enter a new translation, which is added to the dictionary when you hit <Return> or click OK. The source word is also editable, in case you want to change that.
proc add word { toplevel .t wm title .t "Add translation" grid [entry .t.src -relief flat -width 40] - - -sticky ew .t.src insert end $word grid [entry .t.trg -width 40] - - -sticky ew grid [button .t.ok -text OK -width 6 -command { lappend Dic([.t.src get]) [.t.trg get]; destroy .t }] \ [button .t.cancel -text Cancel -command {destroy .t}] focus .t.trg bind .t.trg <Return> {.t.ok invoke} bind .t <Escape> {.t.cancel invoke} }
For a serious tool, the dictionary would of course be read in from a text file, but in this simple toy it is just "hard-wired":
array set Dic { this dies example Beispiel see sehen how wie goes geht newlines Zeilenvorschübe } #-- ...and off we go... main $argv
A variation of this could be a visual spell-checker, where words not in a mono-lingual word list would be marked up, and the menu would offer alternatives based on string similarity with the word in question. Then you'd have only one text widget, and the popup action would be to replace the word with its correction (possibly in all places where it occurs). But the weekend is coming to an end, and Tcl/Tk is a bit like a hydra - implement one idea, and get two new ones... :)
LES This is a translation memory or CAT (computer-assisted translation) tool that you're up to. Something a lot more serious and useful than machine translation.
There are several commercial solutions for Windows that I mention in Computer-assisted translation. But I only know one application for Linux/Unix. It's called OmegaT and can be found here: [L1 ]. I don't set my hopes too high about it, though, because it's written in Java. It can barely handle accented characters, the poor thing. And it's ugly, and bloated etc. All the Java-related problems.
(Sentence inserted 2012) Well, as of 2012 one can say that OmegaT is a quite mature product. And... nice detail: on [L2 ] they encourage non-programmers, to try first steps with Tcl/Tk!!! (and Tcl/Tk scripts could be used with the OmegaT framework)
I would like to give it a stab with Tcl/Tk and I am absolutely sure that the result would be a lot better. But I would like a lot of things.
I am pretty sure I have seen another CAT program for Linux, not in Java, but can't remember what it was. I remember it was very crude.
gold I have some routines and modified word volcabulary for Chinese pinyin and Japanese Romanji in the Refrigerator_Pinyin_Poetry. From looking at the commercial Google and Altavista translator windows, the only particular TCL wrinkle is the commercial translators do not show the romanized characters (eg, as in pinyin and romanji).If someone could help me prepare better/longer pinyin dictionaries, the poetry generator could get a little further along. Another little wrinkle is that some of the Chinese volcabulary I used for the poetry is 7th century Tang dynasty. In some cases, the meaning or usage of the Chinese volcabulary has changed in 12 centuries. But translating the Tang poetry with TCl was still my ideal use, but the TCL routines could be adapted to modern languages. In fact, using the previous Suchenworth procedures (subroutines), the Refrigerator_Pinyin_Poetry is effectively a word to word translator and I have some (tricky) versions of the code that accept pasted in text (from the internet) and translate. I about typed my fingers to stubs preparing the pinyin & romanji dictionaries and the volcabulary dictionaries may be a little too long to store on a wiki page.
goldMay2014: Didn't change original code above, but suggest additions to markup proc. One can insert some placeholder unknowns into a translation. For example, "This nnn nnn example - let us babies? see how < that?> adjective??? <unk> works." The Sumerian transliterations mark some characters as unknown, unreadable, or broken clean off. Sometimes guesses or clarifiers outside the Sumerian text are featured in brackets or question marks. At least in wordy English with its repetitive phrases, even multiple unknowns seem to fall into place if one can find a similar text. Suggest additions to markup proc like $trg insert $word;$trg insert end " "; } else {set tags ""};......;$src insert end $nword;$trg insert end " n ";$trg insert end " <unk???> ";