iDict: a multilingual dictionary

if 0 {Richard Suchenwirth 2003-04-29 - Dictionaries are a popular application on PocketPcs. Here is my first take in Tcl, which allows one to consult multiple dictionaries (languages) in one query - fun for linguists.

WikiDbImage idict.jpg

Dictionaries are simple text files, one line per entry, where source and target term are separated by an equal sign, e.g.

 dog=Hund

Names of dictionaries consist of the ISO codes for source and target language, separated by a dash, with .txt extension, e.g.

 en-de.txt

For convenience I also assert that all dictionaries are in the same directory as iDict.tcl itself. Un-ASCII characters can be written in \u.. notation. Lookup is done for both sides of the "equation", which avoids duplication, but demands a single entry (word or phrase) on both sides.

To look a word up, type it (or a prefix) into the entry, hit <Return> or the Go button. An empty search string (prefix) dumps all contents of all selected dictionaries, ordered by language and Unicode. The search string goes into a regular expression, so you can search for suffixes too, e.g. .+dom finds boredom, kingdom, etc.

Tapping on an entry sets the From combo to its language, and the search term to its string. This way you can easily do "transitive searches", e.g. de Hund -> en dog -> fr chien. I use English as pivot language. }

 namespace eval iDict {
    variable version 0.1
 }
 proc iDict::languages {} {
    cd [file dirname [info script]]
    foreach file [glob *-*.txt] {
       regexp {(.+)-(.+)\.txt} $file -> from to
       set a($from) ""; set a($to) ""
    }
    lsort [array names a]
 }
 set iDict::languages [iDict::languages]
 proc iDict::translate {word {from *} {to *}} {
    set res {}
    set sep \[=\t\]
    set files [lsort -uniq [glob -n $from-$to.txt $to-$from.txt]]
    foreach fn $files {
       regexp {(.+)-(.+)\.txt} $fn -> from to
       set fp [open $fn]
       while {[gets $fp line]>=0} {
          set line [subst -nocom $line]
          if [regexp -nocase ^($word.*)$sep\(.+) $line -> fw tw] {
             lappend res [list $from $fw $to $tw]
          }
          if [regexp -nocase (.+)$sep\($word.*) $line -> tw fw] {
             lappend res [list $to $fw $from $tw]
          }
       }
       close $fp
    }
    lsort $res
 }
 proc iDict::ui {} {
    package require BWidget
    variable languages
    set values [concat * $languages]
    frame .f
    ComboBox .f.1 -textvariable from -width 3 -values $values -editable 0
    set ::from *
    label .f.2     -text ->
    ComboBox .f.3 -textvariable to -width 3 -values $values
    set ::to *
    button .f.4 -text Go -command {iDict::do $word .2.t}
    button .f.c -text C -command {set word ""; focus .w}
    eval pack [winfo children .f] -side left
    entry .w -textvar word
    bind .w <Return> {iDict::do $word .2.t}
    frame .2
    scrollbar .2.y -command {.2.t yview}
    pack .2.y -side right -fill y
    text .2.t -yscrollcommand {.2.y set} -font {{Bitstream Cyberbit} 10}
    foreach color {red blue} {
       .2.t tag config $color -foreground $color
    }
    .2.t tag config red -font {Tahoma 7}
    bind .2.t <ButtonRelease-1> {
      if [regexp {(..) (.+)} [%W get {current linestart} {current lineend}] -> lg wd] {
         set from $lg; set word $wd
      }
      focus .w
      break
    }
    pack .2.t -side right -fill both -expand 1
    pack .f .w .2 -fill x
    focus .w
 }
 proc iDict::do {word w} {
    global from to
    set cfl ""; set ctw ""
    $w delete 1.0 end
    foreach item [translate $word $from $to] {
       foreach {fl fw tl tw} $item break
       if {$fl !=$cfl || $fw != $cfw} {
          set cfl $fl; set cfw $fw
          $w insert end $fl red " " {} $fw\n blue
       }
       $w insert end \t$tl red " " {} $tw\n
    }
 }
 iDict::ui
 wm geometry . 240x188+0+0
 .2.t insert end "iDict $iDict::version
    Richard Suchenwirth 2003
    Languages: $iDict::languages"

 bind . <Up> {exec wish $argv0 &; exit}

IDP dictionaries: The Internet Dictionary Project [L1 ] http://www.aracnet.com/~tyler/IDP/files/ offers a number of plain text dictionary files. By allowing a tab (\t) as alias for an equal sign, the code above can also directly process files from that site - just make sure to rename the files to the en-de.txt pattern.

unperson Well, aracnet seems to be out... The link is dead...


A starkit version of this code is available on sdarchive.


unperson Very very interesting, Richard... Like most of what you do... Tell me: is using this pen to type stuff practical? Not as fast as a keyboard though I believe... RS: Well, it takes some getting used to, and I wouldn't want to do large-volume data entry with the pen. But it's fully feasible to edit text files, even in Arab, Chinese, Hebrew, or Korean, with e.g. iKu.


Arts and crafts of Tcl-Tk programming | Category Application | Category Language