if 0 {[Richard Suchenwirth] 2003-10-26 - Machine translation, conversion
of text in one source language S into the equivalent text in another target
language T by computers, was one of the earliest dream applications -
much work was put in it, and after 50 years the translation quality is
still moderate to ridiculous (try Babelfish on a non-trivial text). 
In this weekend fun project, I want to demonstrate aspects of machine translation
very simply. Let S=German (my mother tongue), T=English (so you can
judge the outcome), and start with
 Dies ist ein einfacher Satz
The simplest approach is word-by-word translation. The "dictionary" is 
very simply structured and takes care of German inflection suffixes already 
(which is a bit wasteful, as it leads to n-plication of entries). 
It contains a few words more that will be needed in later examples:}
 array set de_en {
    anderen   other
    anderer   other
    Dies      this
    diesen    this
    ein       a
    einen     a
    einfacher simple
    Ich       I
    ist       is
    Satz      sentence
    übersetzen translate
    will      want
 }
if 0 {Our first "translator" requires the input string to be a well-formed
list, and leaves words not in the dictionary (e.g. proper names)
untranslated:}
 proc translate0 {string dictName} {
    upvar 1 $dictName dict
    set res {}
    foreach word $string {
        if [info exists dict($word)] {set word $dict($word)}
        lappend res $word
    }
    set res
 }
if 0 {
 % translate0 "Dies ist ein einfacher Satz" de_en
 this is a simple sentence
Perfect! But.. well, it was simple indeed. The next test comes out worse:
 % translate0 "Dies ist ein anderer einfacher Satz" de_en
 this is a other simple sentence
Hm... obviously, two German words "ein anderer" should make one English
word, "another". Before bothering with multi-word sequence lookup (which is
a powerful technique that enhances translation quality), we
can fix this one by post-processing, for which we start a map of strings
in a global variable - we can always add to it later:}
 set postprocess {{a other} another}
#-- 
 proc translate1 {string dictName} {
    upvar 1 $dictName dict
    string map $::postprocess [translate0 $string dict]
 }
if 0 {
 % translate1 "Dies ist ein anderer einfacher Satz" de_en
 this is another simple sentence
Good enough. But the next test
 % translate1 "Ich will diesen Satz übersetzen" de_en
 I want this sentence translate
brings us to the limits of word-by-word translation. We rather have to
parse the input into a tree structure, do transformations on that tree,
and finally render the target text. A simple parser can be borrowed from
[Syntax parsing in Tcl]:
}
 proc parse1 {s rules} {
    while 1 {
        set fired 0
        foreach {lhs rhs} $rules {
            if [llength [set t [lmatch [lskim $s] $rhs]]] {
                foreach {from to} $t break
                set s [lreplace $s $from $to \
                [concat $lhs [lrange $s $from $to]]]
                incr fired
            }
        }
        if !$fired break ;# no rule could be applied in this round
    }
    set s
 }
if 0 {Here's a starter kit of German syntax rules (including lexicals)
that's sufficient for the example:}
 set rules {
    s  {np vp}
    np {det n}
    np {det adj n}
    np {pn}
    vp {vm np v}
    pn  "Ich"
    vm  "will"
    det "diesen"
    det "einen"
    adj "anderen"
    n   "Satz"
    v   "übersetzen"
 }
#-- ..and some helper procedures: 
 proc lskim L {
    # returns all first elements of a list of lists
    set res {}
    foreach i $L {lappend res [lindex $i 0]}
    set res
 }
 proc lmatch {list sublist} {
    # returns start and end positions of first occurrence of sublist in
    # list, or an empty list if not found
    set lsub [llength $sublist]
    for {set i 0} {$i<=[llength $list]-$lsub} {incr i} {
        set to [expr {$i+$lsub-1}]
        if {[lrange $list $i $to]==$sublist} {
            return [list $i $to]
        }
    }
 }
 proc atomar? list {expr {$list eq [lindex $list 0]}}
 
if 0 {This is the result of the parsing, the tree coming as a nested list:
 % parse1 "Ich will diesen Satz übersetzen" $rules
 {s {np {pn Ich}} {vp {vm will} {np {det diesen} {n Satz}} {v übersetzen}}}
Unitl here we were concerned with single-language analysis. The
differences between S and T have to be taken care of in rules how to
rewrite, or transform, the parse tree.

Transformation rules can be specified as pairs ''from to'' - "if the parse
tree contains the sequence ''from'', convert it to the sequence ''to''",
where integers denote list elements (counting from 1) of the input, and other
%%strings as words that are to be inserted. I use the %% markup to keep the words
that are already in T from possible further "translation":
}
 set de_en_xform {
    {vm np v} {1 %%to 3 2}
 }
if 0 {The transformation takes a parse tree and rules, traverses the tree
recursively, and returns a modified parse tree:}
 proc xform {ptree rules} {
    array set xform $rules
    if [atomar? $ptree] {return $ptree}
    set t [lskim $ptree]
    set children [lrange $t 1 end]
    if [info exists xform($children)] {
        set tree2 [lindex $t 0] ;# top node category
        foreach part $xform($children) {
            if [string is integer -strict $part] {
                lappend tree2 [lindex $ptree $part]
            } else {lappend tree2 [list - $part] ;# literal}
        }
        set ptree $tree2
    }
    set res {}
    foreach part $ptree {lappend res [xform $part $rules]}
    set res
 }
if 0 {Here's how the transformed tree comes out, again as a nested list:
 % xform $pt $de_en_xform
 {s {np {pn Ich}} {vp {vm will} {- %%to} {v übersetzen} {np {det diesen} {n Satz}}}}
The "terminals" (or leaves of the parse tree) are still the German words, but the
word order has been
changed to the English one, and the "to" has been inserted in the right
place. So it's an easy job to finalize the translation by another tree
traversal:}
 proc translate2 {string dictName} {
    upvar 1 $dictName dict
    set parseTree [parse1 $string $::rules]
    set xformTree [lindex [xform $parseTree $::de_en_xform] 0]
    string map {%% ""} [translate1 [leaves $xformTree] dict]
 }
#-- This returns the list of the terminal nodes (leaves) of a tree:
 proc leaves tree {
    if [atomar? $tree] {
        set tree
    } else {
        set res {}
        foreach child [lrange $tree 1 end] {
            lappend res [leaves $child]
        }
        join $res
    }
 }
if 0 {
 % translate2 "Ich will diesen Satz übersetzen" de_en
 I want to translate this sentence
 % translate2 "Ich will einen anderen Satz übersetzen" de_en
 I want to translate another sentence
Phew - made it... and these are only the first steps into non-trivial
MT... More work is required, but obviously it's fun too to do it in Tcl.
Many years ago I've done it in Prolog, but somehow in Tcl I feel
more free and easy...

As a first summary, machine translation could involve the following
steps:
   * Preprocessing (not shown - case, punctuation, etc.)
   * Lemmatizing (not shown - analyzing word forms, extracting stem + ending)
   * Assigning categories to lemmas (e.g. noun, verb)
   * Parsing into a parse tree
   * Transformation of the parse tree
   * Generation of the target text from leaves of the transformed tree
   * Post-processing (e.g. fixing "a other")

Don't mistake these experiments for a translation system fit for
practical use - it's just a toy I've played with. But maybe it can help
to understand better the problems and possible solutions of machine
translation.
}

-----
[LES] on 2007-06-14: Pointless exchange with unperson suppressed from this space.
----
[RS] 2005-09-26: Here's playing again, this time using only [proc] to implement some crude MT, 
which starts from a parse tree, and does adjective-noun transformation :-}

#-- Syntax:
 proc S  {np vp}     {concat [eval $np] [eval $vp]}
 proc NP {det adj n} {concat [$det] [$n] [$adj]}
 proc VP {v np}      {concat [eval $v] [eval $np]}
 proc V  verb        {return [$verb]}

#-- Lexicon:
 proc dog    {} {return chien}
 proc the    {} {return le}
 proc brown  {} {return brun}
 proc boy    {} {return gar\u00e7on}
 proc little {} {return petit}
 proc calls  {} {return appelle}

#-- Testing - input a parse tree of English, get a French sentence out:
 % S {NP the little boy} {VP {V calls} {NP the brown dog}}
 le garçon petit appelle le chien brun

[WJP] Unfortunately this isn't quite correct because 'petit' is one of the rare
French adjectives that may precede the noun. There are subtle differences
in meaning between such adjectives when preceding and following the noun.
In this case, with 'petit' following the noun it means "small (in size)", while preceding
the noun it means "little", that is, "young". If "little boy" has the meaning
I think it does, of a young boy as opposed to one small in size, then the correct
French translation would have the adjective preceding the noun.

[LES]: That is just one of the thousands of hurdles you would face if you really tried to implement the idea. [RS] has made it clear that it is much more of a toy than anything else. 

[WJP] Yes, of course I didn't take the above to be intended as a real translation system. 
My point was that even something as simple as this, using common words, 
can turn out to be more complicated than it seems.
----
[tkbabel]
----
!!!!!!
%| [Category Human Language] |%
[Arts and crafts of Tcl-Tk programming]
!!!!!!