[CMcC] 9 May 2007: This is the code I used to repair the wiki from the page history. Can someone advise why it mucks up Unicode characters in page titles?

[LV] I posted pointers to this page on both the Metakit and the tclerswiki mailing lists. Hopefully someone will stop by with comments.

 package require Mk4tcl
 package require fileutil
 encoding system utf-8
 # Arguments: the Metakit database file and the directory of history files
 set dbf [lindex $argv 0]
 set histdir [lindex $argv 1]
 # History files are named id-date-who; keep only the newest file for each page id
 foreach f [glob -tails -directory $histdir *] {
     if {[string match .* $f]} {
         continue
     }
     lassign [split $f -] id date who
     if {![info exists diffs($id)] || $date > [lindex $diffs($id) 0]} {
         set diffs($id) [list $date $id $who $f]
     }
 }
 mk::file open db $dbf
 foreach id [lsort -integer [array names diffs]] {
     #lappend repairs [lindex $diffs($id) 1]
     lassign $diffs($id) date id1 who f
     # Split on newlines: line 0 holds the title, the page text starts at line 4
     set content [split [fileutil::cat -encoding utf-8 [file join $histdir $f]] \n]
     set title [lindex $content 0]
     set content [join [lrange $content 4 end] \n]
     if {$id >= [mk::view size db.pages]} {
         # New page: drop everything up to the first colon, keeping later colons in the title
         set title [string trim [join [lrange [split $title :] 1 end] :]]
         puts "adding $id '$title'"
         mk::row append db.pages name $title page $content date $date who $who
     } else {
         # Existing page: overwrite its text and metadata
         puts "modding $id"
         mk::set db.pages!$id page $content date $date who $who
     }
 }
 mk::file commit db
 mk::file close db

----

[wdb] Just a guess: as far as I understand, Metakit stores ASCII only. Perhaps it would make sense to convert non-ASCII characters into Tcl escape sequences such as \u004f before writing them to the database, and after reading them back, run [subst] -novariables -nocommands $title on them?

----

[NEM] Regarding the Unicode breakage, I don't see anything wrong with the code here. What does the code that saves the history look like? Are you sure it is saving pages as UTF-8? If you get rid of the ''encoding system'' and ''-encoding'' options, does that help? Also, does Metakit know anything about encodings, or does it treat strings as just blobs of binary data?

----

[EMJ] Page contents have also been mucked up: see e.g. http://wiki.tcl.tk/18008, which I had fixed not long ago. It also seems to have changed its page number (it was 18012). Also, if you look at http://wiki.tcl.tk/_ref/17213 you will see many pages listed which do not actually contain such a reference. I edited a couple, which forced them off the list, but most of them do not contain the reference and are still there.
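----

A minimal sketch of [wdb]'s escaping idea, assuming (unverified) that the storage layer only round-trips ASCII safely. The helper names escapeNonAscii and unescapeNonAscii are hypothetical, and the sketch only handles characters in the Basic Multilingual Plane, because \u takes at most four hex digits. Backslashes are escaped as well, so that [subst] -novariables -nocommands restores the original string exactly, while $ and [[ ]] are left untouched.

```tcl
# Hypothetical helper: replace every non-ASCII character (and every backslash)
# with a four-digit \uXXXX escape so that only ASCII is stored.
# Assumes BMP characters only (code points up to U+FFFF).
proc escapeNonAscii {s} {
    set out ""
    foreach ch [split $s ""] {
        scan $ch %c code
        if {$code > 0x7f || $ch eq "\\"} {
            # format does not interpret backslashes, so this yields e.g. \u00e9
            append out [format {\u%04x} $code]
        } else {
            append out $ch
        }
    }
    return $out
}

# Hypothetical helper: expand the \uXXXX escapes again.
# Only backslash sequences are substituted; $var and [cmd] stay literal.
proc unescapeNonAscii {s} {
    return [subst -novariables -nocommands $s]
}
```

In the repair script, the title would then be stored with name [[escapeNonAscii $title]] and passed through unescapeNonAscii wherever it is read back.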