Version 23 of WikiDiff

Updated 2002-11-24 11:21:59

I have put up a service that shows the changes made to the wiki in the last 24 hours. It runs every day at 11:15 am CET. You can look at it at http://pascal.scheffers.net/wikidiff/ but I'd like to put it right here in the wiki if that is possible. For that, the wikit would need some extra formatting rules to make use of the different colours.

Source of the wikidiff software is at http://pascal.scheffers.net/wikidiff/wikidiff.tcl.txt

Right now it just parses the previous day's CVS diffs and dumps them into a page (which uses a slightly extended wiki.css, so it looks the same as the wiki!). Comments please.

-- PS

Brian Theado - Very nice! I especially like the all-diffs-in-one-page approach. I came across changes that interested me but that would never have drawn me in if I had only seen the page title on the recent changes page.

I wrote some functionality to display diffs for an old version of wikit. The date at the bottom of each page is a hyperlink that, when clicked, shows the most recent change to the page. This functionality can be seen at http://tkoutline.sourceforge.net . Now, this old version of wikit stores changes within the wikit database. The newest versions of wikit store the various versions of a page in an env(WIKIT_HIST) directory.

Based on my desire to upgrade my tkoutline wikit to the latest version without losing the diff functionality, and on my desire to see similar functionality here on the Tcl'ers wiki, I have started some code (see below). It makes use of KBK's code from diff in tcl. All that's left is to figure out how to specify via the URL which diffs to display. (The code I have in the tkoutline wiki uses the "^" symbol appended to the page URL to display the most recent change. I'd like a way to specify more than just the most recent change.)

More comments at bottom of page...

 package require Diff ;# i.e. the diff in tcl code from http://wiki.tcl.tk/3108
 catch {namespace import list::longestCommonSubsequence::compare}

 # Helper function for the diff callbacks
 proc appendDiff {mode value} {
    variable diff
    set lastMode [lindex $diff end-1]
    if {$lastMode == $mode} {
        set oldValue [lindex $diff end]
        set diff [lreplace $diff end end $oldValue\n$value]
    } else {
        lappend diff $mode $value
    }
 }

 # The following three functions are callbacks for the diff function
 proc removed { index value } {
     variable diff
     appendDiff removed $value
 }
 proc added { index value } {
     variable diff
     appendDiff added $value
 }
 proc matched { index1 index2 value } {
    variable diff
    appendDiff matched $value
 }

 # Returns the contents of the given file
 proc getFile {fileName} {
    set fd [open $fileName]
    set contents [read $fd]
    close $fd
    return $contents
 }

 # Converts a version as specified below into a list index
 proc getVersionIndex {version} {
    if {[string index $version 0] == "-"} {
        return end$version
    } else {
        return $version
    }
 }
 # Versions can be specified as an absolute positive version number
 # starting at zero and counting up.  A version relative to the most
 # recent can be specified with a negative number.
 # i.e. To see the most recent change: getWikiPageDiff $id -1 -0
 # This function returns a list in the format of chunktype text pairs
 # where chunktype is one of matched, added, removed
 #
 # TODO: It would be nice to be able to express "give me the difference
 # between the page as it is now and how it was 24 hours before the
 # most recent change"
 proc getWikiPageDiff {id version1 {version2 -0}} {
    variable diff
    set ewh $::env(WIKIT_HIST)
    set versIdx1 [getVersionIndex $version1]
    set versIdx2 [getVersionIndex $version2]
    set versions [lsort [glob $ewh/$id*]]
    set list1 [split [getFile [lindex $versions $versIdx1]] \n]
    set list2 [split [getFile [lindex $versions $versIdx2]] \n]
    set diff {}
    compare $list1 $list2 matched removed added
    return $diff
 }
 package require cgi
 proc displayHtmlDiff {id version1 {version2 -0}} {
    # Special colors for the various diff pieces
    array set options {
        added bgcolor=\"#ffffaf\"
        removed bgcolor=\"#cfffcf\"
        matched {}
    }

    # Legend
    cgi_table width=200 size=-1 {
        cgi_table_row $options(removed) {
            cgi_td [cgi_font size=-1 "Removed"]
        }
        cgi_table_row $options(added) {
            cgi_td [cgi_font size=-1 "Added"]
        }
    }
    cgi_hr noshade

    # Display the entire page with the differences embedded within
    cgi_table width="600" {
        foreach {mode value} [getWikiPageDiff $id $version1 $version2] {
            cgi_table_row $options($mode) { 
                cgi_td [lindex [Wikit::Expand_HTML $value] 0]
            }
        }
    }
    return
 }

22nov01 jcw - Agree with Brian - great to see these things happen now. From a brief email exchange with Pascal some thoughts (no more than that, really):

  • Yes, all-in-one-page really makes it easy to skim for what's important.
  • Idea: omit all diffs over, say, 25 lines. That keeps the page nicely limited, and it may even entice people to stick to small(er), more concise comments.
  • How far back should the summary diff go? I'd think that a summary, listing diffs with what was on the page 3 days ago, makes it easy to track things and bridge the weekend. Only one diff per page number, summarizing multiple changes all in one, might work IMO.
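
The 25-line cutoff suggested above could look roughly like the sketch below. It assumes a diff arrives as the flat list of chunktype/text pairs that getWikiPageDiff returns; the proc names changedLineCount and summarizeDiff, the "omitted" chunk type and the default limit are made up for illustration and are not part of wikit or wikidiff.

```tcl
# Count the lines in all added/removed chunks of a diff.
# $diff is a flat list: chunktype text chunktype text ...
proc changedLineCount {diff} {
    set count 0
    foreach {mode text} $diff {
        if {$mode eq "matched"} continue
        # A chunk holds one or more lines joined by newlines
        incr count [expr {[regexp -all {\n} $text] + 1}]
    }
    return $count
}

# Return the diff unchanged when it is small enough, otherwise a
# single (hypothetical) "omitted" chunk saying how much changed.
proc summarizeDiff {diff {limit 25}} {
    set n [changedLineCount $diff]
    if {$n > $limit} {
        return [list omitted "$n changed lines (more than $limit, not shown)"]
    }
    return $diff
}
```

A display routine would then need one extra colour entry for the "omitted" chunk type.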

That sort of raises the issue of how much diffing is needed in all. While access to all diffs on each page is technically feasible, I'd be inclined to think it would confuse/overwhelm/distract more than offering just a bit of diffs. Last day, week, month - perhaps? Three links per page?

The history is now a separate subsystem on mini.net - the wiki stores the latest page version only, while all changes get archived and simmer down into the CVS historical archive once a day. It's sort of a collect-and-sweep daily cron job. What seems to work well is that latest and daily-snapshot versions are both efficiently available (static page accesses in fact - though "current" is HTML, whereas CVS daily-snapshot files are in raw wiki input format).

Whatever diff mechanism we come up with ought to keep those two sides of this wiki as loosely coupled as possible IMO.

Let me also add that there is the start of a remote sync/update mechanism. If you get a copy of the wiki and run it locally, there is the option of updating it from the CVS daily snapshot, and that mechanism is quite efficient, so it ought to scale once fully ready. That means one can have a local copy of the Tclers' Wiki, and easily track it, while using it as a local Tk app (with much snappier search capability than the web can offer). To try it, get a copy of wikit.tkd (from the usual wikit.gz url), and do:

   tclkitsh wikit.kit wikit.tkd -update http://mini.net/tclhist

I'm mentioning this here (it was also mentioned on the tclerswiki mailing list), to emphasize that we need to plan so things remain open-ended when diffs get brought into the picture. This update mechanism, for example, *only* uses the tclhist/ area.

23nov02 ps - Well, after a night of processing ideas, there are two things I certainly want to do. The first is not to use the diff output of tclhist, but to grab the current and the previous version(s!) and run... the next bit I thought of. Namely, a lot of changes on the wiki are typo fixes. Those should be displayed inside the line, not the way it is now (remove entire line, add entire line). That would combine nicely with the diff in tcl code. I am going to try that today.
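
To make the "inside the line" idea concrete, here is a minimal sketch of a word-level diff between an old and a new version of one line. It deliberately does not use KBK's package; it builds a plain longest-common-subsequence table so that it is self-contained. The proc name wordDiff and the choice to split on single spaces are assumptions for illustration only.

```tcl
# Word-level diff of two lines, using a plain dynamic-programming
# longest-common-subsequence table (illustrative, not KBK's code).
# Returns a flat list of chunktype/word pairs, where chunktype is
# one of matched, removed, added.
proc wordDiff {oldLine newLine} {
    set a [split $oldLine " "]
    set b [split $newLine " "]
    set n [llength $a]
    set m [llength $b]

    # len(i,j) = LCS length of a[i..end] and b[j..end]
    for {set i 0} {$i <= $n} {incr i} { set len($i,$m) 0 }
    for {set j 0} {$j <= $m} {incr j} { set len($n,$j) 0 }
    for {set i [expr {$n - 1}]} {$i >= 0} {incr i -1} {
        set i1 [expr {$i + 1}]
        for {set j [expr {$m - 1}]} {$j >= 0} {incr j -1} {
            set j1 [expr {$j + 1}]
            if {[lindex $a $i] eq [lindex $b $j]} {
                set len($i,$j) [expr {$len($i1,$j1) + 1}]
            } elseif {$len($i1,$j) >= $len($i,$j1)} {
                set len($i,$j) $len($i1,$j)
            } else {
                set len($i,$j) $len($i,$j1)
            }
        }
    }

    # Walk the table from the start, emitting one pair per word
    set out {}
    set i 0
    set j 0
    while {$i < $n && $j < $m} {
        set i1 [expr {$i + 1}]
        set j1 [expr {$j + 1}]
        if {[lindex $a $i] eq [lindex $b $j]} {
            lappend out matched [lindex $a $i]
            set i $i1
            set j $j1
        } elseif {$len($i1,$j) >= $len($i,$j1)} {
            lappend out removed [lindex $a $i]
            set i $i1
        } else {
            lappend out added [lindex $b $j]
            set j $j1
        }
    }
    while {$i < $n} { lappend out removed [lindex $a $i]; incr i }
    while {$j < $m} { lappend out added [lindex $b $j]; incr j }
    return $out
}
```

For example, [wordDiff "the quick brwon fox" "the quick brown fox"] marks only brwon as removed and brown as added; the other three words come back as matched. One way to use it would be to call it only for a removed chunk that is immediately followed by an added chunk with the same number of lines.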

That would also make it less bound to tclhist, because I'll grab all pages through [getPage pageId revisionId]. To use another source, simply reimplement that function and [getPageVersionList], which should return a list like tclhist/index: {{pageId revisionId lastChange} {...}}. I walk back over previous versions until I find the one that was online three days ago (or a similar period). I could also base the backward traversal on how many lines of diff are produced, stopping at either 20 lines or three (or more?) days, etc. I agree that three days is a good timespan, otherwise the page would only be interesting for the super-regular wiki users.
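
A rough sketch of the "walk back until the version that was online three days ago" step follows. It assumes the version list has the {pageId revisionId lastChange} shape described above and that lastChange is a unix-seconds timestamp, which is an assumption about the tclhist index. The proc names revisionAt and baselineRevision are hypothetical.

```tcl
# Return the revisionId of $pageId that was online at time $cutoff
# (unix seconds): the newest revision changed at or before $cutoff.
# Returns "" if the page did not exist yet at that time.
proc revisionAt {versionList pageId cutoff} {
    set best ""
    set bestTime -1
    foreach entry $versionList {
        foreach {id rev changed} $entry break
        if {$id ne $pageId || $changed > $cutoff} continue
        if {$changed > $bestTime} {
            set best $rev
            set bestTime $changed
        }
    }
    return $best
}

# Revision to diff against: whatever was online $days days ago.
# $now can be passed in explicitly, which makes testing easier.
proc baselineRevision {versionList pageId {days 3} {now ""}} {
    if {$now eq ""} { set now [clock seconds] }
    return [revisionAt $versionList $pageId [expr {$now - $days * 86400}]]
}
```

The result could then be handed to getPage together with the current revision to produce one summary diff per page.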

Talking about scaling: since [cvs commit oneSpecificFile] is a very lightweight call on a local repository, it may be a good idea to call [cvs commit] for each page alteration (possibly spawning a separate process to do that). That would make it possible to implement a 'roll back' function. But more importantly, today anyway, it would make even nicer change reporting possible, especially on the busy pages.
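
As a sketch of the per-edit commit idea: the helper below builds a cvs commit command for a single file and starts it in the background, so the page save does not wait for it. The proc names, the log message format and the redirection to /dev/null (Unix only) are assumptions; nothing here is taken from the actual mini.net setup.

```tcl
# Build the command line for committing one page file.
# -Q is the cvs global option that suppresses normal output.
proc cvsCommitCommand {file message} {
    return [list cvs -Q commit -m $message $file]
}

# Start the commit as a background process and return its pid(s).
# Output is discarded so that it cannot end up in a CGI response.
proc commitPageAsync {file message} {
    return [eval exec [cvsCommitCommand $file $message] [list >& /dev/null &]]
}
```

Something like [commitPageAsync 3108 "edit via wiki"] would have to be run from inside a checked-out working directory; error handling and locking are left out here.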

As for the 'over 25 lines', I have already added that to my local version. By the looks of it, we'll probably want to keep that.

A diff history per page could be a good thing, although I agree that it may be a bad idea to provide yet more links at the bottom of each page. I find the tkoutline 'click on the last update' counter-intuitive: I expected that it would bring me to the recent changes page... I would prefer a [page history] link.

I'd also like to propose keeping a historical archive for these pages, or maybe even a special version that lists diffs per day (or week?) and provide those for the community. These can, quite trivially, be generated starting from the first day that tclhist was started. I think it'd be nice to see how the wiki actually evolved. As in http://www.equi4.com/docs/vancouver/page-edits.png ? -jcw

Pascal, have a look at [L1 ] as an example of a version summary that can help you pick the right CVS version without iterated fetching - the second item is the unix-seconds modtime... jcw

23nov02 jcw - As to immediate CVS updates - yes, but there is a reason for the daily approach: the update mechanism I mentioned. If we update instantly, then people will start to update many times a day, which may bring down the server (or hit my - large, but finite - bandwidth allowance). If we keep updates as a daily thing, it'll be useless to hit update all the time. It may sound silly - but I really believe in limiting focus with each tool we have. The wiki is a repository, not a discussion forum. As I'm proving by typing this - it does work as such, but I still think we should see the wiki as a knowledge base, the chat for intense discussion, email for other exchanges, comp.lang.tcl for one-to-many posts, etc. The wiki is at about 750 hits/day on page 4 right now. Given that it is a public resource with no obstacles to use, what happens if it hits 10x or 100x that activity? I'm sort of trying to outrun that rat race, by trying to have a local mode copy which works well as resource before things get out of hand. One reason for creating the chat, was to maintain more focus on the wiki as repository - maybe we need more such subsystems...

Another way out is to work towards a set of mirrors. All I want to point ou