Orphans

Tool developed...

Code:

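 package require Mk4tcl
 
 # $name is a row id in wdb.pages; report it if no row in wdb.refs points to it.
 proc checkOrphan {name} {
         set refs [mk::select wdb.refs to $name]
         if {[llength $refs] == 0} {
                 puts "Orphan found: $name"
         }
         set _ { --- THIS IS A COMMENT ---
         foreach r $refs {
                 set r [mk::get wdb.refs!$r from]
                 pagevars $r name
                 lappend refList [list $name $r]
         }
         }
 }
 
 # Assumption: the database path is given as the first command-line argument.
 set filename [lindex $argv 0]
 mk::file open wdb $filename -readonly
 # -min date 1 skips rows whose date field is 0.
 set N [mk::select wdb.pages -min date 1]
 foreach name $N {
         checkOrphan $name
 }
 mk::file close wdb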

Would probably be extremely CPU-intensive on a database like the one from the Tcl'ers wiki...

Anyway, tested with a small database - fine.

Would probably be better to improve the SQL - this was just a quick-and-dirty hack with half of the code cut-and-pasted from the Wikit engine.

Improvements are welcome ;)

gg - 2006-12-30
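
One possible improvement, as a rough sketch only: instead of running one select per page, walk the references once, remember every page that is the target of a reference, and then report the pages that were never marked. The proc name orphansFromRefs and the sample data are made up for illustration. With Metakit, the {from to} pairs could presumably be collected with something like mk::loop over wdb.refs, but that part is an untested assumption and is left out here so the sketch runs in a plain tclsh.

```tcl
# Sketch: find orphans in a single pass over the reference pairs.
#   pages - list of page ids
#   refs  - list of {from to} pairs (hypothetical in-memory copy of wdb.refs)
proc orphansFromRefs {pages refs} {
    # Mark every page that is the target of at least one reference.
    foreach pair $refs {
        set referenced([lindex $pair 1]) 1
    }
    # Any page that was never marked is an orphan.
    set orphans {}
    foreach id $pages {
        if {![info exists referenced($id)]} {
            lappend orphans $id
        }
    }
    return $orphans
}

# Made-up data: pages 1 and 2 reference each other, page 3 is never referenced.
set pages {1 2 3}
set refs  {{1 2} {2 1}}
puts "Orphans: [orphansFromRefs $pages $refs]"
```

Note that a page which only references itself still counts as referenced in this sketch; filtering out pairs where from equals to would change that.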


I am going to develop a tool which fulfils the function of OrphanBot, but using a local database. Far less overhead. I'll post updates to the status of that as things happen.

gg

2006-12-30


This is now the dedicated page for the bot.

The code is below.

gg


I have set up this page with the intention of creating a bot which will automatically search the wiki for orphaned pages. I am already working on the bot now. It will probably have a dedicated page soon.

gg


NOTE: This code should not be run, especially not against the Tcl'ers wiki. Make sure you know what you are doing before running it, since it will generate a large amount of web traffic if you are not careful.

  #!/usr/bin/env tclsh
  #
  # OrphanBot
  # ----
  # Finds orphaned pages on a Wikit style wiki.
  #

  package require http

  set base "http://invalid.host.name"

  # memoize proc - taken from https://wiki.tcl-lang.org/memoizing
  proc memoize {} {
    global memo
    set cmd [info level -1]
    if {[info level] > 2 && [lindex [info level -2] 0] eq "memoize"} return
    if { ! [info exists memo($cmd)]} {set memo($cmd) [eval $cmd]}
    return -code return $memo($cmd)
  }

  # range proc - taken from https://wiki.tcl-lang.org/for
  proc range {from "to:" to} {
    set res [list]
    for {set i $from} {$i<=$to} {incr i} {lappend res $i}
    set res
  }

  proc getPage {id} {
    global base
    memoize
    # ::http::geturl returns a token; ::http::data extracts the page body from it
    return [::http::data [::http::geturl "$base/references/$id!"]]
  }

  proc pageExists {id} {
    if {$id == 2} { return 1 }
    if {[regexp "<title>References to Search</title>" [getPage $id]]} {
      return 0
    } else {
      return 1
    }
  }

  proc isOrphaned {id} {
    if {[pageExists $id]} {
      if {[regexp "<ul></ul>" [getPage $id]]} { return 1 }
    }
    return 0
  }

  proc findOrphans {maxId {minId 0}} {
    foreach id [range $minId .. $maxId] {
      if {[isOrphaned $id]} {
        puts "Orphan found: $id"
      }
    }
  }
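
Given the warning above about web traffic, it may help to try the page tests offline first. The following is only a sketch: classifyPage is a hypothetical helper that applies the same two patterns as pageExists and isOrphaned to HTML that has already been obtained some other way, so no request is made. Whether those patterns match the output of a particular Wikit version is an assumption, the sample strings are made up, and the special case for page 2 is not handled here.

```tcl
# Sketch: classify an already-downloaded references page, no network access.
# Returns one of: missing, orphan, referenced.
proc classifyPage {html} {
    # Same marker that pageExists looks for.
    if {[string first "<title>References to Search</title>" $html] >= 0} {
        return missing
    }
    # Same marker that isOrphaned looks for: an empty reference list.
    if {[string first "<ul></ul>" $html] >= 0} {
        return orphan
    }
    return referenced
}

# Made-up sample inputs
puts [classifyPage "<title>References to Search</title>"]
puts [classifyPage "<title>References to Foo</title><ul></ul>"]
puts [classifyPage "<title>References to Foo</title><ul><li>Bar</li></ul>"]
```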

LV 2007 Jan 03 - did you look at wikitool's code? It currently generates a number of reports, including a list of pages and who references them, etc. Thanks for the code though - one of these days, if I get some time, I want to play with it a bit.

Here's an example of using wikitool to get orphan pages:

        wikitool.kit wikit.tkd xref page@page > Pagexref.txt
        egrep '[A-Za-z]' Pagexref.txt | \
                replace '[       ][      ]+' ' ' | \
                egrep '\.$' > /tmp/.lwv/wiki.orphans.txt

(where replace is a filter that takes a regular expression as its first argument and a specialized second argument; it reads its stdin, tests each line against the regular expression and, if a match is found, applies the second argument).
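
For readers without such a replace filter, the three filtering steps of the pipeline can be approximated in plain Tcl. This is a sketch under assumptions: the proc name orphanLines is made up, the format of Pagexref.txt is not shown here so the sample text is invented, and the bracket expressions are taken to mean "space or tab".

```tcl
# Sketch of the pipeline's filtering steps on text that is already in memory.
proc orphanLines {text} {
    set out {}
    foreach line [split $text \n] {
        # egrep '[A-Za-z]': skip lines that contain no letter at all
        if {![regexp {[A-Za-z]} $line]} continue
        # replace step: squeeze runs of two or more spaces/tabs into one space
        regsub -all {[ \t][ \t]+} $line { } line
        # egrep '\.$': keep only lines that end in a period
        if {[regexp {\.$} $line]} {
            lappend out $line
        }
    }
    return $out
}

# Invented sample input; the real wikitool xref output may look different.
set sample "Foo     Bar\n12345\nLonely   Page    .\n"
puts [join [orphanLines $sample] \n]
```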