Version 9 of damage by duplication

Updated 2007-03-07 05:20:16

Anyone who's worked on a database knows that it's a Bad Thing to duplicate information in such a way as to require (or permit) updates in different places for the same information. The DB term for removing this kind of duplication is normalization.

One reason this is a Bad Thing is (as experience has taught us): if there are two places to update something, sometime somewhere someone will update one, and not the other - what began as two identical copies will diverge over time.
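A minimal sketch of that divergence, in Python, may make the point concrete. The record names and fields here are hypothetical, chosen only for illustration: the first half stores one fact in two places, the second half stores it once and refers to it by key.

```python
# Hypothetical records; names and fields are illustrative only.

# Duplicated: the same fact is stored in two places.
original = {"title": "Some Page", "summary": "first draft"}
copy_elsewhere = {"title": "Some Page", "summary": "first draft"}

# Someone updates one copy and forgets the other.
original["summary"] = "revised"
diverged = original["summary"] != copy_elsewhere["summary"]

# Normalized: the fact lives in one place; everything else holds a key.
pages = {"Some Page": {"summary": "first draft"}}
reference = "Some Page"  # a reference, not a copy

pages["Some Page"]["summary"] = "revised"
consistent = pages[reference]["summary"] == "revised"
```

With the reference there is only one place to update, so there is nothing left to drift apart.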

Redundancy (defined as duplication of information in two or more distinct places) can be protective: one might back up a database, for example, or one might store a checksum somewhere.

The whole point of hypertext is to permit references to be constructed and copied, to permit hyperlinking instead of quoting. It therefore never makes sense to duplicate information for the purpose of referring to it.

One virtue of Wiki is that it enables full-text searching, a.k.a. "keyword in context" searching. So a Wiki partakes of some of the virtues of a document (editing) and some of the virtues of a DB (searching). This is a Good Thing. This is why Wiki is useful as a collaborative space and as a repository.

The recurrent desire to construct encyclopaedic indices of cluelessness, most recently expressed in Call for suggestions for presentation of Tcl Apps, is a form of duplication: duplication of search results. Someone constructs a plausible search, captures it to a page, and makes an index out of it.

Publishing pre-digested searches on this wiki is bad, for several reasons:

  1. the constructed/captured searches are an out-of-date snapshot.
  2. they occupy space in the title space of pages.
  3. they compete with the primary materials for searches.
  4. they necessarily duplicate information in different places.
  5. they carry with them a hermeneutics, which they tend to reify, and whose privileged expression tends (for reasons 1-4) to drown out other voices.
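Reason 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the page titles, the page text, and the `search` helper are all hypothetical, and the real wiki search is not implemented this way.

```python
# Hypothetical wiki contents; titles and text are illustrative only.
pages = {
    "App One": "a Tcl editor",
    "App Two": "a Tcl game",
}

def search(pages, keyword):
    """Return the sorted titles of pages whose text contains keyword."""
    return sorted(title for title, text in pages.items() if keyword in text)

# Someone captures the result of a search to an "index" page.
captured_index = search(pages, "editor")

# Later a new page is added; nobody updates the captured copy.
pages["App Three"] = "another Tcl editor"

# Running the search again sees the new page; the captured copy does not.
live_result = search(pages, "editor")
stale = captured_index != live_result
```

The captured list was correct only at the moment it was taken; the live search stays correct without anyone maintaining it.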

So, in summary: merely because one individual has difficulty using the search facilities, and prefers to view his/her information as a pre-digested index, or even a structured pre-digested index (taxonomic pabulum), and merely because that individual has successfully constructed a search on some occasion is no reason for that individual to construct a so-called index page as a monument to their transitory moment of glory.

escargo 6 Mar 2007 - I understand your point, but I am not sure I agree entirely. Here's why.

  1. The search facility of this wiki is not as powerful as it could be.
  2. An index that is built rather than being captured from a search could add value to the Wiki rather than adding noise.
  3. I don't agree that a taxonomic set of indices is equivalent to a pre-digested index.
  4. As compilations are recognized as legitimate original works, so a taxonomic structure might be an original work of some worth.

So, while it may be that in practice the indices you dislike are of no worth, it would be hasty to say that they are always of no worth.

CMcC point by point:

  1. if the search is weak, fix the search, don't write around it.
  2. if you are arguing that it is possible to build a valuable index, I agree. Such an index would have to have a lot of original content compared to its merely/purely referential content. As such, it would resemble a link-rich page. That is not what we're seeing in these -Index pages.
  3. any taxonomy represents some pre-selection, some hermeneutics, which by analogy is pre-digestion, in that it processes/selects/structures information before the reader can critically review it. Again, it could be done with style and quality - it's just not.
  4. absolutely - deriving a taxonomy is a useful work ... a lot of work. What I'm seeing is people banging what's not much more than screen caps of searches up on the wiki, calling them indices, and leaving it at that.

Human beings have been breaking information down hierarchically for ages - perhaps it's a consequence of literacy, it certainly helps them deal with complexity. However: repackaging what search gives you does not. I was actually trying to be generous calling these indices 'taxonomy' - they're hierarchy for its own sake.

I think my DB normalization analogy is apt. We're suggesting that possibly there's some information here which someone sometime can't easily find. We're suggesting the search be duplicated, along with some of the page content, because it will be easier for this hypothetical person to find ... that is premature optimization, sacrificing correctness for convenience.

jcw - I agree. One fact in one place. This does not imply "each fact in a separate place", btw. As for fixing the search - yep (like so many things), there is always room for improvement. Or just google with the "site:" tag.

(HW) - I disagree. Let's be practical for a minute here. You guys' ideas (mainly theoretical formulas) are fine and dandy on paper but they don't hold any water when you apply them to reality. So-called index pages have grown naturally out of a need. Our Brian Theado did a marvelous job with the Tcl/Tk Games page for example, and this is something no machine category system could have done. A lot of other similar pages have sprung up throughout the years and they are all very useful: Tcl Editors and Graphics with Tcl-Tk, to name a few. Larry, CMC and JCW might not like the system but a lot of us love it.

The information is duplicated in many places? I don't see how naming a page on an index page (it takes two seconds) can be considered duplicating the information.

These pages are useful, they fill a need as Escargo has said and this is what really counts. What the category system does is simply list related pages stupidly, coldly without information. We don't have time for this bullshit. We need to have a clue as to what the contents of the page are quickly and as efficiently as possible by reading its description on an index page. I for one don't have two hours to search every page listed after a category search. I see an index page and I know right away if this is what I need or not.

All a category search does is dumbly list pages alphabetically. You wanna see? Click Category Discussion and see what happens. You'll have phone-book-like information. Who needs that info? Write an index page that makes sense, list more details on each, classify them chronologically etc. and you'll see the difference. How in the world can you compare human intervention with machine intervention? Get real! A machine is dumb; it does dumb searches. Humans can produce excellent index pages!

An index page classifies apps, gives more details on them, introduces them well. The beauty of the index system is that you click on the title of a page and bingo! you have a list of the backlinks; and among them you know what index page the page belongs to. This sure is an excellent system. Contrary to what Mister CMC has said, a page can belong to many indexes. Simply name the page in as many indexes as you want!
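As a rough sketch of how a backlink list can be derived from page text instead of being kept by hand, here is a short Python example. It assumes links are written as a page name in square brackets, which is an assumption about the markup, and every page title in it is hypothetical.

```python
import re
from collections import defaultdict

# Assumption: a link is a page name in square brackets, e.g. [App One].
LINK = re.compile(r"\[([^\[\]]+)\]")

def backlinks(pages):
    """Map each linked-to title to the set of page titles that link to it."""
    result = defaultdict(set)
    for title, text in pages.items():
        for target in LINK.findall(text):
            result[target].add(title)
    return result

# Hypothetical pages: one page is named on two different index pages.
pages = {
    "Editors Index": "See [App One] and [App Three].",
    "Games Index": "See [App Two] and [App One].",
}
links_to_app_one = backlinks(pages)["App One"]
```

Here `links_to_app_one` holds both index page titles, so a page named in several indexes shows all of them among its backlinks.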

As for more work for wikignomes, I disagree. Why create a new page on New Pages? Just create it on the Index page and bingo! It's done. By doing so you have created a back-link and a classification right away for this page.

You simply cannot replace something that has been implemented naturally through the years and discard it in the name of a vague and artificial principle that only exists on paper. Index pages have existed on all wikis for years and they have surely proved fit to co-habit with the category system.

On the positive side, I find the Search on this wiki pretty state-of-the-art, excellent I might add. I personally don't make very refined searches so the search engine does its job very well for me. As for the differences engine (the diffs), it's so pitiful that I don't even bother trying to figure out what changes have been made to a page. No time for that.


Category Community | Category Discussion