damage by duplication

I can't believe the damage that nodule of pure idiocy (RA) did to the wiki. Have a look at the zillion tiny pages with 5 lines of stuff ripped out of more substantial pages.


Anyone who's worked on a database knows that it's a Bad Thing to duplicate information in such a way as to require (or permit) updates in different places for the same information. There's a DB term for this ... normalization?

One reason this is a Bad Thing is (as experience has taught us): if there are two places to update something, sometime somewhere someone will update one, and not the other - what began as two identical copies will diverge over time.
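
The divergence described above can be sketched in a few lines of Python. This is only an illustration: the page titles and texts are hypothetical, and a plain dictionary stands in for the wiki's page store. One store copies a sentence into a second page, the other links to it, and then a single page is edited in each.

```python
# Hypothetical page texts; nothing here is taken from the real wiki.
OLD = "The widget supports three item types."
NEW = "The widget supports four item types."

# Store 1: the summary page carries a copy of the sentence.
copied = {"Canvas": OLD, "Canvas summary": OLD}

# Store 2: the summary page only links to the original.
linked = {"Canvas": OLD, "Canvas summary": "See [Canvas]."}


def update(pages, title, text):
    """Edit exactly one page, the way a single wiki edit does."""
    pages[title] = text


def stale_copies(pages, old_text):
    """Return the titles of pages that still carry the superseded text."""
    return sorted(title for title, body in pages.items() if old_text in body)


# Someone corrects the main page and forgets (or never knew about) the copy.
update(copied, "Canvas", NEW)
update(linked, "Canvas", NEW)

stale_in_copied = stale_copies(copied, OLD)
stale_in_linked = stale_copies(linked, OLD)
```

After the edit, the copying store still has one page with the old sentence, while the linking store has none: there was only ever one place to update.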

Redundancy (defined as duplication of information in two or more distinct places) can be protective: one might back up a database, for example, or one might store a checksum somewhere.

The whole point of hypertext is to permit references to be constructed and copied, to permit hyperlinking instead of quoting. It therefore never makes sense to duplicate information for the purpose of referring to it.

One virtue of Wiki is that it enables full-text searching, a.k.a. "keyword in context" searching. So a Wiki partakes of some of the virtues of a document (editing) and some of the virtues of a DB (searching). This is a Good Thing. This is why Wiki is useful as a collaborative space and as a repository.

The recurrent desire to construct encyclopaedic indices of cluelessness, most recently expressed in [Call for suggestions for presentation of Tcl Apps] is a form of duplication: duplication of search results. Someone constructs a plausible search, captures it to a page, and makes an index out of it.

Publishing pre-digested searches on this wiki is bad, for several reasons:

  1. the constructed/captured searches are an out of date snapshot.
  2. they occupy space in the title space of pages
  3. they compete with the primary materials for searches.
  4. they necessarily duplicate information in different places.
  5. they carry with them a hermeneutics, which they tend to reify, and whose privileged expression tends (for reasons 1-4) to drown out other voices.

So, in summary: merely because one individual has difficulty using the search facilities, and prefers to view his/her information as a pre-digested index, or even a structured pre-digested index (taxonomic pabulum), and merely because that individual has successfully constructed a search on some occasion is no reason for that individual to construct a so-called index page as a monument to their transitory moment of glory.

escargo 6 Mar 2007 - I understand your point, but I am not sure I agree entirely. Here's why.

  1. The search facility of this wiki is not as powerful as it could be.
  2. An index that is built rather than being captured from a search could add value to the Wiki rather than adding noise.
  3. I don't agree that a taxonomic set of indices is equivalent to a pre-digested index.
  4. As compilations are recognized as legitimate original works, so a taxonomic structure might be an original work of some worth.

So, while it may be that in practice the indices that you dislike are of no worth, it would be hasty to say that they are always of no worth.

CMcC point by point:

  1. if the search is weak, fix the search, don't write around it.
  2. if you are arguing that it is possible to build a valuable index, I agree. Such an index would have to have a lot of original content compared to its merely/purely referential content. As such, it would resemble a link-rich page. That is not what we're seeing in these -Index pages.
  3. any taxonomy represents some pre-selection, some hermeneutics, which by analogy is pre-digestion, in that it processes/selects/structures information before the reader can critically review it. Again, it could be done with style and quality - it's just not.
  4. absolutely - deriving a taxonomy is a useful work ... a lot of work. What I'm seeing is people banging what's not much more than screen caps of searches up on the wiki, calling them indices, and leaving it at that.

Human beings have been breaking information down hierarchically for ages - perhaps it's a consequence of literacy, it certainly helps them deal with complexity. However: repackaging what search gives you does not. I was actually trying to be generous calling these indices 'taxonomy' - they're hierarchy for its own sake.

I think my DB normalization analogy is apt. We're suggesting that possibly there's some information here which someone sometime can't easily find. We're suggesting the search be duplicated, along with some of the page content, because it will be easier for this hypothetical person to find ... that is premature optimization, sacrificing correctness for convenience.

jcw - I agree. One fact in one place. This does not imply "each fact in a separate place", btw. As for fixing the search - yep (like so many things), there is always room for improvement. Or just google with the "site:" tag.

[HW's rant and mixed metaphors moved to the sock puppet drawer: [L1 ]]


wdb If I see in this wiki a page named "Index Apps", I trustfully expect apps. Everybody agrees about what an application is. If I see a page named "Index Apps Geography", my trust dies down. I will not try to explain why. Just a feeling. I am not annoyed about its existence, I just ignore it. Not my way.

The electrically-generated lists of related pages in this wiki are fine. The list is phone-book-like, but I never had the ambition to read one from top to bottom. I always took it and looked it up alphabetically (on paper) or by some smart search function (on HTML).

(Btw, have you ever seen the movie Rain Man with Dustin Hoffman? Dusty --- the rain man --- sees the name tag of the waitress and says her phone number. His brother asks him: have you read the whole phone book? Dusty says: No. Just until "K".)


MSH - I've been using TiddlyWiki [L2 ], which says: One fact in one place. Has adding tags to these wiki pages ever been considered? Tags are relatively simple to handle but give powerful index access.


LV, MSH, how, conceptually, would adding tags provide different functionality than adding categories?

I use http://del.icio.us/ occasionally to store interesting URLs. The tagging there is interesting, but the main benefit for tagging that _I_ can see, over this wiki's auto index generation, is the potential for narrowing or broadening a search by use of tags - the ability to specify multiple terms and getting more relevant hits.

I think that, rather than adding yet another indexing mechanism, if one of you wanted to look into improving the wiki experience, you might look into expanding the search functionality. Adding the ability to perform "and" and "or" type searches, so one might say "Category Application" and ("Category Science" or "Category Physics") would be useful. And of course, as jcw has indicated, one can do more sophisticated searches by making use of google's search language, so the features you propose to add should attempt to provide some sort of benefit over google.
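
As a rough sketch of what such an "and"/"or" category search amounts to, here is the example query written as set operations in Python. The category contents and page titles are hypothetical, and the wiki's real search code is not shown or assumed here.

```python
# Hypothetical category index: category name -> set of page titles.
CATEGORIES = {
    "Category Application": {"Orbit Viewer", "Text Editor", "Plot Tool"},
    "Category Science": {"Plot Tool", "Unit Conversion"},
    "Category Physics": {"Orbit Viewer", "Pendulum Notes"},
}


def pages_in(category):
    """Return the set of pages in a category (empty if it is unknown)."""
    return CATEGORIES.get(category, set())


# "Category Application" and ("Category Science" or "Category Physics")
hits = pages_in("Category Application") & (
    pages_in("Category Science") | pages_in("Category Physics")
)
```

With this made-up data the query keeps only the applications that are also filed under science or physics. "and" maps to set intersection and "or" to set union, so narrowing and broadening a search are both cheap once the category index exists.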


(Administrator message to 24.37.108.108/videotron.ca: please identify yourself, or email me. -jcw)


alove 29-may-2007 This reminds me of the Desktop Search problem, in which Apple, Microsoft, etc. originally had grand plans about doing comprehensive searches on your desktop. The problem they ultimately had to face is, most of the data on your computer isn't "tagged" as being in a particular category, so the best you can do is search for words and phrases and hope it pulls up relevant data. File formats only take you so far.

Ideally, whenever a person created a document or program, they would tag it for all the relevant categories in a standard way (as libraries do with books, only in more detail) and then data search would be more accurate and predictable. Unfortunately there is no standard set of categories. All you can really hope for on this wiki is that people will create relevant page titles like "Canvas image capture problem" instead of "I can't get this to work..." and then tag the article to a category. Then the wiki automatically pulls it up under that category and there is no duplication.
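
A minimal sketch of that "no duplication" idea in Python: each page records its own categories once, and the category listing is derived from the pages on demand rather than maintained by hand. The page titles and category names below are hypothetical (apart from the title quoted above), and this is not a description of how this wiki actually builds its category pages.

```python
from collections import defaultdict

# Hypothetical data: each page lists its own categories, in one place only.
PAGE_TAGS = {
    "Canvas image capture problem": ["Category Canvas", "Category Graphics"],
    "Pendulum Notes": ["Category Physics"],
    "Orbit Viewer": ["Category Physics", "Category Graphics"],
}


def build_category_index(page_tags):
    """Invert page -> categories into category -> sorted list of pages."""
    index = defaultdict(list)
    for title, tags in page_tags.items():
        for tag in tags:
            index[tag].append(title)
    return {tag: sorted(titles) for tag, titles in index.items()}


index = build_category_index(PAGE_TAGS)
```

Because the index is recomputed from the pages, retagging a page changes the listing automatically and there is no second copy to fall out of date.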

So the only thing I would suggest is, when creating a new page, there could be a reminder notice to "choose a good title" and also a required field to "select a category" for the page. But let me just add, I think this wiki is pretty awesome. ^_^

DKF: It took years and a huge amount of work by bright people to make searching of the Web viable, and it turned out that the key thing was all the links. Desktop searching is more difficult because there are far fewer links (often none at all) so the relevance of individual documents is harder to compute. Something like SQLite's FTS (Full Text Search) engine goes a long way, but relevance ranking remains a Hard AI Problem.
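
For a feel of what SQLite's full-text search offers, here is a small Python sketch using the standard `sqlite3` module. It assumes the underlying SQLite library was built with the FTS5 extension (common, but not guaranteed), and the table name, helper functions, and page texts are made up for illustration.

```python
import sqlite3


def build_index(pages):
    """Load (title, body) pairs into an in-memory FTS5 table."""
    conn = sqlite3.connect(":memory:")
    # Assumption: this SQLite build includes FTS5; otherwise this line fails.
    conn.execute("CREATE VIRTUAL TABLE pages USING fts5(title, body)")
    conn.executemany("INSERT INTO pages(title, body) VALUES (?, ?)", pages)
    return conn


def search(conn, query):
    """Return matching titles, ordered by FTS5's built-in rank."""
    rows = conn.execute(
        "SELECT title FROM pages WHERE pages MATCH ? ORDER BY rank", (query,)
    )
    return [title for (title,) in rows]


# Hypothetical page contents.
conn = build_index([
    ("Canvas image capture problem",
     "Saving the contents of a canvas widget to an image file."),
    ("Pendulum Notes", "Notes on simulating a pendulum for a physics class."),
    ("Text Editor", "A small editor application."),
])

both_terms = search(conn, "canvas AND image")
either_term = search(conn, "physics OR canvas")
```

The `rank` ordering is a word-statistics score, which is a long way from the link-based relevance that made Web search workable, so the point about ranking being the hard part still stands.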

In terms of the Wiki, one good thing to do (and LWV does this a lot) is to look at pages from time to time and to work out what categories they should be in. When you do this, you can do as few or as many as you want; you don't have to boil the ocean, just make enough for a cup of tea. :-)

Harm Olthof Slowly, new and simple methods do emerge, based on solid concepts from information theory, the current size of the Google index, and the fact that everything we put in files is (also) a reflection of our history, education, cultural background, etc. as human beings. Have a look at how to take advantage of that at [L3 ].