Thoughts on a unified view of data and program

Smalltalk brought OO into the mainstream. It took the extreme step of presenting a unified view in which everything is an object. In the non-Smalltalk world, there is an active process/program that acts on passive data. Smalltalk removed this dichotomy and viewed every piece of data as a tiny active process responding to messages. In other words, data is no longer considered passive; it is treated as just another active process.

This article tries to analyze similar attempts to unify data and program.

Where Smalltalk unified passive data into active processes, there are also systems that unify them the other way around, i.e., active processes are treated as passive data.

One popular model is the WWW. A URL refers to a resource, i.e., a file, traditionally considered passive data. CGI is a way of interacting with programs through URLs: every invocation of a method with a set of parameters is treated as a file whose content is the result of that invocation.
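A rough Tcl sketch of that view, using the http package; the URL and its add script are made up for illustration, but fetching the result of a program invocation looks exactly like reading a file:

    package require http

    # Hypothetical CGI endpoint: the URL names a program invocation with
    # parameters, and fetching it yields the result, much like reading a file.
    set tok [::http::geturl "http://example.com/cgi-bin/add?x=2&y=3"]
    puts [::http::data $tok]    ;# the "file content" is the program's output
    ::http::cleanup $tok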

The Plan 9 system, from the Unix camp, takes a similar approach. Most services are presented as files in a namespace (filesystem); reading from and writing to them invokes the methods of the serving process.
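The same idea survives, in diluted form, in Linux's /proc filesystem. A minimal Tcl sketch, where reading an ordinary-looking file actually invokes kernel code rather than fetching stored bytes:

    # /proc/loadavg has no stored content; the kernel computes the answer
    # at read time. The "file" is really a method call.
    set f [open /proc/loadavg r]
    puts [gets $f]
    close $f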

Why do software system designers try to unify them?

These are some of my thoughts. Unification creates simplicity. Why does simplicity matter? For some people it matters; for others it does not. Tcl embraces this simplicity model, but Perl does not.

There are three categories of people who deal with computers:

  1. End-users who just use computers to get things done.
  2. Programmers and system administrators.
  3. Researchers in computer science and other fields.

Most contributions to open-source software come from two of these categories: programmers and system administrators, and computer science researchers. Though some people use the term hackers for both, there is a distinct difference in mindset between the two, and in their contributions.

The researchers try to find new paradigms or architectures for solving problems and improving efficiency. They are macro-thinkers. They tend to love unity, the same way Einstein searched for a unified field theory. Simplicity and unity go together. On the other side, hackers are mostly micro-thinkers. Most paradigm shifts and architectural innovations come from researchers, not hackers.

LISP, Smalltalk, to some extent Tcl, and Plan 9 come from this camp.

Unix, C, Perl, PHP, Python, and Ruby come from the other camp.

A simple mechanism has to deal with the smallest case and the biggest case in a consistent fashion.

As a contrast, take the Unix shell paradigm. Most Unix tools make heavy use of unidirectional pipes, so most tools are forced to fit the filter paradigm. But many programs, such as ftp and server programs, do not fit this model; they need bidirectional pipes. Although it is trivial to create bidirectional pipes from unidirectional ones, there were not many such plumbing tools in the Unix world until "expect" came along. So Unix does not really care for consistency from the smallest case to the biggest.
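Tcl itself, incidentally, supplies the missing plumbing: opening a command pipeline in r+ mode gives a bidirectional channel. A small sketch, using the standard cat command as the process on the other end:

    # r+ opens the pipeline for both reading and writing, giving the
    # bidirectional pipe that plain shell pipes lack. cat echoes its input.
    set chan [open |cat r+]
    fconfigure $chan -buffering line
    puts $chan "hello"
    puts [gets $chan]    ;# -> hello
    close $chan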

Architectures that use one consistent model for smallest and largest requirements have these characteristics - simplicity and scalability.

But there is a cost for these benefits: performance. An approach that can handle the most complex requirement imposes too much overhead on the smallest case. We can see this in Smalltalk and in Tcl; in Tcl, for example, there is the cost of converting values from string to native format when calling C libraries.
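A rough illustration of that conversion cost in Tcl itself (exact numbers will vary, but the asymmetry should be visible):

    # The first llength has to parse the string into a list structure;
    # the second call reuses the cached internal list representation.
    set s [string repeat "x " 100000]
    puts [time {llength $s} 1]    ;# pays the string-to-list conversion
    puts [time {llength $s} 1]    ;# far cheaper: the internal rep is reused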

Now, to whom does simplicity matter? To researchers and end-users. Hence architectures emphasizing simplicity become popular among end-users. The product of the Smalltalk camp, the GUI, became a huge success; it is object interaction taken to the highest level.

The WWW also adopts a unified view: everything is a file. This unified view brings simplicity, and it has found huge success among end-users. Again, the simplicity brings poor performance with it, but people have accepted that price for the simplicity.

Here simplicity refers to simplicity of the interface, not of the implementation. Only architectures emphasizing simplicity bring computing power to the masses. Of course this comes at the cost of performance, but that may matter less with time.

This is one reason Linux finds it difficult to get into the desktop market. To reach the masses, simplicity is the criterion; but hackers want to squeeze the last drop of performance out of the processor. They judge systems and languages by performance measures such as pages served per second. For them simplicity is not a criterion, hence their tools and products lack simplicity and find it difficult to reach the masses.

Here is a small experiment in Tcl to unify passive data and active processes: an active process can be substituted for an object.

Four Tcl commands achieve this unity: open, close, read, write.

open - Either opens a file for use, or makes a program an active process for further interaction, using the "|" prefix.

close - Closes the file, or terminates the process.

read - Reads data from the file or from the process.

write - Writes data to the file or to the process.

For convenience, we can define a command "message" as a write followed by a read.

Then "open" can be used to instantiate an object from class. "close" could be used to destroy the object. message (ie write-read) - can be used for message passing to objects.

Maybe some object model can give this convenience of interacting with objects using these commands.
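As a rough sketch of that idea (bc is used purely as a stand-in for an interactive program; the -q flag of GNU bc suppresses its banner, and line buffering makes each write flush immediately):

    # "message": write to the process, then read its reply.
    proc message {obj msg} {
        puts $obj $msg
        gets $obj reply
        return $reply
    }

    set obj [open {|bc -q} r+]       ;# "open" instantiates the object (starts bc)
    fconfigure $obj -buffering line
    puts [message $obj {2 + 3}]      ;# "message" = write then read -> 5
    close $obj                       ;# "close" destroys the object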

- vkvalli


Lars H: Sometimes unification is good, other times it is not so good. Your thoughts above suggest a very concrete discussion of this:

Unification creates simplicity. ... They tend to love unity, the same way Einstein searched for a unified field theory. Simplicity and unity go together.

While it is debatable whether Einstein's work on a unified field theory accomplished anything of lasting value, he certainly did succeed in unifying the theory of motion with the theory of gravitation, by creating the General Theory of Relativity, which has superseded Newton's mechanics and law of gravitation. At least in principle. In practice almost every problem (in that domain) is still modeled using Newtonian mechanics, even though it is known to be (very) slightly wrong. Why is it so? Because the non-unified Newtonian mechanics and gravity is much simpler!

Efficiency is of course one factor that one has to take into account when considering using a unified theory (solving Einstein's field equations isn't easy), but for most people that issue doesn't even arise, because they gave up on using General Relativity long before they ever got around to setting up equations ("So space-time is a four-dimensional manifold and g denotes its metric tensor... What's a manifold?"). Similarly one may in programming try to unify pretty much any concepts A and B as much as one likes, but if the unified AB has to have a dozen bizarre features to even work then this unification has created complexity rather than simplicity. In physics there is the hard reason that the unified description is more accurate than the older one, but in computing you are completely free to make up your own rules, so why choose one that is convoluted and obscure? Merely that some aspect of it is simple doesn't mean the whole is.

Tcl, I can agree, is simple. LISP I'm not so sure about. Reading the Common LISP manual mostly gave me the impression of a horribly complicated language (much of which is of course due to carrying around a type system, but not all), although I suppose you can make it easier by restricting yourself to some still-universal subset of the language.