e4graph

Get it from: http://sourceforge.net/projects/e4graph/ Home page at: http://e4graph.sf.net/

e4Graph is a C++, Tcl and Java library that allows programs to store graph-like data persistently and to access and manipulate that data efficiently. With e4Graph, you can arrange your data in the most natural form that reflects the relationships between its parts, rather than having to force it into a table-like format. The e4Graph library also allows you to concentrate on the relationships you want to represent, and not on how to store them in a database. You can modify data items, and add and remove connections and relationships between pieces of data on the fly. e4Graph allows you to represent an unlimited number of different connections between pieces of data, and your program can selectively manipulate the data according to the relationships it cares about, not having to know about other connections represented in the data set. [Cool things: swappable, even scriptable, back end, at least architecturally. Arbitrary data stores. Conceptual power. Embeddability. Portability of data and source. Screaming performance. ...]

Jacob Levy 05/15/2003. This is a proposal for some API changes to e4Graph. Please send comments to jyl at mod3 dot net.

I wanted everyone to have an opportunity to have a say in this discussion, so that I can make the right decision. The background is that I'm finding the current terminology of e4Graph hard to explain and understand, and non-standard as far as graph theory goes. So the following changes are intended to make the APIs more conformant to the common terminology as well as making e4Graph richer, more capable and easier to understand. I want to make the changes *now*, before e4Graph 1.0final comes out, so I will not have to support two very different APIs forever. But I will do what the majority of those responding ask for, so your input is very important.

Some of you may know already that e4Graph is a way to store arbitrary directed graphs of objects in persistent storages. I use Metakit as the storage mechanism, with great success -- Metakit has been extremely robust, fast, portable and useful, and besides, it's supported by the world's most responsive programmer, JCW :). e4Graph has language bindings for Tcl and Java, and now Python is being added. I'm also planning to make e4Graph use other DBs as storage, e.g. mySQL, postgress, etc.

e4Graph currently has just two concepts, nodes and vertices. Nodes are ordered collections of vertices, and vertices are a mechanism to transition from a node to a value (be it an integer, double, string, binary or other node). You'll note that the terminology is not graph-theory-standard; usually what e4Graph calls vertice is called edge in graph-theoretic terminology.

The new e4Graph API will be based on three concepts (instead of the existing two):

 1. Nodes stay the same.
 2. Vertices that lead from a node to another node are now edges.
 3. Vertices whose value is a scalar, string or binary are now attributes.

An attribute is simply an association between a string name and a typed value. In the new API, attributes can have *anything* as their value, including scalars, string, binary, edge or node.

Nodes thus now become ordered collections of edges. Additionally, edges and nodes can have any number of attributes. Previously verices were named. The name now becomes one of the attributes on the equivalent edge.

All of this implies large changes in the APIs of all language bindings. I formally can do this now, because e4Graph is still in Alpha (and I kept it in Alpha for a very long time for the express purpose of getting everything right the first time). However, I recognize that many projects already use e4Graph, some commercially. Therefore I solicit feedback.

Two alternatives, besides making the changes now:

 1. The first is to bring e4Graph 1.0 to the release version with the
    current API and then immediately abandon it. New development will go into
    e4Graph 2.0 which will be finished much faster than 1.0 was.
 2. Not make any changes, ever. e4Graph 1.0 is the only version, and we live
    with the non-standard terminology and without the flexibility afforded by
    the new attribute mechanism.

As might be obvious, I prefer the proposal I'm making here, and as a less desirable alternative, to finish 1.0 and move on immediately to 2.0 while supporting 1.0 with bug fixes (but NO NEW DEVELOPMENT).

Please send in your comments. I will wait for a couple of weeks before deciding anything.

Jacob Levy 05/15/2003 Someone suggested in email to allow attributes to have as their value an attribute. That is, the value of attribute "a" would be an attribute "b" (potentially on some other node, even). This seems hairy to me, but I'll just note it here in case someone wants to comment on this either yea or nay.

jmn 2003-05-17 Echoing my comments regarding metadata in c.l.t, I'd just like to cast my vote for avoiding such complications. The beauty and power in complex systems often comes from what they *don't* allow you to do. I dread the mess of (to my mind) ungraph-like interconnections that allowing nodes,edges & attributes as attribute values would create. Surely if the application developer wants to achieve extra such interconnections, they can layer it on themselves by using simple attribute values to influence how they traverse the graph. I'd welcome any counter-arguments or examples of just what complex attributes might be needed for.

Jacob Levy 2003-05-18 I agree with jmn, however I wanted to give fair airing to all suggestions. I'm most likely going to stay with the three-concept approach of nodes, edges and attributes, and attributes can have as their value scalars, strings, binary values, nodes and edges. That seems complicated enough already :)

Jacob Levy 05/15/2003 Has anyone thought about making the Tcllib graph and struct::tree packages in tcllib be wrappers for e4Graph storage? Is there any impedance mismatch? Would this be a cool thing to do?

AK May 15, 2003. Cool: Yes, IMHO. Impedance mismatch: Yes, for the current interface. For the proposed interface there is less mismatch. The concepts are the same (node, edges, attributes, attributes for nodes and edges). Missing in e4graph are attributes associated with a graph instead of components, tcllib has them. Another possible mismatch is the addressing of the nodes and edges in tcllib and e4graph. Might need storage in the tcl layer for translation, or maybe through special attributes. Definitely feasible.

See also cgraph.

Jacob Levy 05/15/2003 OK, for attributes for whole graphs in tcllib, in e4Graph they could be attached to the root node. That way there's a simple way of finding them. Can you expand on the addressing issues?

AK: root node ? In a graph ? Ok, will have to see later when the new interface comes more into focus. The adressing issues are simply that e4graph uses rank to get at edges, and possibly attributes, and in tcllib at least edges have something like "nodeFoo" as their permanent id. Can be specified by the user, or auto-generated (node0, node1, ... arc0, arc1, ...). And attributes are adressed by name.

Jacob Levy 05/15/2003 When you open a storage you have to start somewhere, right? You grab the root node, which is a node that the last person updating the storage decided is somehow distinguished, using the C++ API e4_Storage::getRoot(e4_Node &root), and then you go from there. The main point is that this is just a handle to get started with accessing what's in a storage once you open it. Other than that, it does not have any significance. And this concept is already present in the current interface.

Edges will have (as vertices now have) a way to get the "containing node". Is that what you're referring to as "addressing"? And yes, in the new design of e4Graph, attributes are addressable by name (as well as by rank).

I'll write up a complete header file for the new interface, and post it here, so that people can have a look. That will surely help clarify some issues.

Michael Schlenker 05/16/2003 If e4graph wanted to provide the same interface as tcllib graph, the relevant code could probably ripped out of cgraph, it does all of the tcllib interface stuff quite well. I have to refactor the internals some more to make it easier to change the actual graph implementation (i'm not yet there with cgraph 0.6, some references to Tcl_HashTables remain in the high level interface code). When I'm done with that refactoring you have to write some more or less trivial functions (name management, arc/node/data management) and have a tcllib compatible interface.

Jacob Levy 05/16/2003 Conformance with tcllib graph is not a goal, in the sense that I am not going to change the interface of e4Graph for it (or at least not a lot). If it's possible, it's a nice thing to have. More important is to have graph theory terminology conformance, and the smallest number of concepts that's still maximally expressive (I think cgraph is especially cool because of expressive power and paucity of concepts). I think the Tcl binding for e4Graph can probably be enhanced with a service module that will map to tcllib graph.

Also, I'm getting some really cool suggestions by email from people I didn't even know used e4Graph, so thanks to everyone. And it also means that new, good ideas are coming in at a pace that I find hard to digest (;-), so bear with me.

e4Graph makes a prominent appearance in a developerWorks article [L1 ] on high-performance XML (which subject, incidentally, ought also focus attention on tDOM.

27Sep02 Jacob Levy e4Graph has a very powerful XML binding, it is able to store XML natively and allow natural manipulation of XML data as tree structured data. It also has an excellent Tcl binding, and provides an extensible object model where Tcl procedures can be stored inside the graph and extend the functionality of nodes.

Here's an example of this idea:

 % load tgraph                   (1)                  
 % set s [tgraph::open foo.db]   (2)
 % set r [$x root]               (3)
 % $r add bar last hello         (4)
 % $r method square {x} {        (5)
         return [expr $x * $x]
   }
 % $r call square 5              (6)
 => 25

Explanation: line 1 loads the tgraph library, which contains the Tcl binding for e4Graph. Line 2 opens the storage 'foo.db' and stores the result in 's'. Line 3 makes 'r' refer to the root node of this storage. Line 4 adds a new vertex named 'bar' as the last vertex with the string value 'hello'. Line 5 defines the method 'square' which takes one argument, a number, and returns the square of it. Finally, line 6 invokes the method to compute the square of 5.

Category Graph theory

Category Package

Category Mathematics

Category Data Structure