Database Interface

TDBC is the official database interface specification and implementation for Tcl. this page, Database Interface documents the various ideas and proposals that lead to the development of TDBC.

Interface Proposals and Implementations

Rivet's DIO

Database Abstraction Standard: Developed by Philip Quaife

Tcl-SQL: by Tony Darugar

nsdbi: Naviserver's database interface

nstcl, by Michael A. Cleverly: implementation of the AOLserver/OpenACS interfaces

tcl dbi

Tcl-GDBI

tcldb

tcl-dac

ODBC

JDBC: Jacl and TclBlend

LDAP

NeoSoft Tcl: includes a database interface

tdbo - Tcl DataBase Object: provides an object oriented mechanism to access data in data stores. Currently supports sqlite3, mysqltcl, Pgtcl, tdbc::sqlite3, tdbc::mysql and tdbc::postgres drivers.

History

RS 2001-06-11: At the Second European Tcl/Tk Users Meeting, it was announced that the German Environment Protection Agency has developed an abstract database interface over ODBC, Postgres etc. for Tcl which will be made public in Summer 2001.

Description

Instead of creating code through which all requests are made, an OO-ish option would be to return a database handle (as some wrappers already do), and insist on a sufficiently similar set of method names and conventions that the same calls will work across different database extensions. Nothing stellar, but perhaps good enough? I know of at least one (now obsolete) wrapper which did this by emulating Tclodbc.

TP also makes the point that, even with DBI, one must still exercise caution, that is, "write to a small, common core SQL [a]nd don't even think of writing portable advanced features like BLOBs and stored procedures."

Introduction

There seems to be a (not too wild) desire for a standardized database interface. I have seen many individual attempts on comp.lang.tcl

It seems to me that the initial difficulty lies in fixing the API; most of the existing tcl interfaces are sufficiently general to allow a very fast porting to any idiom we might think.

A standard db interface will allow the developpers of some db-specific tools (pgAccess, Wisql, ...), or some other interested person, to acquire a much larger audience for their tools. The rest of us benefit from the being the audience ...

So: let us start discussing the API! I hope it will not be too long a discussion: may we converge very fast

Miguel Sofer <[email protected]>, will sign as MS Dec 17, 1999

PS: I am rather a newcomer to both databases and tcl; my contribution will probably end up being rather slim.

Proposal by MS

MS:

At risk of offending everyone, here they go:

Let us look at what other people do: python [L1 ], JDBC, tclodbc [L2 ], AOLserver [L3 ], etc. I do think that adopting (and maybe expanding) the tclodbc API may have something going for it ...
I think an object oriented API is best, at least for the database connections: if the connection knows how to deal with the corresponding database, we avoid simultaneously: (i) restricting the API to a single database type/vendor per application, (ii) forcing the implementor of an interface to a new db type/vendor to modify previous code. The object structures are so simple that I propose avoiding [incrTcl] (or special objects, as in tclodbc) and sticking to pure tcl implementations using lists, arrays and the namespace facility for the object struture (as in [L4 ]).
We need at least three types of entities/objects:
- databases available: I propose an array structure, indexed by a name defined in the application; for each database there is a list {dbType userName password host {other dbType dependent attributes}} There are few functions related to these: defining, listing, deleting, connecting (and thus creating a dbConnection object)
- dbConnection: similar functionality to tclodbc database objects
- statement: to allow precompiled queries, as in tclodbc; also with similar functionality.

I will post a more definite proposal soon, maybe even this century. I am cutting my teeth and refining my ideas on a pure tcl wrapper around libpgtcl (PostgreSQL interface).

LV: you might take a look at what the mailing list that is discussing this topic has come up with to date - I haven't been following it, but suspect there might be something useful there, as I know that several database extension authors were on the list. part 2 of the comp.lang.tcl FAQ for a pointer.

MS:

I would be all for adopting the tclodbc API as standard! Actually, this is more or less what I was hinting at here.

Actually, IMHO, it is not so important WHAT the standard is, as long as there is one! It should be general enough not to impose too many constraints on the programmer - otherwise, nobody will use it. But many of the choices are a matter of convention only.

AFAICT, the "shortcomings" of this API are:

(1) It is currently NOT recognized as a standard! I'm not sure how this particular issue can be addressed. Does Scriptics have to declare it a standard? How does one go about "declaring" a standard?

(2) The error behaviour is not fully described in http://www.solagem.fi/~rnurmi/tclodbc.html . Should connection errors, database errors, SQL failures (eg, failed inserts due to primary key duplication), and others raise tcl errors, or special tcl exceptions, or return normally and set an error message somewhere? I may have missed it, but I did look for this specification and came back not fully satisfied ...

(3) It would be nice to define some "standard format" for some of the things that differ among db vendors, so that we have a uniform API and the interface "corrects" the format to communicate with the particular DBMS. The python API http://www.python.org/topics/database/DatabaseAPI-2.0.html defines for instance a standard interface to: date, time, timestamp, datetime, strings, numbers, rowid, BLOBS, ...

(4) It should be slightly extended so that actually tclodbc is ONE implementation of the API (for ODBC connections); but I am thinking of having "native" implementations for Oracle, Postgres, MySql, JDBC, ... This extension would concern mostly the "database configure" part, maybe also "database set/get" ...

(5) I would propose that the API at least complements the "dbName read", "stName read" and "stName fetch" methods with a way to return a list of lists (one list for each row returned): it is a similar functionality to "read" in terms of returning multiple rows, but uses lists rather than arrays - structures which seem to be better suited for at least some iterations, and more efficient in recent versions of tcl. Actually, my proposal is actually a little more general: the API should define a new "result" object, which is an array containing info on the result (column attributes, list of returned rows, number of returned rows, ...) so that the application can disconnect from the DBMS without losing the results of a query already performed. This would be an advantage for CGI-like applications that work through pooled connections: you can return the connection to the pool as soon as you are done with it, and then go on with computations that do not require it without denying access to the connection to other users ...

(6) The API does not define the namespaces in which the different entities live. Maybe we should have a tclDBC namespace that exports the proc database; new methods (dbName, stName, resName) are created in the namespace of the caller, everything else is created under ::tclDBC

(7) The tclodbc api silently destroys an object (connection or statement) if the user is unlucky/unclever enough to define a new one with the same name - this can be seen as a bug OR as a feature. Personally, I would prefer to have a format like

set db [database connect db employeebase sysadm xxxxx]
$db statement {select employee from employees}

The difference is: it is the API that manages the object names and insures that they are not duplicated; an object cannot be destroyed accidentally. Also, by defining fully qualified names for the different objects (eg, a connection ::tclDBC::pgsql32) it is certain that there is no collision with any other objects/procs that the user or the application may define - think of what happens if someone "accidentally" does for instance

  % database connect proc db employeebase sysadm xxxxx

I do not know if tclodbc would complain, or actually redefine proc! The other approach seems safer, and imposes fewer restrictions on the user in the choice of his object names.

Michael Cleverly's AolServer Wrappers

A set of Tcl wrappers around some of the various database extensions (oratcl, pgtclsh, etc.) to emulate the AOLserver database API is available at http://michael.cleverly.com/aolserver/nstcl -- adding support for a "new" database extension involves writing (usually) a mere six short Tcl functions.

DKF's Notes on the Matter

DKF: Picking out a set of notes I wrote on the matter in discussion with Clif Flynt which some people might find interesting...

MESSAGE EXTRACT STARTS

Still, here's my idea as best I remember it, and it should be noted that I'm not a deep DB person - I code to try to keep APIs clean and to make simple things (or at least things I believe to be simple) as easy as possible. I'm using an OO-ish style here

  # Not specifying how you get a handle, but it probably involves a
  # username and password somewhere.  I'd be happy for this to be
  # fairly driver-specific.  Maybe it'd look a bit like this:
  #   set handle [FooDB engineURL foobaruser LaMePaSsWoRd]
  # I know that Oracle lets you do various things with something
  # called cursors, but I don't know anything about that.  I'd assume
  # a common DB API would hide as much of that as possible, especially
  # since people can still use the DB-specific extensions instead.

  $handle close

  $handle transaction $body

  $handle eval $sql ?$substlist?

  $handle eval $sql ?$substlist? $bindings $body

  $handle info ...

  $handle resultinfo ...

The close subcommand does the obvious thing. All use of a particular handle after it is closed will cause an error except for termination of transactions, which do nothing (the close rolls back all open transactions, maybe?)

The transaction subcommand (where supportable, it must exist, and where not, it must raise an error if used - no silent failover of something this important) starts a transaction, with normal termination of the body script committing the transaction, and non-normal termination (I have not decided what a non-normal termination is. Certainly an error, but are break, continue and return non-normal? I can see reasonable arguments either way.) of the body rolling back the transaction. All execution of SQL outside a transaction should be auto-committed. If necessary it might be a good idea to have some way of being able to directly perform the transaction start, commit and rollback, but I'd much rather that API was hidden/private since it causes all sorts of problems. I don't know whether transactions should be nestable with sub-transactions, nestable by way of ignoring the contained transaction, or should cause an error; I can see the theoretical possibilities of all three...

The eval subcommand evaluates the given chunk of SQL (possibly with the substitutions from the substlist argument applied; any mapping to prepared statements either occurs out-of-sight of the script programmer, or possibly through a prepare subcommand, though I'd prefer to avoid introducing such if it can be guessed instead) and, if there are no bindings and body, returns a list of all the result rows matched (each row being a list of values, of course.) If the bindings and body are present, then the bindings are a list of names of variables that are set to the values associated with the row prior to execution of the body. [break] causes the remaining rows to be discarded, and [continue] causes the next row to be processed. I would certainly hope for evals to be nestable, if the underlying engine can support that sort of thing.

The info subcommand returns database and/or table-level metadata. It probably needs more subcommands, but that's getting into an area I don't know about.

Similarly, the resultinfo subcommand returns metadata about a particular piece of SQL's result (with the same caveats as for the info subcommand.) I envisage its values only being valid for the body of an eval, and in the immediate aftermath of a body-less eval (cached until the next eval or the next iteration round a containing eval body.)

There's probably a reasonable call for a configure/cget pair as well. I've no idea what is reasonably configurable. :^)

I see no reason for this code to be a thin veneer. IMO, it could be quite thick where the underlying DB is fairly low on capabilities.

MESSAGE EXTRACT ENDS

FURTHER MESSAGE EXTRACT STARTS

Areas where I reckon that much can be done to provide a truly uniform API include prepared statements, which I'd say more about if I had anything really useful to say beyond "I believe they are simulatable, and we don't need any externally-visible handles." On this last point, I've been thinking a bit more about this over lunch. We can use the statement itself as a key in an array to keep explicit handles out of the way, and it might even be possible to spot situations where a statement is likely to benefit from preparation automatically (e.g. we've seen the same statement before substitutions three times, and we're likely to see it more often, though I see no reason not to accept hints on this matter.)

Another place where I have a definite opinion is on the subject of the number of handles visible at the Tcl level. Which should be one. While I understand that some DBs (like Oracle) support both connection and cursor handles, I believe that most programmers do not need to know the difference, and indeed should not be presented with that implementation detail because it makes doing simple things much harder than it ought to be. People needing fancy control can use a database-specific API instead.

MESSAGE EXTRACT ENDS

I'll just add a note that the bodied-form of db-eval is there to support processing on a per-row basis, so that enormous amounts of data do not need to be stored in the client end, but that the other form is also likely to be useful for those cases where either the amount of data being generated is small or non-existent (e.g. in updates, table-creation, etc.) I'd assume that programmers would use these capabilities sensibly. :^)

OK, none of this is really targetted at making writing the API to any particular DB easy; instead they are focussed on making common DB operations easy to write in a portable fashion, and I'd be quite happy for the very sophisticated stuff to be left to DB-specific extensions. The principle of make the common easy, and the hard possible should be our guiding light here. I also suspect that too many people thinking about DB APIs are working from the perspective of trying to encode all existing functionality as quickly as possible, instead of as naturally as possible. If there's more than one kind of user-visible handle about, programmers (who mostly don't have that good a memory, if my personal experience is anything to go by :^) are going to get confused. We should try not to add that sort of complexity if it is not necessary, especially as I suspect that it is possible to do some very intelligent things behind-the-scenes...

libdbi

In August 2001, libdbi [L5 ] inspired an outburst of enthusiasm [L6 ] among Tcl-ers.

escargo 2008-03-3: The related project http://libdbi-drivers.sourceforge.net/ seems to be making some progress: "libdbi implements a database-independent abstraction layer in C, similar to the DBI/DBD layer in Perl. Writing one generic set of code, programmers can leverage the power of multiple databases and multiple simultaneous database connections by using this framework."

ODBC

How seriously have we considered declaration of ODBC as the Tcl DBI? TP has summarized in private conversation that "[o]n Windows, of course ODBC is the de facto standard" while "unixODBC is quite mature now", with even Sybase accessible through the FreeTDS project. Informix is the one exception; it apparently has only a (or a few?) commercial driver(s).

ODBC leaves out Metakit and other such specialties, of course. CL judges that "DBI" is no advantage for these, though.

Where would one find unixODBC?

Note, take a look at nstcl for an alternative.

Category Database