Tcl Tutorial Lesson Databases

Using databases

Databases and in particular relational databases are a popular means to store and retrieve data and many systems exist to use such databases. Within the context of Tcl one system is particularly convenient: SQLite, as it is well integrated into the language. In fact, SQLite has been described by its author, D. Richard Hipp, as a Tcl extension that escaped into the wild. Besides SQLite, Tcl also has well-developed interfaces to a number of other database systems and TDBC provides an open database connectivity system, so that all the details of the various specific interfaces are hidden.

Here we will look at the SQLite extension in particular, as it is bundled with Tcl distributions and it is seamlessly integrated with Tcl. For the more sophisticated uses of SQlite we refer to the SQLite site .

To start with, the program below uses a simple CSV file:

 station,date,salinity,temperature
NW1,2005-1-10,32.00,19.82
DO,2005-5-26,32.78,19.16
NW2,2005-11-3,33.33,18.50
DH,2005-4-2,33.45,15.49
NW1,2005-2-16,33.56,15.38
DO,2005-2-18,33.11,15.61
DO,2005-10-1,29.02,19.97
DH,2005-3-19,31.23,17.40
DO,2005-1-28,31.51,18.84
OS30,2005-9-23,30.63,16.04
DH,2005-8-21,29.27,16.54
NW2,2005-10-16,31.82,18.16
OS30,2005-6-4,31.21,15.97
DO,2005-8-1,29.89,17.77
...

It contains fictitious data for a small number of monitoring stations of salinity and temperature.

Creating a connection

The first step for using a database is to get a connection. The command [sqlite3] that is defined in the [sqlite3] package either opens an existing database file or creates a new one:

    package require sqlite3

    sqlite3 db ./monitoring.sqlite

Note: If you want to store the data in memory, specify the special file name :memory, otherwise a file on disk is used.

The above command sqlite3 creates a new command, db, that is used to interact with the database. When you are done with it, use:

    db close

This closes the file (or releases the memory) and the command db is deleted.

Running SQL commands

The database is initially empty, so we need to create a table first. To do so, we use the eval subcommand and an SQL command to create the table in question:

    db eval {CREATE TABLE mondata( station text, date int, salinity real, temperature real )}

As you can see the date is stored as an integer, it will be in particular the number of seconds as returned by the [clock scan] command. SQLite does not have a separate date data type, so date/time values are stored as seconds or as text strings (conforming to the ISO8601 standard) or as Julian day numbers (format %J in the [clock] command). Several builtin functions are available to deal with dates, but here we use the possiblities offered via the [clock scan] command.

The eval subcommand can now be used to insert records into the table:

    db eval {INSERT INTO mondata VALUES($station,$seconds,$salinity,$temperature)}

In this command you should note the direct use of variable names (or rather the direct use of the variables' values) - this is an example of the integration of SQlite into Tcl. There is no need to explicitly bind variables to column names, although that is also a possibility (see the SQLite documentation ).

Because inserting data into a database requires a lot of work to keep the database's integrity, repeatedly inserting records may be a slow process. This can be speeded up by using transactions:

    db eval {BEGIN TRANSACTION}
    ... insert records in a loop
    db eval {COMMIT TRANSACTION}

The individual SQL commands are run in batches, thereby reducing the amount of work associated with the integrity checks.

Putting it all together, here is code to create the table, read the CSV file and fill in the table, ready for further use:

    package require sqlite3

    sqlite3 db ./monitoring.sqlite

    db eval {CREATE TABLE mondata( station text, date int, salinity real, temperature real )}

    set infile [open "monitoring.csv"]

    gets $infile line ;# Skip the first line - we know what the columns mean

    db eval {BEGIN TRANSACTION}

    while { [gets $infile line] >= 0 } {
        lassign [split $line ,] station date salinity temperature
        set seconds [clock scan $date]
        db eval {INSERT INTO mondata VALUES($station,$seconds,$salinity,$temperature)}
    }

    db eval {COMMIT TRANSACTION}

Producing a report

The next step is to actually use the database, for instance to determine the mean salinity and temperature per station. SQLite provides such functions as avg() and max() for various statistical and other calculations: ?

    db eval {SELECT station, avg(salinity), avg(temperature) FROM mondata GROUP BY station} {
        puts "[format "%20s %6.2f %6.2f" $station $avg(salinity) $avg(temperature)]"
    }

In this case, the result of the SQL SELECT statement consists of several records and the script in the last argument is used to treat each record as if it were a foreach statement.

In this context the column names in the SELECT statement are implicitly turned into Tcl variables, so that the [puts] statement and [format] statement get the values of each record in turn.

An alternative form is to specify the name of a Tcl array as the second argument:

   db eval {SELECT station, avg(salinity), avg(temperature) FROM mondata GROUP BY station} values {
       puts "[format "%20s %6.2f %6.2f" $values(station) ${values(avg(salinity))} ${values(avg(temperature))}]"
   }

(Note the use of braces around the array elements, otherwise the parentheses are misunderstood). Which form to use will depend partly on taste but partly also on isolating the effect of the SELECT statement from the rest of the Tcl code - after all, column names become Tcl variables with the first form and that may bite existing variables.

With the CSV file provided with this lesson, the result is:

                  DH  31.39  17.52
                  DO  31.01  17.96
                 NW1  30.82  17.78
                 NW2  31.71  17.27
                OS30  30.47  17.58

Advanced usage

Now let's use a more advanced feature of the Tcl interface: additional SQL functions. The interface allows you to register Tcl procedures as functions that can applied within SQL statements:

proc seasonName {time} {
    set month [clock format $time -format "%m"]
    switch -- $month {
        01 - 02 - 12 {
           return "winter"
        }
        03 - 04 - 05 {
           return "spring"
        }
        06 - 07 - 08 {
           return "summer"
        }
        09 - 10 - 11 {
           return "autumn"
        }
    }
    return "unknown"
}

db function season seasonName

The function subcommand registers a Tcl procedure seasonName that will be known within an SQL statement as season.

To select all observations made in the summer period, this code suffices:

#
# Print all observations in summer
#
db eval {SELECT * FROM mondata where season(date) = 'summer' ORDER BY station} {
   puts "[format "%20s %s %6.2f %6.2f" $station [clock format $date -format "%Y-%m-%D"] $salinity $temperature]"
}

with the result:

                  DH 2005-08-08/21/2005  29.27  16.54
                  DH 2005-08-08/27/2005  33.83  17.50
                  DH 2005-07-07/22/2005  28.26  16.26
                  DH 2005-07-07/16/2005  28.67  15.50
                  DH 2005-08-08/25/2005  33.07  19.94
                  DO 2005-08-08/01/2005  29.89  17.77
                  DO 2005-06-06/10/2005  33.28  18.23
                  DO 2005-06-06/12/2005  29.26  19.48
                 NW1 2005-08-08/18/2005  30.87  18.73
                 NW1 2005-08-08/23/2005  28.93  17.99
                 NW1 2005-08-08/13/2005  30.75  19.56
                 NW1 2005-06-06/24/2005  28.72  15.24
                 NW2 2005-08-08/02/2005  31.14  19.05
                 NW2 2005-06-06/04/2005  30.45  15.74
                 NW2 2005-08-08/24/2005  31.34  18.47
                 NW2 2005-08-08/25/2005  33.59  15.50
                OS30 2005-06-06/04/2005  31.21  15.97
                OS30 2005-07-07/16/2005  30.22  17.44
                OS30 2005-08-08/16/2005  29.89  19.35
                OS30 2005-08-08/17/2005  31.86  16.75
                OS30 2005-08-08/18/2005  28.47  19.32

See the SQLite documentation and in particular the documentation of the Tcl interface for more information.

Remark

Actually the `function subcommand registers a command prefix, as the arguments in the SQL statement will be appended and then the resulting command is executed. This enables you to do:

db function hex {format 0x%X}

for turning a number into a hexadecimal string without defining a separate procedure for this.

This technique is used in several other places in Tcl as well, for instance, with variable and command traces