Tcl Tutorial Lesson Databases

Difference between version 4 and 5 - Previous - Next
**Using databases**


!!!!!!
'''[Tcl Tutorial Lesson 41%|%Previous lesson%|%]''' | '''[Tcl Tutorial Index%|%Index%|%]''' | '''[Tcl Tutorial Lesson 27%|%Next lesson%|%]'''
!!!!!!


Databases and in particular relational databases are a popular means to store and retrieve data and many systems exist to
use such databases. Within the context of Tcl one system is particularly convenient: [SQLite], as it is well integrated into the
language. In fact, SQLite has been described by its author, D. Richard Hipp, as a Tcl extension that escaped into the wild. Besides
SQLite, Tcl also has well-developed interfaces to a number of other database systems and [TDBC] provides an open database connectivity
system, so that all the details of the various specific interfaces are hidden.

Here we will look at the [SQLite] extension in particular, as it is bundled with Tcl distributions and it is seamlessly integrated
with Tcl. For the more sophisticated uses of [SQlite] we refer to the [http://www.sqlite.org%|%SQLite site%|%].

To start with, the program below uses a [Data for Tcl-SQlite tutorial%|%simple CSV file%|%]:
======
 station,date,salinity,temperature
NW1,2005-1-10,32.00,19.82
DO,2005-5-26,32.78,19.16
NW2,2005-11-3,33.33,18.50
DH,2005-4-2,33.45,15.49
NW1,2005-2-16,33.56,15.38
DO,2005-2-18,33.11,15.61
DO,2005-10-1,29.02,19.97
DH,2005-3-19,31.23,17.40
DO,2005-1-28,31.51,18.84
OS30,2005-9-23,30.63,16.04
DH,2005-8-21,29.27,16.54
NW2,2005-10-16,31.82,18.16
OS30,2005-6-4,31.21,15.97
DO,2005-8-1,29.89,17.77
...
======

It contains ''fictitious'' data for a small number of monitoring stations of salinity and temperature.

***Creating a connection***

The first step for using a database is to get a connection. The command [[sqlite3]] that is defined in the [[sqlite3]] package
either opens an existing database file or creates a new one:
======
    package require sqlite3

    sqlite3 db ./monitoring.sqlite
======

''Note:'' If you want to store the data in memory, specify the special file name `:memory`, otherwise a file on disk is used.

The above command `sqlite3` creates a new command, `db`, that is used to interact with the database. When you are done with it,
use:
======
    db close
======
This closes the file (or releases the memory) and the command `db` is deleted.


***Running SQL commands***

The database is initially empty, so we need to create a table first. To do so, we use the `eval` subcommand and an SQL command
to create the table in question:

======
    db eval {CREATE TABLE mondata( station text, date int, salinity real, temperature real )}
======

As you can see the ''date'' is stored as an integer, it will be in particular the number of seconds as returned by the [[clock scan]]
command. [SQLite] does not have a separate ''date'' data type, so date/time values are stored as seconds or as text strings
(conforming to the ISO8601 standard) or as Julian day numbers (format %J in the [[clock]] command). Several builtin functions are
available to deal with dates, but here we use the possiblities offered via the [[clock scan]] command.

The `eval` subcommand can now be used to insert records into the table:

======
    db eval {INSERT INTO mondata VALUES($station,$seconds,$salinity,$temperature)}
======

In this command you should note the direct use of variable names (or rather the direct use of the variables' values) - this
is an example of the integration of [SQlite] into Tcl. There is no need to explicitly ''bind'' variables to column names, although
that is also a possibility (see the [http://www.sqlite.org%|%SQLite documentation%|%]).

Because inserting data into a database requires a lot of work to keep the database's integrity, repeatedly inserting records
may be a slow process. This can be speeded up by using ''transactions'':

======
    db eval {BEGIN TRANSACTION}
    ... insert records in a loop
    db eval {COMMIT TRANSACTION}
======

The individual SQL commands are run in batches, thereby reducing the amount of work associated with the integrity checks.

Putting it all together, here is code to create the table, read the CSV file and fill in the table, ready for further use:

======
    package require sqlite3

    sqlite3 db ./monitoring.sqlite

    db eval {CREATE TABLE mondata( station text, date int, salinity real, temperature real )}

    set infile [open "monitoring.csv"]

    gets $infile line ;# Skip the first line - we know what the columns mean

    db eval {BEGIN TRANSACTION}

    while { [gets $infile line] >= 0 } {
        lassign [split $line ,] station date salinity temperature
        set seconds [clock scan $date]
        db eval {INSERT INTO mondata VALUES($station,$seconds,$salinity,$temperature)}
    }

    db eval {COMMIT TRANSACTION}
======

***Producing a report***

The next step is to actually use the database, for instance to determine the mean salinity and temperature per station.
[SQLite] provides such functions as `avg()` and `max()` for various statistical and other calculations:
?
======
    db eval {SELECT station, avg(salinity), avg(temperature) FROM mondata GROUP BY station} {
        puts "[format "%20s %6.2f %6.2f" $station $avg(salinity) $avg(temperature)]"
    }
======

In this case, the result of the SQL `SELECT` statement consists of several records and the script in the last argument
is used to treat each record as if it were a `foreach` statement.

In this context the ''column names'' in the `SELECT` statement are implicitly turned into Tcl variables, so that the
[[puts]] statement and [[format]] statement get the values of each record in turn.

An alternative form is to specify the name of a Tcl array as the second argument:

======
   db eval {SELECT station, avg(salinity), avg(temperature) FROM mondata GROUP BY station} values {
       puts "[format "%20s %6.2f %6.2f" $values(station) ${values(avg(salinity))} ${values(avg(temperature))}]"
   }
======

(Note the use of braces around the array elements, otherwise the parentheses are misunderstood). Which form to use
will depend partly on taste but partly also on isolating the effect of the `SELECT` statement from the rest of the
Tcl code - after all, column names become Tcl variables with the first form and that may bite existing variables.

With the [Data for Tcl-SQlite tutorial%|%CSV file%|%] provided with this lesson, the result is:

======
                  DH  31.39  17.52
                  DO  31.01  17.96
                 NW1  30.82  17.78
                 NW2  31.71  17.27
                OS30  30.47  17.58
======

2023-02-16: Parsing the CSV file worked slowly for me even with transactions. Is there some syntax to tell SQLite to import a CSV file directly without parsing it and using INSERTs? I searched and found that for the command line but not for Tcl code.
See: the Tclsqlite "copy" command:
https://www.sqlite.org/tclsqlite.html#copy

***Advanced usage***

Now let's use a more advanced feature of the Tcl interface: additional SQL functions. The interface allows you to register
Tcl procedures as functions that can applied within SQL statements:

======
proc seasonName {time} {
    set month [clock format $time -format "%m"]
    switch -- $month {
        01 - 02 - 12 {
           return "winter"
        }
        03 - 04 - 05 {
           return "spring"
        }
        06 - 07 - 08 {
           return "summer"
        }
        09 - 10 - 11 {
           return "autumn"
        }
    }
    return "unknown"
}

db function season seasonName
======

The `function` subcommand registers a Tcl procedure `seasonName` that will be known within an SQL statement as `season`.

To select all observations made in the summer period, this code suffices:

======
#
# Print all observations in summer
#
db eval {SELECT * FROM mondata where season(date) = 'summer' ORDER BY station} {
   puts "[format "%20s %s %6.2f %6.2f" $station [clock format $date -format "%Y-%m-%D"] $salinity $temperature]"
}
======

with the result:

======
                  DH 2005-08-08/21/2005  29.27  16.54
                  DH 2005-08-08/27/2005  33.83  17.50
                  DH 2005-07-07/22/2005  28.26  16.26
                  DH 2005-07-07/16/2005  28.67  15.50
                  DH 2005-08-08/25/2005  33.07  19.94
                  DO 2005-08-08/01/2005  29.89  17.77
                  DO 2005-06-06/10/2005  33.28  18.23
                  DO 2005-06-06/12/2005  29.26  19.48
                 NW1 2005-08-08/18/2005  30.87  18.73
                 NW1 2005-08-08/23/2005  28.93  17.99
                 NW1 2005-08-08/13/2005  30.75  19.56
                 NW1 2005-06-06/24/2005  28.72  15.24
                 NW2 2005-08-08/02/2005  31.14  19.05
                 NW2 2005-06-06/04/2005  30.45  15.74
                 NW2 2005-08-08/24/2005  31.34  18.47
                 NW2 2005-08-08/25/2005  33.59  15.50
                OS30 2005-06-06/04/2005  31.21  15.97
                OS30 2005-07-07/16/2005  30.22  17.44
                OS30 2005-08-08/16/2005  29.89  19.35
                OS30 2005-08-08/17/2005  31.86  16.75
                OS30 2005-08-08/18/2005  28.47  19.32
======

See the [https://www.sqlite.org%|%SQLite documentation%|%] and in particular
[https://www.sqlite.org/tclsqlite.html%|%the documentation of the Tcl interface%|%] for more information.


***Remark***
Actually the `function subcommand registers a ''command prefix'', as the arguments in the SQL statement
will be appended and then the resulting command is executed.  This enables you to do:

======
db function hex {format 0x%X}
======

for turning a number into a hexadecimal string without defining a separate procedure for this.

This technique is used in several other places in Tcl as well, for instance, [Tcl Tutorial Lesson 37%|%with variable and command traces%|%]

!!!!!!
'''[Tcl Tutorial Lesson 41%|%Previous lesson%|%]''' | '''[Tcl Tutorial Index%|%Index%|%]''' | '''[Tcl Tutorial Lesson 27%|%Next lesson%|%]'''
!!!!!!