**Using databases** !!!!!! '''[Tcl Tutorial Lesson 41%|%Previous lesson%|%]''' | '''[Tcl Tutorial Index%|%Index%|%]''' | '''[Tcl Tutorial Lesson 27%|%Next lesson%|%]''' !!!!!! Databases and in particular relational databases are a popular means to store and retrieve data and many systems exist to use such databases. Within the context of Tcl one system is particularly convenient: [SQLite], as it is well integrated into the language. In fact, SQLite has been described by its author, D. Richard Hipp, as a Tcl extension that escaped into the wild. Besides SQLite, Tcl also has well-developed interfaces to a number of other database systems and [TDBC] provides an open database connectivity system, so that all the details of the various specific interfaces are hidden. Here we will look at the [SQLite] extension in particular, as it is bundled with Tcl distributions and it is seamlessly integrated with Tcl. For the more sophisticated uses of [SQlite] we refer to the [http://www.sqlite.org%|%SQLite site%|%]. To start with, the program below uses a [Data for Tcl-SQlite tutorial%|%simple CSV file%|%]: ====== station,date,salinity,temperature NW1,2005-1-10,32.00,19.82 DO,2005-5-26,32.78,19.16 NW2,2005-11-3,33.33,18.50 DH,2005-4-2,33.45,15.49 NW1,2005-2-16,33.56,15.38 DO,2005-2-18,33.11,15.61 DO,2005-10-1,29.02,19.97 DH,2005-3-19,31.23,17.40 DO,2005-1-28,31.51,18.84 OS30,2005-9-23,30.63,16.04 DH,2005-8-21,29.27,16.54 NW2,2005-10-16,31.82,18.16 OS30,2005-6-4,31.21,15.97 DO,2005-8-1,29.89,17.77 ... ====== It contains ''fictitious'' data for a small number of monitoring stations of salinity and temperature. ***Creating a connection*** The first step for using a database is to get a connection. The command [[sqlite3]] that is defined in the [[sqlite3]] package either opens an existing database file or creates a new one: ====== package require sqlite3 sqlite3 db sqlite3 db ./monitoring.sqlite ====== ''Note:'' If you want to store the data in memory, specify the special file name `:memory`, otherwise a file on disk is used. The above command `sqlite3` creates a new command, `db`, that is used to interact with the database. When you are done with it, use: ====== db close ====== This closes the file (or releases the memory) and the command `db` is deleted. ***Running SQL commands*** The database is initially empty, so we need to create a table first. To do so, we use the `eval` subcommand and an SQL command to create the table in question: ====== db eval {CREATE TABLE mondata( station text, date int, salinity real, temperature real )} ====== As you can see the ''date'' is stored as an integer, it will be in particular the number of seconds as returned by the [[clock scan]] command. [SQLite] does not have a separate ''date'' data type, so date/time values are stored as seconds or as text strings (conforming to the ISO8601 standard) or as Julian day numbers (format %J in the [[clock]] command). Several builtin functions are available to deal with dates, but here we use the possiblities offered via the [[clock scan]] command. The `eval` subcommand can now be used to insert records into the table: ====== db eval {INSERT INTO mondata VALUES($station,$seconds,$salinity,$temperature)} ====== In this command you should note the direct use of variable names (or rather the direct use of the variables' values) - this is an example of the integration of [SQlite] into Tcl. There is no need to explicitly ''bind'' variables to column names, although that is also a possibility (see the [http://www.sqlite.org%|%SQLite documentation%|%]). Because inserting data into a database requires a lot of work to keep the database's integrity, repeatedly inserting records may be a slow process. This can be speeded up by using ''transactions'': ====== db eval {BEGIN TRANSACTION} ... insert records in a loop db eval {COMMIT TRANSACTION} ====== The individual SQL commands are run in batches, thereby reducing the amount of work associated with the integrity checks. Putting it all together, here is code to create the table, read the CSV file and fill in the table, ready for further use: ====== package require sqlite3 sqlite3 db ./monitoring.sqlite db eval {CREATE TABLE mondata( station text, date int, salinity real, temperature real )} set infile [open "monitoring.csv"] gets $infile line ;# Skip the first line - we know what the columns mean db eval {BEGIN TRANSACTION} while { [gets $infile line] >= 0 } { lassign [split $line ,] station date salinity temperature set seconds [clock scan $date] db eval {INSERT INTO mondata VALUES($station,$seconds,$salinity,$temperature)} } db eval {COMMIT TRANSACTION} ====== ***Producing a report*** The next step is to actually use the database, for instance to determine the mean salinity and temperature per station. [SQLite] provides such functions as `avg()` and `max()` for various statistical and other calculations: ? ====== db eval {SELECT station, avg(salinity), avg(temperature) FROM mondata GROUP BY station} { puts "[format "%20s %6.2f %6.2f" $station $avg(salinity) $avg(temperature)]" } ====== In this case, the result of the SQL `SELECT` statement consists of several records and the script in the last argument is used to treat each record as if it were a `foreach` statement. In this context the ''column names'' in the `SELECT` statement are implicitly turned into Tcl variables, so that the [[puts]] statement and [[format]] statement get the values of each record in turn. An alternative form is to specify the name of a Tcl array as the second argument: ====== db eval {SELECT station, avg(salinity), avg(temperature) FROM mondata GROUP BY station} values { puts "[format "%20s %6.2f %6.2f" $values(station) ${values(avg(salinity))} ${values(avg(temperature))}]" } ====== (Note the use of braces around the array elements, otherwise the parentheses are misunderstood). Which form to use will depend partly on taste but partly also on isolating the effect of the `SELECT` statement from the rest of the Tcl code - after all, column names become Tcl variables with the first form and that may bite existing variables. With the [Data for Tcl-SQlite tutorial%|%CSV file%|%] provided with this lesson, the result is: ====== DH 31.39 17.52 DO 31.01 17.96 NW1 30.82 17.78 NW2 31.71 17.27 OS30 30.47 17.58 ====== ***Advanced usage*** Now let's use a more advanced feature of the Tcl interface: additional SQL functions. The interface allows you to register Tcl procedures as functions that can applied within SQL statements: ====== proc seasonName {time} { set month [clock format $time -format "%m"] switch -- $month { 01 - 02 - 12 { return "winter" } 03 - 04 - 05 { return "spring" } 06 - 07 - 08 { return "summer" } 09 - 10 - 11 { return "autumn" } } return "unknown" } db function season seasonName ====== The `function` subcommand registers a Tcl procedure `seasonName` that will be known within an SQL statement as `season`. To select all observations made in the summer period, this code suffices: ====== # # Print all observations in summer # db eval {SELECT * FROM mondata where season(date) = 'summer' ORDER BY station} { puts "[format "%20s %s %6.2f %6.2f" $station [clock format $date -format "%Y-%m-%D"] $salinity $temperature]" } ====== with the result: ====== DH 2005-08-08/21/2005 29.27 16.54 DH 2005-08-08/27/2005 33.83 17.50 DH 2005-07-07/22/2005 28.26 16.26 DH 2005-07-07/16/2005 28.67 15.50 DH 2005-08-08/25/2005 33.07 19.94 DO 2005-08-08/01/2005 29.89 17.77 DO 2005-06-06/10/2005 33.28 18.23 DO 2005-06-06/12/2005 29.26 19.48 NW1 2005-08-08/18/2005 30.87 18.73 NW1 2005-08-08/23/2005 28.93 17.99 NW1 2005-08-08/13/2005 30.75 19.56 NW1 2005-06-06/24/2005 28.72 15.24 NW2 2005-08-08/02/2005 31.14 19.05 NW2 2005-06-06/04/2005 30.45 15.74 NW2 2005-08-08/24/2005 31.34 18.47 NW2 2005-08-08/25/2005 33.59 15.50 OS30 2005-06-06/04/2005 31.21 15.97 OS30 2005-07-07/16/2005 30.22 17.44 OS30 2005-08-08/16/2005 29.89 19.35 OS30 2005-08-08/17/2005 31.86 16.75 OS30 2005-08-08/18/2005 28.47 19.32 ====== See the [https://www.sqlite.org%|%SQLite documentation%|%] and in particular [https://www.sqlite.org/tclsqlite.html%|%the documentation of the Tcl interface%|%] for more information. ***Remark*** Actually the `function subcommand registers a ''command prefix'', as the arguments in the SQL statement will be appended and then the resulting command is executed. This enables you to do: ====== db function hex {format 0x%X} ====== for turning a number into a hexadecimal string without defining a separate procedure for this. This technique is used in several other places in Tcl as well, for instance, [Tcl Tutorial Lesson 37%|%with variable and command traces%|%] !!!!!! '''[Tcl Tutorial Lesson 41%|%Previous lesson%|%]''' | '''[Tcl Tutorial Index%|%Index%|%]''' | '''[Tcl Tutorial Lesson 27%|%Next lesson%|%]''' !!!!!!