SQLite Optimization

There is an excellent FAQ [L1 ] for tweaking your use of an SQLite database for optimal performance. Almost all the suggestions in the FAQ are applicable to code using the Tcl binding, but they are not actually expressed in Tcl code snippets. Hence this page, which attempts to do so.


In "Section 5: Optimizing Queries" the suggestion is made to replace GLOB and LIKE where possible. The example given in the FAQ is "The expression (x GLOB 'abc*') can be replaced by (x >= 'abc' AND x < 'abd')." In other words, if you have an INDEX on col_name, and you want to match the first few characters of the values in col_name, replace this:

 proc search_partial_match_slowly {str} {
     append str %
     dbcmd eval {SELECT * FROM table_name WHERE col_name LIKE $str} result {
         parray result
     }
 }

with this:

 proc search_partial_match_quickly {str} {
     set sub_str [string range $str 0 end-1]
     set last_char [string index $str end]
     set end $sub_str[format %c [expr {1+[scan $last_char %c]}]]
     dbcmd eval {SELECT * FROM table_name WHERE (col_name >= $str AND col_name < $end)} result {
         parray result
     }
 }

for greatly increased speed. (Time taken was reduced from much longer than one minute to much less than one second, in my case.) Alastair Davies 14 December 2006


SQLite does the above optimization automatically. See http://www.sqlite.org/optoverview.html#like_opt But the optimization is not valid for LIKE since LIKE is case insensitive by default. But if you turn on case sensitivity for LIKE, or if you make the collating sequence of the column case insensitive, or if you use GLOB instead of LIKE (since GLOB is case sensitive) then the optimization should work for you.

AMG: Regarding LIKE optimization: Recently I needed to do many, many case insensitive lookups on a symbol table. LIKE worked but was uncomfortably slow, so I just added a new table mapping between commons and their lowercased names, and I created an index on the lowercase column. This proved to be a gigantic performance win. (As you may have guessed, this symbol table includes a lot of FORTRAN symbols grouped into common blocks.)

LIKE gotcha: I also used LIKE to find all commons whose names didn't start with @, but when I changed the leading character to _, the search broke. This is because LIKE interprets _ as a metacharacter (any one character), same as how % means any number of characters. Switching to "glob _*" was the solution.

AMG: Though of course I could have also used COLLATE NOCASE... not sure if that feature existed back in 2006 when I did this code.

DKF: Computing a collation key remains a valuable approach, especially where you need something other than simple case mapping. For example, if you know you're going to be using soundex matching against the entries, you'll want to pre-compute that.


pcam Has anyone experienced the cache method and managed to have some prepared query ? Does it improve performance ? If so please include a snippet.