[Richard Suchenwirth] 2004-07-07 - ''Digging in old intranet pages, I found this plaything (but in fact it ran in one of our products for a while), written almost seven years ago - quite a newbie I was then :)''

In order to collect experience with query/retrieval problems, I have written a short Tcl script ... that simulates an [SQL] database using an associative Tcl array, with the following features:

   * data is organized in entries, all on one level, each internally marked by a unique number; an entry has a varying number of fields with string values (varying length, no limit);
   * for each field name/value combination, an index to the corresponding entry numbers is held;
   * queries use a SELECT clause emulated with sugared Tcl.

The CREATE, DROP, and DELETE commands were also partly emulated. In real SQL, keywords may be upper- or lowercase; in subSQL, most must be uppercase, as they appear in most SQL documentation. The syntax of subSQL's SELECT is a subset of SQL's:

 SELECT fields FROM table [WHERE cond [AND|OR cond]* ]

''fields'' can be a list of field (column) names, separated by commas but no whitespace, or "*" for all fields in historical order. ''table'' is the name of the Tcl array. ''cond'' has the form column OPERATOR value, where OPERATOR is currently implemented for = and LIKE (both doing the same thing). For both, value may contain ANSI-SQL wildcards (_ for ".", % for ".*"), MS-Access wildcards (? for ".", * for ".*"), and "superSQL" bracketed alternative lists like [[Oo0Q]], which are not available in ANSI SQL or MS-Access. Even better, wildcarding also works for the column name, so subSQL allows wild queries like

 subSQL> SELECT * FROM pb WHERE *=*OMA* AND *=*40* AND *=*AR*
 {FNM=MARK LNM=ROMANO MST=1400}

which yielded a unique result for very few "unspecified recognition results", but took 2.1 sec on sux000. A query with two ANDs took between 0.7 and 0.85 sec on sux000, against a database of 3007 entries that loaded in 34..47 seconds.

Of course, subSQL is no real challenger to the alternatives mentioned above, but it offers the quick adaptability and testability that only a few pages of Tcl code can bring.
----
 #!/tools/bin/tclsh
 # sqltry -- simulated sql parsing in Tcl
 # R. Suchenwirth 97-07-14

 set db(fields) {}
 set db(records) {}

 # "let var = expression" - tiny sugar for arithmetic assignment
 proc let {_var = expression} {
     upvar $_var var
     set var [expr $expression]
 }
 #----------------------------------------------------------------------
 proc SELECT {fields FROM _db args} {
     upvar $_db db
     regsub -all "," $fields " " fields    ;# "a,b,c" -> "a b c"
     regsub -all " LIKE " $args "=" args   ;# LIKE behaves exactly like =
     set hits {}; set hitsl {}; set ress {}
     if {[lindex $args 0]=="WHERE"} {
         foreach a [lrange $args 1 end] {
             if {$a=="OR"} continue
             if {$a=="AND"} {
                 lappend hitsl $hits; set hits {}
             } elseif {![regexp "=" $a]} {
                 error "WHERE clause must be column=value"
             }
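             # Each condition "col=val" is turned into a glob pattern "col,val"
             # and matched against the field,value index keys with [array names]:
             # the SQL wildcards _ and % become the glob wildcards ? and *, while
             # MS-Access-style ? / * and bracketed sets like [Oo0Q] pass through
             # unchanged. With the sample data below, e.g. fnm=Mar% becomes the
             # pattern fnm,Mar* and collects the record numbers held in
             # db(fnm,Mary) and db(fnm,Martin). Because the pattern spans the
             # whole key, wildcards in the column name work as well.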
             regsub -all "_" $a "?" a
             regsub -all "%" $a "*" a
             regsub "=" $a "," index
             foreach i [array names db $index] {
                 append hits " $db($i)"
             }
         }
         lappend hitsl $hits
         set hits [eval land $hitsl]   ;# AND = intersection of the hit lists
     } else {
         set hits $db(records)         ;# no WHERE clause: take all records
     }
     if {$fields=="*"} {set fields $db(fields)}
     foreach hit $hits {
         set res {}
         foreach field $fields {
             if {![catch {set db($hit,$field)} t]} {
                 lappend res "$field=$t"
             }
         }
         if {$res != {}} {
             lappend ress $res
         }
     }
     return $ress
 }
 #------------------------------------------ list unique: remove duplicates
 proc luniq {list} {
     set t [lsort $list]
     set previous [set res [lindex $t 0]]
     foreach i [lrange $t 1 end] {
         if {$i != $previous} {
             lappend res $i
             set previous $i
         }
     }
     return $res
 }
 #------------------------------------------ list non-unique: keep only duplicates
 proc lnuniq {list} {
     set t [lsort $list]
     set res {}
     set previous "some dummy value which is not expected to occur"
     foreach i $t {
         if {$i != $previous} {
             set previous $i
         } else {
             lappend res $i
         }
     }
     return $res
 }
 #------------------------------------------ list AND: intersection of lists
 proc land {arg args} {
     set t [luniq $arg]
     foreach i $args {
         set t [lnuniq [concat $t [luniq $i]]]
     }
     return $t
 }
 #----------------------------------------------------------------------
 # add one record to db; each arg is a {field value} pair
 proc data {_db args} {
     upvar $_db db
     let n = [llength $db(records)]+1
     lappend db(records) $n
     foreach i $args {
         set type  [lindex $i 0]
         set value [lindex $i 1]
         set db($n,$type) $value       ;# field value of record n
         lappend db($type,$value) $n   ;# index: field,value -> record numbers
         if {[lsearch -exact $db(fields) $type]==-1} {
             lappend db(fields) $type
         }
     }
 }
 #----------------------------------------------------------------------
 data db {fnm William} {lnm Jones} {mst 2641}
 data db {fnm Mary} {mnm Lou} {lnm Smith} {mst 2641} {fct Assistant}
 data db {fnm William} {lnm Brown} {mst 2642}
 data db {fnm Martin} {lnm Marietta} {mst 2642} {fct Engineer}
 data db {lnm Suharto} {mst 2643} {fct Manager}
 data db {fnm Udo} {lnm Miletzki} {mst PE32} {fct Teamleader}
 data db {fnm Richard} {lnm Suchenwirth} {mst PE32} {fct Engineer}
 #----------------------------------------------------------------------
 puts "subSQL: a small but growing subset in Tcl"
 set try "SELECT fnm,mnm,lnm,mst,fct FROM db WHERE mst=2641 OR mst=PE32"
 puts "Example: $try"
 eval $try
 while {![eof stdin]} {
     puts -nonewline "subSQL> "
     flush stdout                  ;# make the prompt visible before gets blocks
     gets stdin s
     puts [eval $s]
 }
----
[Another simple database] is in the same spirit, just not mimicking [SQL] so closely, but incomparably simpler and more memory-efficient.
----
An example of this code tweaked to query CSV files can be found at [CSV Sql].
----
!!!!!! [Arts and crafts of Tcl-Tk programming] %| [Category Database] |% !!!!!!