if 0 {
Creating an Invalid Tcl_Obj via SQLite explores the interactions of SQLite and Tcl.
According to the SQLite documentation, a blob that is cast to text is "interpreted" as text encoded in the database encoding. What that means is that SQLite assumes that the blob value is properly encoded, and simply assigns the type text to it. Because it doesn't verify that the value is properly encoded, it is possible to store into the database a value of type "text" that is invalid, i.e. that is not properly encoded. Later on, SQLite may create an invalid Tcl_Obj by storing the invalid text into the "bytes" field of a Tcl_Obj. The "bytes" field is supposed to store a string encoded in modified utf-8. If the data in "bytes" is not modified utf-8, various Tcl routines fail in a variety of ways. The following example illustrates some of that behaviour.
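As a minimal illustration of the encoding rule above (a sketch that uses only core Tcl commands and is independent of the script that follows; the variable names are illustrative), the valid utf-8 encoding of \ubb is two bytes long, so a lone 0xbb byte cannot be a complete utf-8 sequence:

```tcl
# Encode the character U+00BB as utf-8 and show the resulting bytes in hex.
set encoded [encoding convertto utf-8 \u00bb]
binary scan $encoded H* hex
puts $hex ;# prints c2bb
```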
First, some setup:
}
#! /usr/bin/env tclsh
package require sqlite3
sqlite3 db :memory:
db eval { create table t1 ( vals ) }
variable data1 bb
if 0 {
In a Tcl ByteArray, each byte represents a single Unicode character, so $data1b is the single character \ubb.
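A short sketch of that property, using only core Tcl commands (the variable names here are illustrative and not part of the script): the byte array has a length of one, and its single element has the ordinal 0xbb.

```tcl
# Build a one-byte value from the hex digits "bb".
set bytes [binary format H* bb]
puts [string length $bytes]   ;# prints 1
# Read the ordinal of the only character and show it in hex.
scan $bytes %c ord
puts [format %x $ord]         ;# prints bb
```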
}
variable data1b [binary format H* $data1]
if 0 {
If a value has a ByteArray internal representation but no string representation, SQLite uses the bytes in the array as a blob. In the current case, $data1b has no string representation, so SQLite treats it as a blob, making this cast a no-op.
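One way to observe this is to ask SQLite for the type it received. The following is a sketch that assumes the sqlite3 Tcl package is installed and that the value is still a pure byte array; the handle name db2 and the variable name bytes are illustrative. The @ prefix is the SQLite Tcl interface's way of requesting a blob binding explicitly, whether or not a string representation exists.

```tcl
package require sqlite3
sqlite3 db2 :memory:
set bytes [binary format H* bb]
# No string representation has been generated yet, so this is expected to report "blob".
set implicit [db2 onecolumn {select typeof($bytes)}]
# The @ prefix asks for a blob binding explicitly.
set explicit [db2 onecolumn {select typeof(@bytes)}]
puts [list $implicit $explicit]
db2 close
```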
}
set castasblob1 [db eval {select cast($data1b as blob)}]
if 0 {
When $data1b is cast as text, SQLite makes no effort to verify that $data1b is validly encoded according to the current database encoding. It simply changes the type to text. In the current case, a single byte with the value 0xbb is stored as an (invalid) text value.
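The effect of the cast can be inspected from SQL itself. This is a sketch under the assumptions that the sqlite3 Tcl package is installed and that the database uses the default utf-8 encoding; the handle name db2 and the variable name bytes are illustrative. typeof() reports the type after the cast, and hex() shows the bytes that make up the value.

```tcl
package require sqlite3
sqlite3 db2 :memory:
set bytes [binary format H* bb]
# Bind the value as a blob, cast it to text, and look at the type and raw bytes.
set result [db2 eval {select typeof(cast(@bytes as text)), hex(cast(@bytes as text))}]
puts $result ;# expected to print: text BB
db2 close
```

The type changes to text while the stored byte stays 0xbb, which is the unverified reinterpretation described above.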
}
db eval {insert into t1 values (cast($data1b as text))}
if 0 {
Tcl requires the "bytes" field of a value to hold valid (modified) utf-8, but here SQLite stores the single byte 0xbb in the "bytes" field of $data2. That byte on its own is not valid utf-8, yet in subsequent operations Tcl routines assume that it is.
}
set data2 [db onecolumn {select vals from t1}]
if 0 {
The new value does not compare as equal to the old value because the "bytes" value of \ubb is the valid utf-8 sequence \xc2\xbb but the "bytes" value that SQLite created is \xbb:
}
puts [list {strings equal?} [expr {$data2 eq $data1b}]]
if 0 {
To further illustrate, the first character of the two strings does not match:
}
set char1 [string index $data1b 0]
set char2 [string index $data2 0]
puts [list {first characters equal?} [expr {$char1 eq $char2}]]
if 0 {
In the next example, Tcl happens to successfully convert 0xbb back to the original character. Even though 0xbb isn't valid utf-8, the utf-8 decoder happens to be implemented such that it falls back to interpreting 0xbb as a single-byte value that is a Unicode code point. The "utf-8-decoded" character is now equal to the original character:
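The fallback can be tried in isolation on a well-formed byte array. This is a sketch using only core Tcl commands, with illustrative variable names. It assumes nothing about the interpreter version: the lenient decoding described above is what Tcl 8.6 is expected to do, while later versions that support stricter decoding may reject the byte instead, so both outcomes are handled.

```tcl
# A lone continuation byte, which is not a complete utf-8 sequence.
set lone [binary format H* bb]
if {[catch {encoding convertfrom utf-8 $lone} decoded]} {
    # A strict decoder refuses the input.
    puts "decoder rejected the byte: $decoded"
} else {
    # A lenient decoder maps the byte to the code point with the same value.
    scan $decoded %c ord
    puts [format {decoded to code point 0x%x} $ord]
}
```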
}
set char2u [encoding convertfrom utf-8 $char2]
puts [list {utf-8 decoded character equal?} [expr {$char1 eq $char2u}]]
if 0 {
Convert \ubb to its unicode code point:
}
scan $data1b %c ord2
if 0 {
scan also happens to convert 0xbb to \ubb, so the ordinal values of the two characters are equal:
}
scan $data2 %c ord1
puts [list {ordinals are equal?} [expr {$ord1 eq $ord2}]]
if 0 {
An attempt to print the invalid value results in the error:
"error writing stdout: invalid argument"
}
catch {
    puts $data2
} cres
puts [list {error printing value returned by sqlite:} $cres]
if 0 {
$castasblob1, which was created earlier, is equal to $data1b:
}
puts [list {earlier result of casting as blob equals original?} [expr {$data1b eq $castasblob1}]]
if 0 {
A side effect of the previous operations is that a string representation has been generated for $data1b, so SQLite no longer treats it as a blob, and casting it to a blob is no longer a no-op. Now, casting $data1b as a blob produces the utf-8 encoding of \ubb.
}
set data3 [db eval {select cast($data1b as blob)}]
puts [list {result of casting as blob equals original?} [expr {$data1b eq $data3}]]
set data1b_utf [encoding convertto utf-8 $data1b]
puts [list {instead, result of casting is the utf-8 encoding of the original value?} [expr {$data1b_utf eq $data3}]]
if 0 {
To summarize the findings:
If a blob that is not valid utf-8 is cast to text and then stored, the database now contains text data that is not in the database encoding. SQLite uses Tcl_NewStringObj to store this data into the "bytes" field of a Tcl_Obj, breaking the rule that the "bytes" field contains a utf-8 string.
One could say that the client got what the client requested, but it is odd that a Tcl script can produce an invalid Tcl_Obj.
If SQLite performed a conversion during the cast, the way it does with numeric values, instead of assuming that the blob bytes are properly formatted, it would convert them to text using the database encoding. Since every single byte value has a valid utf-8 encoding, this conversion would never fail.
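The last claim can be checked with a small sketch that uses only core Tcl commands (the variable names are illustrative, and this is a check on the Tcl side, not a description of anything SQLite does): every byte value from 0 to 255, taken as a code point, can be encoded as utf-8 and decoded back to the same character.

```tcl
set failures 0
for {set i 0} {$i < 256} {incr i} {
    # Treat the byte value as a code point.
    set ch [format %c $i]
    # Encode it as utf-8, then decode it again.
    set utf [encoding convertto utf-8 $ch]
    if {[encoding convertfrom utf-8 $utf] ne $ch} {
        incr failures
    }
}
puts $failures ;# expected to print 0
```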
}