Suppose you have a huge flat text file whose content actually comes in blocks. For example, this could be a Berkeley mbox, where a "From " pattern delimits the individual mails in the file. Now you want to create an index to this file, telling where each block starts and how big it is, and then later be able to read in only a specified block.

At first sight this is an easy problem: first you create the index and then you do seek-and-read. However, if the file is multi-byte encoded you run into more serious problems, because [[seek]] measures in bytes whereas [[read]] measures in chars. (You may also run into trouble in the trivial situation where the file has DOS line endings.)

For example, to create the index, you could just read in the entire file (using fconfigure appropriately) and find what you want with some [[regexp -indices]]. But these indices count chars, so they won't help you to seek, because [[seek]] needs a pure byte measurement... Bad luck. (Of course the difficulty in producing a byte measurement is this: you would have to read the entire text up to each index to count how many bytes each char occupies. The expense of doing this is also the reason why [[seek]] cannot measure in chars.)

Alright, then let us do everything in bytes. To set up the index in bytes you can perhaps get away with using TclX [[scanmatch]]es, and get the byte offsets from $matchInfo(offset).

Very well, now we've got the indices in bytes, and we can seek. But now we cannot easily read! In order to read a specified number of bytes and still get decoded text, a trick is needed: we set up a translation pipe using the TclX command [[pipe]]. First read from the file in binary mode, then put the bytes into the pipe in raw mode, and finally pick them out of the pipe with the appropriate encoding. E.g.

    package require Tclx
    # $bigFlatFile, $offset and $numBytes are assumed to be set already.
    set fd [open $bigFlatFile r]
    # We know this file is utf-8 encoded, but we want to read a
    # certain number of bytes, not chars...
    # (-translation binary also keeps DOS line endings from skewing the count.)
    fconfigure $fd -encoding binary -translation binary
    # [pipe] stores the read end in "out" and the write end in "in":
    pipe out in
    fconfigure $in -encoding binary -blocking 0 -buffering none
    fconfigure $out -encoding utf-8 -blocking 0 -buffering none
    seek $fd $offset
    puts $in [read $fd $numBytes]
    # -nonewline drops the newline that [puts] appended:
    set content [read -nonewline $out]
    close $fd
    close $in
    close $out

Although this is doable and works well for modest chunk sizes, I think we are witnessing a shortcoming in Tcl: it would be really nice (and quite logical, in view of the functionality provided by [[seek]] and [[tell]]) if [[read]] could accept a -bytes flag. The only thing needed is a convention about how to handle the situation where the number of bytes does not constitute a complete char. One convention could be: finish the char in that case. Another convention: discard the incomplete char. Or finally, just leave the fractional char as binary debris: it is up to the caller to make sure this does not happen, and in examples like the above this comes about naturally.

Here is how I think it should work:

    # This proc is supposed to work just like [read $fileHandle $numChars],
    # except that the size of the chunk to read is specified in bytes, not in
    # chars. This is useful in connection with [seek] and [tell] which always
    # measure in bytes.
    package require Tclx
    proc readBytes { fileHandle numBytes } {
        # Record the original configuration:
        set fconf [fconfigure $fileHandle]
        # If anything goes wrong, at least we should restore the
        # original configuration, hence the catch:
        if { [catch {
            # Configure for binary read:
            fconfigure $fileHandle -encoding binary -translation binary
            # Set up a translation pipe:
            pipe out in
            # The input end should be as neutral as possible:
            fconfigure $in -encoding binary -translation binary -buffering none
            # The output part should mimic the original configuration...
            eval fconfigure {$out} $fconf
            # (Beware that it is up to the caller to ensure that the original
            # channel is nonblocking if this is required.)
            # Now read the bytes and put them into the translation pipe:
            puts -nonewline $in [read $fileHandle $numBytes]
            # Close the write end first, so that the read below sees
            # end-of-file instead of waiting for more data:
            close $in
            set content [read $out]
            # Clean up:
            close $out
            # And restore the original configuration:
            eval fconfigure {$fileHandle} $fconf
        } err] } {
            eval fconfigure {$fileHandle} $fconf
            # Re-raise the error, keeping the original stack trace:
            error $err $::errorInfo $::errorCode
        } else {
            return $content
        }
    }

Unfortunately, on big chunks of text (>8192), there seems to be a [bug in pipe] that obstructs this solution; in fact, it causes a crash.
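One way to sidestep the pipe altogether is to read the raw bytes and decode them afterwards with the core command [[encoding convertfrom]], and to build the index on a binary read so that [[regexp -indices]] reports byte offsets directly. What follows is only a sketch under stated assumptions, not a tested replacement for the code above: the proc names buildByteIndex and readBlock are made up for illustration, the block delimiter is assumed to be a line starting with "From ", the file is assumed to be utf-8, and the whole file is assumed to fit in memory while indexing.

```tcl
# Hypothetical helper: return a list of {offset length} pairs, one per block.
# Assumes each block starts with a line beginning with "From " (mbox style).
proc buildByteIndex {path} {
    set fd [open $path r]
    # Binary translation: one char per byte and no line-ending conversion,
    # so string indices into $data are byte offsets into the file.
    fconfigure $fd -translation binary
    set data [read $fd]
    close $fd

    set starts {}
    # -line makes ^ match at the start of every line, not just the first.
    foreach m [regexp -all -inline -indices -line {^From } $data] {
        lappend starts [lindex $m 0]
    }
    # Sentinel: the end of the file closes the last block.
    lappend starts [string length $data]

    set index {}
    foreach a [lrange $starts 0 end-1] b [lrange $starts 1 end] {
        lappend index [list $a [expr {$b - $a}]]
    }
    return $index
}

# Hypothetical helper: read numBytes bytes at offset and decode them.
# The caller must make sure the range does not cut a multi-byte char in half.
proc readBlock {path offset numBytes {enc utf-8}} {
    set fd [open $path r]
    fconfigure $fd -translation binary
    seek $fd $offset
    set bytes [read $fd $numBytes]
    close $fd
    # Decode outside the channel; line endings are left untouched.
    return [encoding convertfrom $enc $bytes]
}

# Usage (assuming $bigFlatFile names an existing utf-8 file):
#   set index [buildByteIndex $bigFlatFile]
#   lassign [lindex $index 0] offset numBytes
#   set mail [readBlock $bigFlatFile $offset $numBytes]
```

Since no channel translation is involved on the way out, DOS line endings come back as they are in the file; something like [[string map {\r\n \n}]] on the result would be needed if that matters.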