Bit manipulations


I've come across skepticism about Tcl's facility for bit manipulation one too many times. RS's usual tours de force, such as "Playing with bits" and "Big bitstring operations", show how good Tcl can be at bit manipulation. This page has a far more limited ambition: simply to help hardware- or C-oriented developers feel comfortable working with low-level data in a higher-level language.

# Let's experiment:  one model for "low-level data" is a byte array, or
# byte sequence.  A sequence of eight-bit values is often the manifestation
# of information received from a physical device through a serial port, or
# from a remote host through a network connection.  Start, then, with
# sample data, a sequence of seven eight-bit quantities.

set sample \x63\x77\x54\x00\x83\x41\x42

# All seven bytes are there, but only some of them are printable characters.
puts $sample

# Let's make a utility that'll show the content of a byte
# sequence as hex data:
proc show byte_sequence {
  binary scan $byte_sequence H* x
  puts [regsub -all (..) $x {\1 }]
}

# Display our sample data.
show $sample
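Going the other way is just as easy: `binary format H*` packs a string of hex digits back into a byte sequence. Here is a minimal sketch of the round trip. The names `rebuilt` and `hex` are only illustrative, and the sample is repeated so the snippet stands alone:

```tcl
# The seven sample bytes, as above.
set sample \x63\x77\x54\x00\x83\x41\x42

# Pack hex digits (two per byte, high nibble first) into a byte sequence.
set rebuilt [binary format H* 63775400834142]

# Unpack again to confirm the round trip.
binary scan $rebuilt H* hex
puts "$hex, [string length $rebuilt] bytes, same as sample: [string equal $rebuilt $sample]"
```

`H*` puts the high nibble of each byte first. `h*` would take the low nibble first.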

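# Suppose bits 3 and 4 (counting from bit 0, the least significant bit)
# combine to make some type specification.  Mask them with 0x18, then
# shift right three places to get a type from 0 to 3:
foreach byte [split $sample {}] {
  puts "The type of this byte is '[expr {(0x18 & [scan $byte %c]) >> 3}]'."
}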
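Extracting a multi-bit field generally takes a shift as well as a mask, so that the result starts at 0 no matter where the field sits in the byte. The sketch below wraps that up in a small helper. `bitfield` is a hypothetical name, not a Tcl built-in, and it assumes bit 0 is the least significant bit:

```tcl
# Hypothetical helper: return the WIDTH-bit field whose lowest bit is
# bit LSB of VALUE (bit 0 is the least significant bit).
proc bitfield {value lsb width} {
    expr {($value >> $lsb) & ((1 << $width) - 1)}
}

set sample \x63\x77\x54\x00\x83\x41\x42
foreach byte [split $sample {}] {
    scan $byte %c code
    # Bits 3 and 4 treated as a two-bit type field.
    puts [format "%02X -> type %d" $code [bitfield $code 3 2]]
}
```

The same helper works on wider values. For example, `bitfield 0xFEDC 8 8` returns the high byte, 254.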

# We can "mask off" unwanted bits.
foreach byte [split $sample {}] {
  puts "After masking, we're looking at '[format %02X [expr {0x3F & [scan $byte %c]}]]'."
}
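For developers coming from C, the other single-bit idioms carry over directly to `expr`: `|` sets a bit, `& ~` clears it, `^` toggles it, and shift-then-`& 1` tests it. This is a minimal sketch with an arbitrary starting value, where "bit N" means the bit worth `1 << N`:

```tcl
set flags 0x41                             ;# 0100 0001
set flags [expr {$flags | (1 << 7)}]       ;# set bit 7    -> 0xC1
set flags [expr {$flags & ~(1 << 0)}]      ;# clear bit 0  -> 0xC0
set flags [expr {$flags ^ (1 << 1)}]       ;# toggle bit 1 -> 0xC2
set bit6  [expr {($flags >> 6) & 1}]       ;# test bit 6   -> 1
puts [format "flags=%02X bit6=%d" $flags $bit6]

# Tcl integers have no fixed width, so ~ alone yields a negative number
# (~0x41 is -66).  Mask the result to the width you mean.
puts [format "%02X" [expr {~0x41 & 0xFF}]] ;# BE
```

The last line is the main surprise for C programmers. There is no `unsigned char` to wrap around, so an explicit `& 0xFF` (or `& 0xFFFF`, and so on) stands in for the type's width.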

################################################################

# This sample datum consists of five sixteen-bit quantities.
set word_sample \u0001\u0020\u0300\u4000\uFEDC

proc show_words word_sequence {
  foreach word [split $word_sequence {}] {
    puts -nonewline "[format %04X [scan $word %c]] "
  }
  puts ""
}

# Suppose I need to look just at words three and four (counting from zero,
# as [string range] does).
set subsample [string range $word_sample 3 4]
# This next displays "4000 FEDC".
show_words $subsample

# We can mask off any bits we choose.
foreach word [split $word_sample {}] {
  puts "After masking, we see '[format %04X [expr {0xF0FF & [scan $word %c]}]]'."
}
# That loop prints:
#    After masking, we see '0001'.
#    After masking, we see '0020'.
#    After masking, we see '0000'.
#    After masking, we see '4000'.
#    After masking, we see 'F0DC'.
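Shifts let us take a sixteen-bit word apart into its two bytes and put it back together. Here is a short sketch using the last word of the sample, `\uFEDC`. The variable names are illustrative:

```tcl
# The last word of the earlier sample, as an integer (0xFEDC).
scan \uFEDC %c word

# Split into high and low bytes.
set high [expr {($word >> 8) & 0xFF}]      ;# 0xFE
set low  [expr {$word & 0xFF}]             ;# 0xDC

# Reassemble: high byte shifted back up, low byte OR-ed in.
set again [expr {($high << 8) | $low}]
puts [format "high=%02X low=%02X word=%04X" $high $low $again]
```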

####################################################################

Thanks to CLN for a drastic simplification of what follows.

# It sometimes happens that vendors define sixteen-bit protocols that they, in effect,
# force through eight-bit pipes.  A network receiver might, for example, receive bytes
# we'll label \x01\x03\x54\x80, with instructions to interpret these as the two
# sixteen-bit words \u0301\u8054 (notice that we're entering the realm of endian
# affairs: each word arrives low byte first).  Here's a model for handling such cases:

set byte_sequence \x01\x03\x54\x80\x33\x34

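# Notice that "s*" and "S*" account for the two byte orders:  "s" reads
# sixteen-bit integers little-endian (low byte first), "S" big-endian.
# Both produce signed decimal values.
binary scan $byte_sequence s* display_word_sequence
puts "Here are the words:  '$display_word_sequence'."
# For contrast, read the first four bytes big-endian:  0x0103 and 0x5480.
binary scan [string range $byte_sequence 0 3] S2 first_two_words
puts "Here are the first two words, read big-endian:  '$first_two_words'."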
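Because plain `s*` yields signed values, the word 0x8054 comes back as -32684. Since Tcl 8.5, `binary scan` accepts a `u` flag after the field type to request unsigned values instead. The sketch below also assembles the same little-endian words by hand with shifts, as a cross-check. It assumes an even number of bytes, and the variable names are illustrative:

```tcl
set byte_sequence \x01\x03\x54\x80\x33\x34

# Plain "s*" gives signed values, so 0x8054 comes back negative.
binary scan $byte_sequence s* signed
# Adding the "u" flag (Tcl 8.5 and later) gives unsigned values.
binary scan $byte_sequence su* unsigned

# The same little-endian words assembled by hand, low byte first.
# Assumes an even number of bytes.
set manual {}
foreach {lo_char hi_char} [split $byte_sequence {}] {
    scan $lo_char %c lo
    scan $hi_char %c hi
    lappend manual [expr {($hi << 8) | $lo}]
}

puts "signed:   $signed"
puts "unsigned: $unsigned"
puts "manual:   $manual"
puts "hex:      [format {%04X %04X %04X} {*}$unsigned]"
```

The hand-built version is what `s*` saves us from writing. It is still handy when a protocol packs fields that don't fall on byte boundaries.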


# Also:  remarks on RE (string trim; (..?)).

Size in Bits

PYK 2016-01-16: bitsize returns the number of bits used to represent a positive integer.

proc bitsize value {
    set p -1
    while {$value >= ( 1 << [incr p])} {}
    return $p
}
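A quick sanity check for `bitsize` is to compare it with the length of the value's binary representation, which `format %b` produces in Tcl 8.6 and later. This is a minimal sketch over a few boundary values, with the proc repeated so the snippet stands alone:

```tcl
proc bitsize value {
    set p -1
    while {$value >= (1 << [incr p])} {}
    return $p
}

# Compare against the number of binary digits [format %b] prints.
foreach v {1 2 255 256 65535 0x10000} {
    puts "$v: bitsize [bitsize $v], %b digits [string length [format %b $v]]"
}
```

The two agree only for positive integers. For 0, `bitsize` returns 0, while `format %b` gives the one-character string "0". Because Tcl 8.5 and later integers are unbounded, `bitsize` itself also works on values wider than 64 bits.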