Version 20 of Bit manipulations

Updated 2005-05-31 19:31:18 by CLN

if 0 {I've come across skepticism about Tcl's facility for bit manipulation one too many times. While RS's usual tours de force, such as "Playing with bits" and "Big bitstring operations", show how good Tcl can be at bit manipulation, this page has a far more limited ambition: simply to help hardware- or C-oriented developers feel comfortable working with low-level data in a higher-level language.}

      # Let's experiment:  one model for "low-level data" is a byte array, or
      # byte sequence.  A sequence of eight-bit values is often the manifestation
      # of information received from a physical device through a serial port, or
      # from a remote host through a network connection.  Start, then, with
      # sample data, a sequence of seven eight-bit quantities.
  set sample \x63\x77\x54\x00\x83\x41\x42
      # All seven bytes are there, but only some are printable characters.
  puts $sample

      # Let's make a utility that'll show the content of a byte
      # sequence as hex data:
  proc show byte_sequence {
    binary scan $byte_sequence H* x
    puts [regsub -all (..) $x {\1 }]
  }
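The regsub in show leaves a trailing space after the last pair of hex digits. One way to drop it is to pass the result through string trim. This is a minimal sketch; the proc name hex_string is hypothetical, and unlike show it returns the text rather than printing it, which makes it easier to reuse.

```tcl
# Hypothetical helper: like show, but returns the hex text
# (without the trailing space) instead of printing it.
proc hex_string byte_sequence {
    # Convert the bytes to hex digits, high nibble first.
    binary scan $byte_sequence H* x
    # Put a space after every pair of digits, then trim the last one.
    return [string trim [regsub -all (..) $x {\1 }]]
}

puts [hex_string \x63\x77\x54\x00\x83\x41\x42]   ;# prints 63 77 54 00 83 41 42
```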

       # Display our sample data.
  show $sample

      # Suppose bits 3 and 4 (counting from bit 0, the least significant)
      # combine to make a two-bit type field.  Shift them down, then mask:
  foreach byte [split $sample {}] {
    scan $byte %c value
    puts "The type of this byte is '[expr {($value >> 3) & 0x03}]'."
  }
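Extracting a field generalizes to any position and width: shift the field down to bit 0, then mask off everything wider than the field. A sketch with a hypothetical helper, get_field, whose arguments are the field's lowest bit position and its width in bits:

```tcl
# Hypothetical helper: extract a field of $width bits whose lowest
# bit sits at position $pos (bit 0 is the least significant bit).
proc get_field {value pos width} {
    # Shift the field down to bit 0, then keep only $width bits.
    expr {($value >> $pos) & ((1 << $width) - 1)}
}

puts [get_field 0x77 3 2]   ;# prints 2
puts [get_field 0x54 4 4]   ;# prints 5
```

The mask (1 << width) - 1 is simply width one-bits, so the same proc covers one-bit flags and wider fields alike.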

       # We can "mask off" unwanted bits; here 0x3F keeps only the low six.
  foreach byte [split $sample {}] {
    puts "After masking, we're looking at '[format %02X [expr {0x3F & [scan $byte %c]}]]'."
  }
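AND with a mask clears bits; OR, XOR and a shifted 1 handle the other single-bit operations a C programmer would expect. A minimal sketch, with hypothetical proc names, counting bit 0 as the least significant:

```tcl
# Hypothetical helpers for single-bit operations on an integer.
# Bit 0 is the least significant bit.
proc set_bit    {value n} { expr {$value | (1 << $n)} }
proc clear_bit  {value n} { expr {$value & ~(1 << $n)} }
proc toggle_bit {value n} { expr {$value ^ (1 << $n)} }
proc bit_is_set {value n} { expr {($value >> $n) & 1} }

scan \x83 %c value
puts [format %02X [clear_bit $value 7]]   ;# prints 03
puts [format %02X [set_bit 0x41 7]]       ;# prints C1
```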

  ################################################################

      # This sample consists of five sixteen-bit quantities.
  set word_sample \u0001\u0020\u0300\u4000\uFEDC

  proc show_words word_sequence {
    foreach word [split $word_sequence {}] {
      puts -nonewline "[format %04X [scan $word %c]] "
    }
    puts ""
  }

      # Suppose I need to look at just words three and four (counting from zero).
  set subsample [string range $word_sample 3 4]
      # This next displays "4000 FEDC".
  show_words $subsample

      # We can mask off any bits we choose.
  foreach word [split $word_sample {}] {
    puts "After masking, we see '[format %04X [expr 0xF0FF & [scan $word %c]]]'."
  }
      # The output of that last loop should be:
      #    After masking, we see '0001'.
      #    After masking, we see '0020'.
      #    After masking, we see '0000'.
      #    After masking, we see '4000'.
      #    After masking, we see 'F0DC'.
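Masking works within a word; the next section turns to byte order, where the basic operation is exchanging the two bytes of a sixteen-bit word. A minimal sketch; the proc name swap16 is hypothetical:

```tcl
# Hypothetical helper: exchange the two bytes of a sixteen-bit word.
proc swap16 word {
    expr {(($word & 0xFF) << 8) | (($word >> 8) & 0xFF)}
}

foreach word [split \u0001\u0020\u0300\u4000\uFEDC {}] {
    puts -nonewline "[format %04X [swap16 [scan $word %c]]] "
}
puts ""
# prints: 0100 2000 0003 0040 DCFE
```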

  ####################################################################


      # It sometimes happens that vendors define sixteen-bit protocols that they, in effect,
      # force through eight-bit pipes.  A network receiver might, for example, receive bytes
      # we'll label \x01\x03\x54\x80, with instructions to interpret these, low byte first,
      # as the two sixteen-bit words \u0301\u8054 (notice that we're entering the realm of
      # endianness).  Here's a model for handling such cases:

  set byte_sequence \x01\x03\x54\x80\x33\x34

  set word_sequence {}
  foreach {low high} [split $byte_sequence {}] {
          # Is there a more elegant way to write this?
          #
          # If you don't like the endianness of the result, simply
          # swap $low and $high in the calculation that immediately
          # follows.
      set word [expr {([scan $high %c] << 8) + [scan $low %c]}]
          # format %c turns the integer back into a single character.
      append word_sequence [format %c $word]
  }

  show_words $word_sequence
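Sending data back through the eight-bit pipe needs the reverse: each sixteen-bit word split into two bytes, low byte first to match the loop above. A sketch assuming the same little-endian convention; words_to_bytes is a hypothetical name:

```tcl
# Hypothetical helper: the reverse of the loop above.  Split each
# sixteen-bit word into two bytes, low byte first (little-endian).
proc words_to_bytes word_sequence {
    set bytes {}
    foreach char [split $word_sequence {}] {
        scan $char %c word
        append bytes [format %c [expr {$word & 0xFF}]]
        append bytes [format %c [expr {($word >> 8) & 0xFF}]]
    }
    return $bytes
}

binary scan [words_to_bytes \u0301\u8054\u3433] H* hex
puts $hex   ;# prints 010354803334
```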



  # Also:  remarks on RE (string trim; (..?)).

CLN A comment above asks, "Is there a more elegant way to write this?" Yes, I think there is. binary scan can deal with endian issues:

   % set byte_sequence \x01\x03\x54\x80\x33\x34
   T34
   % binary scan $byte_sequence "s2" a
   1
   % puts $a
   769 -32684
   % binary scan $byte_sequence "S2" a
   1
   % puts $a
   259 21632
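The transcript scans only the first two words, and s yields signed values (hence -32684). Here is a minimal sketch, assuming the whole of byte_sequence should be read as little-endian words and that unsigned values are wanted. It masks each word with 0xFFFF and then uses binary format to rebuild the original bytes. Tcl 8.5 and later also accept a u flag (for example su*) for unsigned results; the mask shown here works in older versions as well.

```tcl
set byte_sequence \x01\x03\x54\x80\x33\x34

# "s*" reads every remaining pair of bytes as a little-endian
# sixteen-bit integer; binary scan returns these as signed values.
binary scan $byte_sequence s* words

# Masking with 0xFFFF gives the unsigned value of each word.
set unsigned {}
foreach w $words {
    lappend unsigned [expr {$w & 0xFFFF}]
}

foreach w $unsigned {
    puts -nonewline "[format %04X $w] "
}
puts ""
# prints: 0301 8054 3433

# binary format goes the other way: words back to little-endian bytes.
set round_trip [binary format s* $unsigned]
```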