byte

byte -

The amount of memory space used to store one character, which is usually 8 bits. A computer that has 8-bit bytes (most large and small computers today) can distinguish 256 different characters (quoted from the High-Tech Dictionary [L1 ]).
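To make the 256 figure concrete, here is a minimal sketch that packs every possible 8-bit value into a byte string, one byte per value, and reads them back as unsigned integers. It assumes Tcl 8.5 or later, which added the unsigned `u` flag used in `cu*`:

```tcl
# Build a byte string containing every 8-bit value once.
# "binary format c" stores the low-order 8 bits of each integer as one byte.
set bytes ""
for {set i 0} {$i < 256} {incr i} {
    append bytes [binary format c $i]
}
puts [string length $bytes]   ;# 256: one byte per distinct value

# Read the bytes back as unsigned 8-bit integers (cu needs Tcl 8.5+).
binary scan $bytes cu* values
puts [lindex $values 0]       ;# 0
puts [lindex $values end]     ;# 255
```

Any 257th value would have to repeat one of these, which is why an 8-bit byte can distinguish exactly 256 characters.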

Further, the characters mentioned above are fixed-size characters, either extended ASCII (plain ASCII has only 128 characters, 7 bits' worth) or EBCDIC. Unicode, by contrast, is not an 8-bit character set.

RS: Tcl mostly uses the UTF-8 encoding, in which one Unicode character is represented as a sequence of 1 to 4 bytes (characters in the Basic Multilingual Plane, which is all that Tcl 8.x traditionally supports, need at most 3; see Unicode and UTF-8). To find out which bytes represent a Unicode character (e.g. the Euro sign €) in UTF-8, try the following:

 % binary scan [encoding convertto utf-8 \u20ac] H* resu
 1
 % set resu
 e282ac
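The same idea can be used to compare character counts with UTF-8 byte counts. A small sketch follows; `utf8_length` is a hypothetical helper name, not a built-in command, and only BMP characters are used so that it behaves the same on Tcl 8.x:

```tcl
# Hypothetical helper: number of bytes a string occupies when encoded as UTF-8.
proc utf8_length {s} {
    string length [encoding convertto utf-8 $s]
}

# "A" (ASCII), "é" (Latin-1 range) and "€" (Euro sign)
foreach ch [list A \u00e9 \u20ac] {
    # string length counts characters; utf8_length counts encoded bytes
    set cp [format %04X [scan $ch %c]]
    puts "U+$cp: [string length $ch] char, [utf8_length $ch] byte(s)"
}
```

This prints 1, 2 and 3 bytes respectively, while `string length` reports one character in every case.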

escargo - As a historical note, I have worked on machines with 6-bit bytes, 7-bit bytes, 8-bit bytes, and 9-bit bytes. When I worked for Cray Research, Inc., we had major headaches porting software that assumed the whole world was made from machines with 32-bit words, 8-bit bytes, and memories that are directly byte-addressable. I've worked on machines that had 36-bit words, 9-bit bytes, and were word addressed. People who assume that bytes and octets are the same might be comfortable in their assumptions, but they make life more difficult for other people.
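The byte widths mentioned above hold different numbers of distinct values, so "one byte" does not always mean 256 possibilities. A minimal sketch; `byte_values` is a hypothetical helper, not a built-in command:

```tcl
# Hypothetical helper: how many distinct values a byte of the given width can hold.
proc byte_values {bits} {
    expr {1 << $bits}
}

foreach bits {6 7 8 9} {
    puts "$bits-bit byte: [byte_values $bits] distinct values"
}

# A 36-bit word holds four 9-bit bytes, just as a 32-bit word holds four 8-bit ones.
puts "36-bit word / 9-bit bytes = [expr {36 / 9}] bytes per word"
```

Only the 8-bit case matches an octet; code that hard-wires 256 or 8 is making exactly the assumption described above.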


CLN - Note to Kernighan and Ritchie: char is not a fundamental type. ;-)