ASCII

The '''[http://www.incits.org/scopes/incits4.htm%|%American Standard Code for Information Interchange]''', published by [ANSI], specifies a set of 128 characters (control characters and graphic characters, such as letters, digits, and symbols) together with their coded representations.  [http://en.wikipedia.org/wiki/ISO/IEC_646%|%ISO/IEC 646] is an internationalized version of ASCII.  [http://en.wikipedia.org/wiki/ISO_8859%|%ISO/IEC 8859] is a set of 8-bit codes based on ASCII, intended to be combined with a standard set of [terminal control sequence%|%terminal control sequences].


** See Also **

   [ANSI]:   

   [Terminal Control Sequence]:   

   [AsciiArtWidget]:   Provides a text widget with bindings and functions suitable for editing ASCII artwork.



** Reference **

   [http://www.incits.org/scopes/incits4.htm%|%INCITS 4:1986, Information Systems - Coded Character Sets - 7-Bit American National Standard Code for Information Interchange (7-Bit ASCII)] (link dead), [http://www.incits.org%|%incits.org]:   

   [http://foldoc.org/ascii%|%American Standard Code for Information Interchange], by Dennis Howe, 1995 [http://foldoc.org%|%foldoc]:   

   [http://cse.csusb.edu/dick/samples/comp.text.ASCII.html%|%The American Standard Code for Information Interchange] ([https://web.archive.org/web/20130321191526/http://csci.csusb.edu/dick/samples/comp.text.ASCII.html%|%alternate]), Richard Botting, updated 2010-01-19:   

   [http://www.worldpowersystems.com/J/codes/%|%ASCII: American Standard Code for Information Infiltration] ([https://web.archive.org/web/20130615102246/http://www.wps.com/projects/codes/index.html%|%alternate]), Tom Jennings, 2004-10-29:   

   [https://web.archive.org/web/20060403020324/http://computerworld.com/news/1999/story/0,11280,35241,00.html%|%1963: ASCII Debuts], by Mary Brandel:   
   [http://www.bobbemer.com/ASCII.HTM%|%Bob Bemer and Communication (ASCII)], Bob Bemer:   An article about Bob Bemer, by Bob "sometimes-you-have-to-toot-your-own-horn" Bemer.

   [http://www.bobbemer.com/ESCAPE.HTM%|%That Powerful ESCAPE Character -- Key and Sequences], by Bob Bemer, 2003-10-25:   

   [http://www.columbia.edu/kermit/ascii.html%|%US ASCII, ANSI X3.4-1986 (ISO 646 International Reference Version)], [http://www.columbia.edu/kermit/%|%The Kermit Project]:   

   [http://paulbourke.net/dataformats/ascii/%|%ASCII Codes], Paul Bourke, 1995:   



** Resource **

   [https://web.archive.org/web/20140204092700/http://www.thuglife.org/%|%thuglife]:   ASCII art website.

   [http://www.chris.com/ascii/%|%chris.com]:   More ASCII art.

   [http://www.crummy.com/cgi-bin/msm/map.cgi/ASCII,+Dammit%|%ASCII dammit]:   Written as a Python library and capable of ASCIIfying not only MS smart quotes but (with varying degrees of accuracy) most of ISO-Latin-1. For use in fits of parochialism when you want something in ASCII, dammit.



** Description **

ASCII specifies the numerical encoding and meaning of 128 symbols: 95 printable characters and 33 control characters.  The first letter of ASCII stands for '''American''', so there's no use complaining that the printable characters don't include accented letters or other non-English characters.

(For those extra characters, you need another character set.  Luckily there are many that do the job very nicely, such as ISO 8859-1 for Western European languages, and many of them are proper supersets of ASCII.)
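As a quick check of the counts above, Tcl's Unicode-aware `string is print` and `string is control` classes can be run over the first 128 code points, and Tcl's built-in `ascii` and `iso8859-1` encodings show the superset relationship. This is a small sketch, assuming a standard Tcl 8.6 or later installation (both encodings ship with it):

======
# Count printable and control characters among the 128 ASCII code points.
set printable 0
set control 0
for {set i 0} {$i < 128} {incr i} {
    set ch [format %c $i]
    if {[string is print $ch]} {incr printable}
    if {[string is control $ch]} {incr control}
}
puts "$printable printable, $control control"   ;# 95 printable, 33 control

# ISO 8859-1 decodes bytes 0x00-0x7F exactly as ASCII does...
set same 1
for {set i 0} {$i < 128} {incr i} {
    set byte [binary format c $i]
    if {[encoding convertfrom iso8859-1 $byte] ne [encoding convertfrom ascii $byte]} {
        set same 0
    }
}
puts "ISO 8859-1 agrees with ASCII on 0x00-0x7F: $same"

# ...and adds accented letters above 0x7F, e.g. byte 0xE9 is e-acute.
set eacute [encoding convertfrom iso8859-1 \xE9]
======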

One very frequently asked question is how to convert between a character and its numeric code. [Scan] provides the usual answer.

And to convert an integer to a character, [Format] provides the usual answer.
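Both directions use the `%c` conversion. A minimal sketch; `str2codes` and `codes2str` are illustrative helper names, not commands built into Tcl:

======
# Character -> code: the %c conversion of [scan]
puts [scan A %c]          ;# 65
# Code -> character: the %c conversion of [format]
puts [format %c 97]       ;# a

# Whole strings, one character at a time
proc str2codes {s} {
    set codes {}
    foreach ch [split $s ""] {
        lappend codes [scan $ch %c]
    }
    return $codes
}
proc codes2str {codes} {
    set s ""
    foreach code $codes {
        append s [format %c $code]
    }
    return $s
}
puts [str2codes "Tcl!"]   ;# 84 99 108 33
puts [codes2str {72 105}] ;# Hi
======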

----

[RS]: Pure ASCII is a 7-bit encoding covering byte values `\x00`..`\x7F`, so exactly 128 code points fit and there is no room for more. But since ASCII is at the core of the iso8859-x encodings, the Windows and Mac code pages, and even Unicode, it's easy to extract this core and look at its 94 graphic characters (the 95 printable ones minus the space):

======
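proc ascii {} {
    set res {}
    # 33 (0x21, "!") through 126 (0x7E, "~"): the 94 graphic characters,
    # skipping space (32) and DEL (127)
    for {set i 33} {$i<127} {incr i} {
        # hex code, a colon, then the character itself
        append res "[format %2.2X:%c $i $i] "
        # start a new line after each code ending in hex 0 (0x30, 0x40, ...)
        if {$i%16==0} {append res \n}
    }
    set res
}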
======
======none
21:! 22:" 23:# 24:$ 25:% 26:& 27:' 28:( 29:) 2A:* 2B:+ 2C:, 2D:- 2E:. 2F:/ 30:0 
31:1 32:2 33:3 34:4 35:5 36:6 37:7 38:8 39:9 3A:: 3B:; 3C:< 3D:= 3E:> 3F:? 40:@ 
41:A 42:B 43:C 44:D 45:E 46:F 47:G 48:H 49:I 4A:J 4B:K 4C:L 4D:M 4E:N 4F:O 50:P 
51:Q 52:R 53:S 54:T 55:U 56:V 57:W 58:X 59:Y 5A:Z 5B:[ 5C:\ 5D:] 5E:^ 5F:_ 60:` 
61:a 62:b 63:c 64:d 65:e 66:f 67:g 68:h 69:i 6A:j 6B:k 6C:l 6D:m 6E:n 6F:o 70:p 
71:q 72:r 73:s 74:t 75:u 76:v 77:w 78:x 79:y 7A:z 7B:{ 7C:| 7D:} 7E:~ 
======

----

CJU: ASCII values > 127 are considered "Extended ASCII," IIRC. Correct me if I'm wrong but I seem to remember that IBM were the ones to originally implement it when they introduced the first IBM PC. Among other fancy symbols, it contains glyphs for drawing single-line and double-line boxes on a text terminal.

[AMG]: [IBM] (or was it [Microsoft]?  doubtful...) had the notion of ''code pages'' which are little more than alternative fonts for characters numbered 128 through 255.  (I suppose a few code pages might redefine ASCII but I don't know if this was ever done.)  The code page number used by the system would (should?) get saved somewhere in the filesystem in order to give meaning to the character numbers.

I don't know anyone who has ever used anything other than CP437, the famous one with the solid and shaded boxes and the single and double line drawing characters plus a handful of accented vowels, international currencies, and a couple of Greek letters and math symbols.  (But no multiplication sign!)  So-called ASCII art (like in BitchX) is typically done using CP437 symbols.
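Tcl ships a `cp437` encoding, so the line-drawing characters described above can be recovered from raw CP437 bytes with `encoding convertfrom`. A small sketch, assuming the `cp437` encoding is available (it is part of the standard Tcl distribution) and that your terminal can display the result:

======
# CP437 bytes for a small double-line box:
#   0xC9 0xCD 0xBB   top-left, horizontal, top-right
#   0xBA 0x20 0xBA   vertical, space, vertical
#   0xC8 0xCD 0xBC   bottom-left, horizontal, bottom-right
set bytes "\xC9\xCD\xBB\n\xBA \xBA\n\xC8\xCD\xBC"

# Decode the bytes as code page 437 to get Unicode box-drawing characters
set box [encoding convertfrom cp437 $bytes]
puts $box

# Byte 0xF6 is the division sign; CP437 has no multiplication sign
set divide [encoding convertfrom cp437 \xF6]
======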

[LV]: One of the most common ''code page'' encounters I have involves special symbols standing in for the quotation mark ("), the apostrophe ('), and the hyphen (-). In the old days (and I suspect this continues today), Microsoft Word used '''smart characters''', replacing the character the user typed with a different one. For instance, one might type a quotation mark, and what was actually inserted was one of two special code page characters that more closely resembled opening and closing quotation marks.

[AMG]: Also, Microsoft gets it wrong for contractions where the first part of the word is replaced with an apostrophe, for example when abbreviating a year: '08.  Word and friends treat that initial apostrophe as an initial single opening quote, which is incorrect.
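In Windows-1252 (Tcl's `cp1252` encoding) those smart quotes live at bytes 0x91 through 0x94, with the en and em dashes at 0x96 and 0x97. A sketch of folding them back to plain ASCII with `string map`, in the spirit of ASCII, Dammit; `ascii_punct` is a hypothetical helper name, and the mapping covers only these six characters:

======
# Hypothetical helper: fold common "smart" punctuation back to ASCII.
proc ascii_punct {s} {
    string map [list \
        \u2018 ' \u2019 ' \
        \u201C \" \u201D \" \
        \u2013 - \u2014 --] $s
}

# Decode the Windows-1252 bytes first, then fold the result.
set raw [encoding convertfrom cp1252 "\x93Smart\x94 quotes aren\x92t ASCII"]
puts [ascii_punct $raw]   ;# "Smart" quotes aren't ASCII
======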

----

[lordmundi]: I'm curious if someone can help me.  I have a mixed string of ASCII characters and encoded values.  By that I mean the function I am calling returns a normal string for most characters, but for items like spaces, parentheses, etc., it encodes them with their ASCII value so that the string can be passed back and forth without worrying about special characters.  For example, one string I have is:

======none
EDGE\032on\032localhost\032\04025880\041
======

So, as you can see, all of the spaces are written as "\032", the parentheses as their ASCII codes, and so on.  How can I pass this string to a function and have it interpret any "\###" codes it encounters and return the decoded string?

[AMG]: Try `[[[subst] -nocommands -novariables]]`.

[RLE] 2011-03-08: Do you have control over what is returned by the other function?  If so, and you can modify it to return hex-encoded values (\x20 instead of \032) or octal-encoded values, then `[subst]` will perform the backslash substitutions for you.

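[AMG]: RLE, I thought that was octal, but I guess you're right: it's decimal, since in octal a space would be `\040`.  However, `\x` should be avoided due to its surprising behavior when the encoded character is followed by valid hexadecimal digits (in Tcl 8.5 and earlier, `\x` consumes every hex digit that follows; Tcl 8.6 limits it to two).  Use four-digit `\u` instead.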
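To illustrate the suggestion: if the data used octal or `\u` escapes, `subst` could decode it directly, while `-nocommands -novariables` keeps it from evaluating any `[...]` or `$...` in the data. The strings below are the example from above re-encoded by hand, an assumption for illustration, since the real source sends decimal:

======
# Octal escapes: ( is 40 decimal = \050, ) is 41 decimal = \051
set octal {EDGE\040on\040localhost\040\05025880\051}
puts [subst -nocommands -novariables $octal]

# Four-digit \u escapes are never ambiguous about where they end
set uni {EDGE\u0020on\u0020localhost\u0020\u002825880\u0029}
puts [subst -nocommands -novariables $uni]
======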

[lordmundi]: Unfortunately I don't have control over it.  This is the way the encoded string comes back to me from the Bonjour/DNS-SD protocol (mainly so it can be sent back later in the same format).  This is what I ended up writing to decode the string for printing; let me know if you guys see any way I could improve it:

======
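# A proc to remove leading zeroes from a string, so that [format] in Tcl 8.x
# does not read e.g. 032 as octal (or reject 091 as an invalid octal number)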
proc stripzeros value {
    set retval [string trimleft $value 0]
    if {![string length $retval]} {
        return 0
    }
    return $retval
}

# A procedure to decode decimal ascii sequences in a string
proc get_printable_name encoded_name {
    set regex {\\[0-9][0-9][0-9]}
    set sub {[format %c [stripzeros [string range "\\&" 1 end]]]}
    set retval [subst [regsub -all $regex $encoded_name $sub]]
    return $retval
}

puts [get_printable_name {kramer\032\09100\05826\05818\058fc\0588f\05809\093._workstation._tcp.local.}]
======
prints:

======none
kramer [00:26:18:fc:8f:09]._workstation._tcp.local.
======

[RLE] 2011-03-09: With a small tip from the discussion on the `[scan]` page:

======
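proc get_printable_name encoded_name {
    # Turn every \DDD (three decimal digits) into [format %c [scan DDD %d]],
    # then let subst run those commands; scan %d reads 032 as decimal 32.
    return [subst [regsub -all {\\([0-9]{3})} $encoded_name {[format %c [scan \1 %d]]}]]
}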
======
======none
% get_printable_name {kramer\032\09100\05826\05818\058fc\0588f\05809\093._workstation._tcp.local.}
kramer [00:26:18:fc:8f:09]._workstation._tcp.local.
======

[lordmundi] 2011-03-11: Wow... that is a lot better.  Thanks!
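Both versions above hand the data to `subst`, so a `[` or `$` that ever reached it unescaped would be evaluated as Tcl. A subst-free alternative walks the string itself. This is a sketch, assuming the input follows the RFC 1035 master-file escaping used by DNS-SD (`\DDD` is a decimal code, `\X` quotes any other character X); `decode_ddd` is a hypothetical name. It treats each `\DDD` as a character code, which is fine for ASCII names; DNS-SD names are UTF-8 octets, so non-ASCII names would need the bytes collected and passed through `encoding convertfrom utf-8`.

======
# Hypothetical subst-free decoder for "\DDD" (decimal) and "\X" escapes.
proc decode_ddd {s} {
    set out ""
    set n [string length $s]
    for {set i 0} {$i < $n} {incr i} {
        set c [string index $s $i]
        if {$c ne "\\" || $i + 1 >= $n} {
            # ordinary character (or a lone trailing backslash): copy it
            append out $c
            continue
        }
        set digits [string range $s [expr {$i + 1}] [expr {$i + 3}]]
        if {[regexp {^[0-9]{3}$} $digits]} {
            # \DDD: three decimal digits become one character
            append out [format %c [scan $digits %d]]
            incr i 3
        } else {
            # \X: take the next character literally
            incr i
            append out [string index $s $i]
        }
    }
    return $out
}

puts [decode_ddd {kramer\032\09100\05826\05818\058fc\0588f\05809\093._workstation._tcp.local.}]
======

Nothing in the input is ever evaluated, so `\036` simply yields a literal `$`.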

<<categories>> Characters | Glossary