Version 19 of Unix shells

Updated 2016-06-16 02:37:37 by RoyKeene

Most Unix shells belong to either the Bourne Shell family or the csh family, although other alternative shells exist. Tcl may not be a direct descendant of any of the preexisting shells, but it certainly shares some of their traits. The most notable is EIAS (everything is a string), a principle that Tcl applies with more discipline and consistency than other shells.

See Also

gush
an attempt to write an updated Unix shell/terminal in Tcl.
Tcl Heritage
a number of Unix shell languages influenced the design of Tcl.
Tcl and other languages
a listing of languages that have crossed paths with Tcl
Playing Bourne shell
emulating shell syntax and commands in Tcl
Pipethread
POSIX sh-like pipes for Tcl.

Reference

UNIX shell differences and how to change your shell (Monthly Posting)
an introduction to Unix shells, comparing some of the more common ones.

Advantages of Tcl for Shell Scripting

Shells in the csh family are known to be unsuitable for any serious scripting, but shells in the sh family are often used extensively for the purpose. Tcl is a better choice. Here are some reasons:

scalability
Often, a smallish task will be scripted in an sh language and then grow as it matures, but sh scripts don't scale well. It's safer to get in the habit of using Tcl from the get-go.
nested data structures
The arcane syntax of traditional shells simply doesn't allow for nested data structures. Tcl syntax scored a great coup on this account by retaining the shell characteristic that everything is a string while providing a string representation syntax for lists. Other shells get bogged down in quoting hell when environment variables are used in an ad-hoc fashion to pass sequences of values between programs.
variable references
Shells like bash and ksh have tried a couple of times to provide a way to pass data to a function by reference. There is the older ${!nameref} indirection syntax and the newer typeset -n nameref syntax, but both approaches share a serious flaw that makes them brittle. The variable named nameref must itself be reserved in the global scope, because any attempt to choose its name through variable expansion might conflict with the name of the referenced variable:
rmval () {
    typeset -g -n nameref=$1
    local i=0
    local val
    for val in "${nameref[@]}"; do
        if [ "$val" = "$2" ]; then
            unset "nameref[$i]"
        fi
        ((i++))
    done
    # compact the array
    nameref=( "${nameref[@]}" )
}
The global name nameref is reserved by this function, requiring a consensus among all bash scripts that might use this function to respect that variable. Contrast with the upvar capability of Tcl:
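proc rmval {listname val} {
    upvar 1 $listname list
    # Walk the list backwards so that removing an item doesn't shift
    # the indices of the items still to be examined.
    for {set idx [expr {[llength $list] - 1}]} {$idx >= 0} {incr idx -1} {
        if {[lindex $list $idx] eq $val} {
            # lreplace returns a new list; it doesn't modify a variable
            set list [lreplace $list $idx $idx]
        }
    }
    return $list
}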
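Here is why upvar avoids the problem. This sketch uses a hypothetical proc, removeAll, that does the same job with lsearch instead of an explicit loop. The caller's variables are deliberately given the same names as the proc's own locals. Because upvar binds the caller's variable to a local alias, nothing collides and no name has to be reserved globally.

```tcl
# removeAll is a hypothetical helper for illustration only: it removes
# every occurrence of val from the list stored in the caller's variable.
proc removeAll {listname val} {
    upvar 1 $listname list
    set list [lsearch -all -inline -not -exact $list $val]
}

# The caller's variables are named "list" and "listname", the same as
# the proc's locals; upvar keeps them apart.
set list {a b c b d}
set listname {x b y}
removeAll list b
removeAll listname b
puts $list       ;# -> a c d
puts $listname   ;# -> x y
```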

Tcl for Shell Programmers

Tcl shares a key feature with other Unix shells: Each line is a command, and the first word of each line is the name of the command. It isn't necessary to quote simple strings:

#! /bin/env sh
echo hello
#! /bin/env tclsh
puts hello

Where Unix shells use the term expansion, Tcl uses the term substitution. In Unix shells, a variable is a named parameter. Tcl simply uses the term variable, and it has no special parameters that scripts are unable to assign to.

Shell parameter expansion is indicated by $ :

echo $name

Tcl variable substitution also uses $:

puts $name

In Bourne-compatible shells, the output of a command can be captured using the following notation:

a=`date`

or

a=$(date)

In Tcl, the same thing is accomplished with brackets, which introduce command substitution, together with exec, which runs an external command and returns its output:

set a [exec date 2>@stdout]

Unlike other Unix shells, Tcl itself has no syntax for I/O redirection, deferring instead to exec, which implements its own syntax for the arguments that are passed to it. By design, Tcl syntax remains minimal, while individual commands are free to interpret arguments passed to them in arbitrarily complex ways.
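The sketch below shows a few more exec forms, to give a sense of how much of a shell's redirection vocabulary lives in that one command. It assumes Tcl 8.6 (for try) and that printf, wc, sort, date and sh are on the PATH.

```tcl
# A captured pipeline, like $(printf ... | wc -l) in sh.
# string trim removes the padding some wc implementations add.
set n [string trim [exec printf "a\nb\nc\n" | wc -l]]

# "<<" feeds a Tcl string to the first command's standard input.
set sorted [exec sort << "pear\napple\n"]

# "2>@stderr" passes the command's stderr through to Tcl's stderr
# rather than letting exec treat stderr output as an error.
set now [exec date 2>@stderr]

# A nonzero exit status becomes a Tcl error whose error code is
# CHILDSTATUS pid status.
try {
    exec sh -c {exit 3}
} trap CHILDSTATUS {msg opts} {
    set status [lindex [dict get $opts -errorcode] 2]
}
puts "$n lines; sorted: [join $sorted ,]; exit status $status"
```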

Unix shells use braces to delimit the variable name from surrounding characters:

echo one${two}three

Tcl does the same:

puts one${two}three
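In Tcl the braced form does more than separate a name from the text around it. It also reaches variables whose names contain characters that a bare $name stops at, such as a space or a dot. The variable names in this short sketch are invented for illustration.

```tcl
set two 2
puts one${two}three      ;# -> one2three

# Names containing a space or a dot are reachable with the braced
# form (or with the set command), but not with a bare $name.
set {my var} hello
set file.name data.txt
puts ${my var}           ;# -> hello
puts ${file.name}        ;# -> data.txt
puts [set {my var}]      ;# -> hello
```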

But the similarity stops there. Unix shells provide further parameter expansion syntax for array and string length, indexing, ranges, substitution, default values, alternate values, conditional assignment, pattern matching of array values, and listing of array keys. The only additional syntax Tcl provides is for selecting an element of an array:

puts $person(name)
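Tcl arrays play roughly the role of bash associative arrays. Here is a minimal sketch with invented element names. The index may itself contain substitutions, and listing the keys is done with a command rather than with syntax.

```tcl
# Roughly comparable to: declare -A person=([name]=Alice [age]=30)
array set person {name Alice age 30}

puts $person(name)                  ;# -> Alice

# The element name may itself contain a substitution.
set key age
puts $person($key)                  ;# -> 30

# Listing the keys, the counterpart of bash's ${!person[@]}.
puts [lsort [array names person]]   ;# -> age name
```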

All the remaining parameter expansion functionality of Unix shells is provided in Tcl by commands rather than special syntax. For example, to extract 4 characters starting at offset 2 of a string in a Unix shell:

echo "${name:2:4}"

In Tcl, string range performs that operation, although it takes the indices of the first and last characters rather than an offset and a length:

puts [string range $name 2 5]
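A few more parameter expansions are listed below, each with a Tcl command that does roughly the same job. This is a sketch with an invented value for name. The pairings are approximate: for example, bash's ${name/pattern/string} takes a glob pattern, while regsub takes a regular expression.

```tcl
set name "Hello, World"

# ${#name}: length
puts [string length $name]          ;# -> 12

# ${name:2:4}: offset 2, length 4, so the last index is 2 + 4 - 1 = 5
puts [string range $name 2 5]       ;# -> llo,

# ${name/World/Tcl}: replace the first match
puts [regsub World $name Tcl]       ;# -> Hello, Tcl

# ${name//l/L}: replace every occurrence of a literal string
puts [string map {l L} $name]       ;# -> HeLLo, WorLd

# ${greeting-hi}: a default value when the variable is unset.
# The ?: operator only evaluates the branch it selects.
puts [expr {[info exists greeting] ? $greeting : "hi"}]   ;# -> hi
```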

In Unix shells, double quotes protect whitespace while still allowing parameter expansion, command expansion, and backslash escaping:

echo "$(greeting), $name"

In Tcl, double quotes do the same, except that double quotes must completely enclose a word to have special meaning:

puts "[greeting], $name"

In Tcl, double quotes have no special meaning in the middle of a word as they do in other shells. For example, contrast the following scripts:

#! /bin/env bash
var1=one"two"'three'four
#! /bin/env tcsh
set var1 = one"two"'three'four
#! /bin/env tclsh
set var1 one"two"'three'four

In the Unix shells, the value of var1 is onetwothreefour, but in Tcl, the value is one"two"'three'four. The double and single quotes are retained as part of the value because they occur within a word rather than enclosing a word.

Unix shells use the single quote character to delimit strings that are protected from all forms of expansion:

sentence='a value of $42.11 (no kidding); This is just a string.  `backquotes`
mean nothing special, either'

In Tcl, braces {} serve the same purpose. As with double quotes, braces must entirely enclose a word in order to have special meaning. Within braces, embedded brace pairs are allowed, and an unpaired brace character preceded by a backslash is not counted when searching for the closing brace at the end of the word:

set a {a value of $42.11 (no kidding); {This is just a string}.  `backquotes` mean nothing \{ special, either}
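Here is a short sketch of the difference between braces and double quotes. The variable price is invented for the example. Inside braces, a backslash in front of a brace is kept as part of the value, not removed.

```tcl
set price 42.11

# Inside braces nothing is substituted: $price and the bracketed
# command stay literal, and the backslash before the brace is kept.
set a {cost: $price [string length abc] \{}
puts $a    ;# prints the text exactly as written between the outer braces

# Inside double quotes both substitutions take place.
set b "cost: $price [string length abc]"
puts $b    ;# -> cost: 42.11 3
```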

In Tcl, substitutions never change word boundaries, whereas in other Unix shells, word splitting occurs after substitution, which means that substitution can affect the number of words in a command:

#! /bin/env bash
var1="three four"
var2=( one two $var1 five )
echo ${#var2[*]} # -> 5
#! /bin/env tclsh
set var1 {three four}
set var2 [list one two $var1 five]
puts [llength $var2] ;# -> 4

Tcl provides {*} to change the number of words in a command. Any word preceded by {*} is interpreted as a list, the items of which become individual words in the command:

#! /bin/env tclsh
set var1 {three four}
set var2 [list one two {*}$var1 five]
puts [llength $var2] ;# -> 5

The upshot is that in Unix shells word splitting is the default, and double quotes are commonly used to prevent it. In Tcl, word splitting doesn't happen unless it is explicitly requested via {*}. Therefore, double quotes are used less often in Tcl, giving scripts a cleaner look. A common beginner mistake in Tcl is to use double quotes too liberally.
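This matters most for shell programmers when building argument lists for external commands. In the following sketch the file names are invented and nothing is read from disk; printf just echoes its arguments. Each list element becomes exactly one argument, even when it contains a space.

```tcl
# Arguments for an external command; one of them contains a space.
set files [list notes.txt {my file.txt}]

# {*} turns each list element into one separate argument, so
# "my file.txt" is not split at its space.
set out [exec printf {[%s]} {*}$files]
puts $out   ;# -> [notes.txt][my file.txt]

# Without {*}, the whole list is passed as a single argument.
set one [exec printf {[%s]} $files]
puts $one   ;# -> [notes.txt {my file.txt}]
```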

Finally, other shells have built-in syntax for things like piping I/O between commands, executing scripts in a subshell, grouping commands for execution, evaluating mathematical expressions, equality testing of mathematical expressions, iterating through values, for and while loops, simple user interfaces, switch statements, conditional statements, and function definitions. Tcl does away with the syntax for all those features, and instead implements them as commands. By becoming less, Tcl is able to become more. With the help of built-in commands such as upvar, uplevel, and tailcall, new control structures can be implemented, and built-in commands can be replaced with customized versions. In this way, Tcl can be transformed into a language specialized for a specific purpose.
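For example, here is a sketch of a new looping construct. repeat is a hypothetical command, not part of Tcl. uplevel runs the body in the caller's scope, so the body sees the caller's variables just as it would inside a built-in loop.

```tcl
# repeat is hypothetical, defined here only for illustration:
# it runs body count times in the caller's scope.
proc repeat {count body} {
    for {set i 0} {$i < $count} {incr i} {
        uplevel 1 $body
    }
}

set total 0
repeat 3 {incr total 2}
puts $total   ;# -> 6
```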

Advantage Tcl: List Representation

In most Unix shells, it isn't exactly straightforward to pass an array/list of arbitrary values between commands. Consider the following array.

res=(one "two:three four" five)

One way to write a shell function that appends an element to an array is

arr_append () {
    eval $1'[${#'$1'[*]}]="$2"'
}
a1=(one "two:three four" five)
arr_append a1 "six seven" 

This is painful both to read and to write. In Tcl it's simpler:

lappend a1 {six seven}

Even if lappend didn't exist, it would still be simple in Tcl:

set a1 [concat $a1 [list {six seven}]]
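The quick check below uses the same values as the shell example. Both the appended item and the item containing a colon and a space remain single elements.

```tcl
set a1 {one {two:three four} five}
lappend a1 {six seven}

puts [llength $a1]       ;# -> 4
puts [lindex $a1 end]    ;# -> six seven

# Iterating visits each element intact, spaces and all.
foreach item $a1 {
    puts "<$item>"
}
```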

Here is an even more egregious example of shell syntax. This function sorts an array in place. It takes one argument, the name of the array to sort. Shells have no upvar command, so if this function assigned the name of the array to a local variable, that local variable's name might conflict with the name of the array to change. Therefore, the function must perform some real syntactic gymnastics to get the job done:

arr_sort () {
    while read -r -d ''; do
        set -- "$@" "$REPLY"
    done < <(eval "$(cat <<EOF
printf '%s\0' "\${$1[@]}" | sort -z
EOF
)")

    eval $1'=( "${@:2:$#}" )'
}

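Contrast with Tcl, which can simply use the built-in lsort command. The $a[set a {}] construct is a common Tcl idiom that releases the variable's hold on its value before the command runs. The result is the same as set a [lsort $a]: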

set a [lsort $a[set a {}]]
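To show that arbitrary values survive sorting, here is a sketch with elements that contain spaces and a newline. That is the case the shell version handles with NUL delimiters and sort -z. lsort's default ordering is -ascii, which places uppercase letters before lowercase ones. Other orderings are options of the same command.

```tcl
# Elements containing spaces and a newline are sorted as whole values.
set a [list pear "apple pie" "banana\nsplit" Cherry]

set a [lsort $a]
puts [llength $a]    ;# -> 4
puts [lindex $a 0]   ;# -> Cherry

# Other orderings are options of lsort.
puts [lsort -nocase $a]
puts [lsort -decreasing $a]
```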

Anyone who complains about the inconvenience of using list to armour values when generating Tcl code probably hasn't had to deal with the mess that is Unix shell syntax!