Argument Parsing, a discussion

There are several pages on the wiki regarding different methods for argument parsing and its cousin, named arguments. This page is for general discussion of a potential system that might satisfy both. Here we can toss back and forth general design questions, the key concerns being:

Which feels the most Tcl-like
Which might sacrifice some freedom for power

One that I've been working on here and there uses a prefix notation in the proc definition to tell the named-argument system what to do with each parameter. For instance:

proc foo { fname o_lname } {
    puts $fname
}

foo Richard -lname Pryor

Here, the lname is understood by a wrapping proc to be "switchable", and the proc also understands that fname is a required argument, and should not be "named". What do people feel about similar ideas to expand the proc definitions? Another idea is to implement some sort of enumerated definition in the proc "e_actiontype", where actiontype might be defined, maybe within the proc body, or a central location.
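A minimal sketch of such a wrapping proc follows, assuming Tcl 8.5 or later. The names nproc and nproc_bind are hypothetical and not part of any existing package. Parameters with the o_ prefix become optional -name value switches, and everything else stays positional and required.

```tcl
# Hypothetical helper: bind actual arguments to variables in the caller.
# Parameters prefixed with o_ are optional "-name value" switches;
# all other parameters are required and positional.
proc nproc_bind {params arglist} {
    set positional {}
    set switches {}
    foreach p $params {
        if {[string match o_* $p]} {
            lappend switches [string range $p 2 end]
        } else {
            lappend positional $p
        }
    }
    set n [llength $arglist]
    set pos 0
    for {set i 0} {$i < $n} {incr i} {
        set a [lindex $arglist $i]
        set bare [string range $a 1 end]
        if {([string index $a 0] eq "-") && ($bare in $switches)} {
            if {$i + 1 >= $n} { error "missing value for $a" }
            # Consume the switch value as well.
            uplevel 1 [list set $bare [lindex $arglist [incr i]]]
        } elseif {$pos < [llength $positional]} {
            uplevel 1 [list set [lindex $positional $pos] $a]
            incr pos
        } else {
            error "unexpected argument \"$a\""
        }
    }
    if {$pos < [llength $positional]} {
        error "missing required argument \"[lindex $positional $pos]\""
    }
}

# Hypothetical replacement for proc: the generated proc takes "args"
# and lets nproc_bind unpack them before the original body runs.
proc nproc {name params body} {
    proc $name args "::nproc_bind [list $params] \$args\n$body"
}

nproc foo { fname o_lname } {
    if {[info exists lname]} { return "$fname $lname" }
    return $fname
}

foo Richard -lname Pryor   ;# returns "Richard Pryor"
```

An omitted switch simply leaves its variable unset, so the body tests for it with info exists. A fuller version might accept defaults for switches as well.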

Larry Smith Actually, if this were to come up in serious discussion, I'd like to junk the arguments entirely. I suggest this because I'd like to be able to save procedure bodies as variables and invoke them as commands with no "proc" keyword or proc/var distinctions, which I find a tad untidy. I would use something like init with an assumed "args" parameter to initialize and parse arguments passed to the block. I think this somewhat slims down the language while making it more generic. It also forces one to initialize arguments to something appropriate, and could be easily extended to permit doing so with local vars as well and permitting us to de-emphasize "creative writing".

I have experimented with other methods of getting params: One such used an implicit method, with unnamed 'args'-like params accessed with > ... So "> a b" would pick off the next two parameters and stash them in a and b respectively. A "> c" later on would grab off the "c" parameter, presuming the code determined there was one from the a and b parameters. Parameters could also be passed as dicts with an implicit name like "in" or "args". The > scheme is a bit cryptic and perhaps too terse, the argument dict would be a tad wordy and a bit unwieldy to use in practice, and may make people write boilerplate to unpack args into local vars, but it works very well to unpack args directly into local vars. Of all the methods of doing this I've ever encountered, and I've spent years playing with various programming languages and dialects, I think init is the easiest to learn, the most minimal in syntax, and the most flexible. My only concern with it is the overhead of using it implicitly on every proc call.

escargo 2005-09-19: I think it might be fruitful to consider why there are so many implementations of argument processing.

What are the underlying problems to be solved by centralizing argument processing?
What are advantages and disadvantages of the different implementations?

Using one body of code to process arguments levies certain requirements.

There must be a way of specifying the arguments and how they are processed.
There must be a way of specifying what to do with arguments that are not recognized.
There must be a way of returning the results of processing arguments.
There might be a way of returning arguments that are not processed.

I have seen some systems where there is an explicit grammar that specifies command arguments; each command can hand off a string containing its argument description in that grammar to an argument parsing service.

2005-09-19: I think you've summarized the goals well. I'd note that with all of Tcl's nice introspection capabilities, one could probably fashion a general handler, instead of hardcoding the relationship to the underlying proc. For myself, I'd love a system that enables a user to have a light encapsulation around user-defined procs that emulates the Tcl commands. For instance:

proc parse args {
}

proc parse_html_links { o_all o_img o_href html } {
}

could be invoked in the Tcl command way, and be understood as being part of a "parse" command set. The first definition could be the router/service.

set links [parse html links -all -href $html]
set link [parse html links -img $html]

This has several advantages over using the default tcl proc system:

It allows for a readable, definable packaging system but with a generalized implementation
The syntax allows for easy modification of the internals by decoupling argument order and required parameters
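One way the router could work is sketched below, assuming Tcl 8.5 or later. It joins the leading words with underscores and looks for a matching command, trying the longest prefix first. The body of parse_html_links here is only a stub for illustration, and handling of the o_ parameters is left out.

```tcl
# Hypothetical router: "parse html links ..." -> "parse_html_links ..."
proc parse args {
    # Try the longest run of leading words first, so that
    # "parse html links" wins over a possible "parse html".
    for {set n [llength $args]} {$n > 0} {incr n -1} {
        set cmd ::parse_[join [lrange $args 0 [expr {$n - 1}]] _]
        if {[namespace which -command $cmd] ne ""} {
            # Hand the remaining words to the target command.
            return [uplevel 1 [list $cmd {*}[lrange $args $n end]]]
        }
    }
    error "unknown parse subcommand \"$args\""
}

# Stub target, for illustration only: it just echoes what it received.
proc parse_html_links args {
    return [list links $args]
}

parse html links -all -href "<a href=x>"
```

The introspection is a single namespace which call, so adding a new parse_* proc extends the command set without touching the router.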

NEM 2005-09-19: I'm not a big fan of the o_ prefix, or the _ convention. The first looks a bit like Hungarian notation, and the latter is already served by :: and the new namespace ensemble in 8.5. Regarding argument parsing and named arguments, I'd prefer a simple option-parsing command that takes a list of parameters in the same form that proc does, along with a list of arguments and returns a dictionary of the values, or an error if some parameter has not been specified. e.g.:

proc person args {
    set details [option parse {name age {weight 100} {shoesize 10}} $args]
    dict with details { .... }
}
person -name "Neil" -age 24

("option" is unfortunately a Tk command, though). That seems like the best separation of concerns: doesn't force anybody to use it; can be selectively applied to groups of arguments; and can be wrapped into a higher-level construct easily enough. I would prefer it if the command parsed arguments such that any parameter could be specified either positionally or using -name value syntax (although the first positional argument turns off the -name value parsing). The behaviour could then mimic current proc-style parsing by default (something I've needed on occasion), but also then handle named arguments. e.g.:

proc apply {lambda args} {
    lassign $lambda params body
    set d [option parse $params $args]
    dict with d $body
}
set cmd [list {a b {c 12}} { expr ($a*$b)/$c }]
apply $cmd 1 2
apply $cmd 2 4 10.0
apply $cmd -a 12 15 24 ;# a=12, b=15, c=24

The ability to specify any parameter using the named form is useful for currying when arguments may be accepted in the wrong order.
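A rough sketch of such a parsing command is given below, assuming Tcl 8.5 or later. It is called optparse here (a hypothetical name, chosen to avoid the clash with Tk's option) and it does not treat a trailing "args" parameter specially. Leading -name value pairs are consumed first; the first argument that is not a known -name switches to positional parsing for everything that remains.

```tcl
# Hypothetical: parse arglist against a proc-style parameter list and
# return a dictionary mapping parameter names to values.
proc optparse {params arglist} {
    set names {}
    set result [dict create]
    foreach p $params {
        lassign $p name default
        lappend names $name
        # A two-element parameter carries a default value.
        if {[llength $p] > 1} { dict set result $name $default }
    }
    set n [llength $arglist]
    set i 0
    set named {}
    # Phase 1: leading "-name value" pairs.
    while {$i + 1 < $n} {
        set a [lindex $arglist $i]
        set bare [string range $a 1 end]
        if {([string index $a 0] ne "-") || ($bare ni $names)} break
        dict set result $bare [lindex $arglist [expr {$i + 1}]]
        lappend named $bare
        incr i 2
    }
    # Phase 2: the rest fill, in order, the parameters not yet named.
    foreach name $names {
        if {$name in $named} continue
        if {$i >= $n} break
        dict set result $name [lindex $arglist $i]
        incr i
    }
    if {$i < $n} { error "too many arguments" }
    foreach name $names {
        if {![dict exists $result $name]} {
            error "no value given for parameter \"$name\""
        }
    }
    return $result
}

optparse {a b {c 12}} {1 2}            ;# a=1, b=2, c=12
optparse {a b {c 12}} {-a 12 15 24}    ;# a=12, b=15, c=24
```

A value that happens to look like a known -name is ambiguous under this scheme. That is the price of allowing both styles in one call.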

escargo: Maybe there are three different contexts in which argument parsing occurs:

Shell-level (command prompt) commands that communicate with Tcl scripts as system executables.
Script-level commands that try to look like shell-level commands.
Script-level commands that try to look like Tcl proc invocations.

I have implemented command parsing (shell-level) that tried to allow for either command line or graphical control. There was metadata associated with each command parameter so that the command parser could generate either a text-based or graphical interface to get operator inputs.

Flag parameters: Flag name, description. On a command line expected something like -D or nothing; for a GUI generated a checkbox.
Enumerated parameters: Parameter name, description, list of legal values. On a command line expected something like -param value where value had to be one of the list of legal values; for a GUI generated a radio button group.
Value parameters: Parameter name, description, type information. On a command line expected something like -param value where value had to conform to the type information; for a GUI generated an entry widget that called a validator using the type information.

Part of the metadata is whether the argument is optional or not; for file names, metadata including information about whether the file had to already exist or not (or didn't matter). Parameters could also be considered to be required. There could also be specification of default values. (Usually I have seen parameters with default values to be optional. Required parameters are of course not optional.)

Some parameters were list parameters, where there was no preceding parameter name. Typically these were lists of file names (usually one or more).
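The metadata-driven approach described above might be sketched as follows, assuming Tcl 8.5 or later. The layout of the spec dictionary, the key names (kind, desc, values, type) and the proc name parse_with_spec are all invented for illustration; only the command-line side is shown, not the GUI generation.

```tcl
# Hypothetical metadata: one entry per parameter.
set spec {
    -D      {kind flag  desc "Enable debugging"}
    -format {kind enum  desc "Output format" values {text html}}
    -count  {kind value desc "Repeat count"  type integer}
}

# Returns a two-element list: a dict of recognised parameters and
# the list of remaining (unnamed) arguments, e.g. file names.
proc parse_with_spec {spec arglist} {
    set result [dict create]
    set rest {}
    for {set i 0} {$i < [llength $arglist]} {incr i} {
        set a [lindex $arglist $i]
        if {![dict exists $spec $a]} {
            lappend rest $a
            continue
        }
        set meta [dict get $spec $a]
        switch -- [dict get $meta kind] {
            flag {
                dict set result $a 1
            }
            enum {
                set v [lindex $arglist [incr i]]
                if {$v ni [dict get $meta values]} {
                    error "$a must be one of: [dict get $meta values]"
                }
                dict set result $a $v
            }
            value {
                set v [lindex $arglist [incr i]]
                if {![string is [dict get $meta type] -strict $v]} {
                    error "$a expects a value of type [dict get $meta type]"
                }
                dict set result $a $v
            }
            default {
                error "bad kind in spec for $a"
            }
        }
    }
    return [list $result $rest]
}

lassign [parse_with_spec $spec {-D -format html -count 3 a.txt b.txt}] opts files
```

The same spec could in principle drive a GUI builder, since the descriptions and legal values are available as plain data.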

I see help in argument parsing to be of most use for the first two contexts. I'm not quite sure why one would want to use it for the third.

NEM: To handle defaults and "args" correctly. I think it is also important to separate out matching of arguments to (named) parameters and then checking that those parameters fit some constraints/type. These two concerns are best handled by entirely separate procedures, in my opinion. I also tend to avoid boolean -foo arguments (without a matching value), partly for consistency, but also because they don't always stay as simple boolean options. Providing a command-line application interface (e.g., like GNU tools do) is application- and platform-specific. The syntax for specifying command line options is different between Windows and UNIX, for instance. Best to concentrate on a standard syntax for specifying arguments to Tcl commands (based on that of Tk widgets). There is already the cmdline package in tcllib for shell option parsing.

escargo: It is useful to point out where there are differences in syntax for command line options; however, that might be an argument (forgive the possible confusion) for hiding some of the details of actual argument processing from the Tcl program. Rather than exactly specifying its arguments, if they are specified in a portable way, the implementation of the argument processing could handle Unix vs. Microsoft Windows details without the program itself needing to be changed.

NEM: Again, I'd say that is better handled by separate commands. If you separate out value checking from the actual parsing then there will probably be very little common code between Windows, Unix and Tk-style argument parsers. You could layer a higher-level construct over the top, but I think the basic usage of an argument parser should be something like:

set data [check $types [option parse $params $args]]

Where "option parse" can be replaced by a Windows or UNIX command-line style parser, or some other parser. A higher-level construct for command-line processing could simply switch on $tcl_platform(platform) and choose an appropriate parser, but the basic API should still be available (to avoid the high-level command becoming another kitchen-sink/clock scan).
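The platform switch could look roughly like this, assuming Tcl 8.5 or later. The two parsers are deliberately tiny stand-ins: the --name=value and /name:value forms are only meant to represent the two conventions, and all three proc names are hypothetical.

```tcl
# Illustrative only: GNU-style "--name=value" arguments.
proc parse_unix_args {arglist} {
    set result [dict create]
    foreach a $arglist {
        if {![regexp {^--([^=]+)=(.*)$} $a -> name value]} {
            error "unrecognised argument \"$a\""
        }
        dict set result $name $value
    }
    return $result
}

# Illustrative only: Windows-style "/name:value" arguments.
proc parse_windows_args {arglist} {
    set result [dict create]
    foreach a $arglist {
        if {![regexp {^/([^:]+):(.*)$} $a -> name value]} {
            error "unrecognised argument \"$a\""
        }
        dict set result $name $value
    }
    return $result
}

# The higher-level construct: pick a parser by platform. The basic
# parsers remain available to callers that want a specific one.
proc parse_command_line {arglist} {
    if {$::tcl_platform(platform) eq "windows"} {
        return [parse_windows_args $arglist]
    }
    return [parse_unix_args $arglist]
}
```

Both parsers return the same kind of dictionary, so whatever checks the values afterwards does not need to know which syntax was used.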

escargo: Personally, I think it's a mistake to separate out parameter value checking from option parsing. When I was generating command-line vs GUI value solicitation, I was able to use the type checking information as tool-tips (balloon help) for the GUI. If value checking had been separate, I would not have been able to do that.

NEM: Why? You can easily associate type information with parameters outside of the argument parsing framework. Indeed, the definition of "check" in my last example would fairly depend on it. Just to be clear, here is a more complete example of what I am suggesting:

proc person args {
    # Parameter names, positions and optional default values
    set params {name age {weight 100} {shoesize 10}}
    # Mapping from parameter names to "types"
    set types {name string age integer weight integer shoesize integer}
    # Parse arguments using position/default info
    set values [option parse $params $args]
    # Check types using some type-checking routine
    type check $types $values
}

Note that both $types and $values are dictionaries using parameter name as key. So you could easily do:

proc describe {types values param} {
    return "$param = [dict get $values $param] : [dict get $types $param]"
}

You don't need to shove everything into a monolithic structure or process to be able to associate things.
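A type-checking routine of the kind assumed above could be as small as the following sketch (Tcl 8.5 or later). The name typecheck and the set of type names are assumptions made for illustration; the checks themselves use the standard string is command.

```tcl
# Hypothetical checker: both arguments are dictionaries keyed by
# parameter name. Returns the values unchanged if every check passes.
proc typecheck {types values} {
    dict for {param type} $types {
        if {![dict exists $values $param]} continue
        set value [dict get $values $param]
        switch -- $type {
            string  { set ok 1 }
            integer { set ok [string is integer -strict $value] }
            double  { set ok [string is double -strict $value] }
            boolean { set ok [string is boolean -strict $value] }
            default {
                error "unknown type \"$type\" for parameter \"$param\""
            }
        }
        if {!$ok} {
            error "parameter \"$param\": expected $type but got \"$value\""
        }
    }
    return $values
}

set types {name string age integer weight integer shoesize integer}
typecheck $types {name Neil age 24 weight 100 shoesize 10}
```

Because it returns the values, the checker can be wrapped around any parser that produces a dictionary, and the types dictionary stays available to other code such as describe.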

MAK: As for why there are so many argument parsing schemes, different opinions about how they should be done, and why certain things aren't done by some, argument parsing systems are an abstraction. Abstractions leak. The greater the abstraction the greater the leak. Suggested reading: The Law of Leaky Abstractions.

IL 2005-09-21: Tcl is an abstraction, abstractions leak, Tcl is leaky. Anyone care to argue otherwise? ;) Let's stay on topic!

NEM: The way to minimise the leak is to design your abstractions in small pieces and in layers. If one layer leaks it should be a short drop down to the next layer where the problem can be dealt with. Hence my desire to separate argument parsing from argument validation (or type checking). Then, when you build a higher-level abstraction on top of it (which, as you say, will leak) then you still have some useful components with which to build an alternative solution. In terms of Joel's essay: let's build an IP layer before we build TCP on top.

PWQ 2005-09-21:

I don't know why people get carried away with argument parsing and config file formats. It's totally unproductive and doesn't represent the way scripting languages are designed.

I use associative arrays for argument processing. Why? Because one command does it all, the syntax for accessing an array is clean and simple, and most of my other data is in arrays. You want nested arguments, no problem, just treat the array variable as another array. While dictionaries in 8.5+ are similar they lack the simple referencing via $ that arrays have.

Why do you want to create stupid libraries of code when one simple command parses the arguments for you? Want to read a config file? Three lines of code:

set f [open configfile r]
array set Options [read $f]
close $f

You people have too much time on your hands, or you are not actually creating any worthwhile applications.

Larry Smith: Even this is over-specified. Yeah, it's simpler, but "source configfile" is even shorter. Tcl is its own specification language.

Sure it has limitations, but who cares. It's more important to accept a less flexible argument format and get on with the job of coding rather than spending hours allowing 100 different calling conventions.

Lastly, what about error detection? Again, this is simple and doable. But having argument checking is just an excuse for failing to test your code adequately. If a procedure takes a number between 1 and 5, you should not need the routine to tell you that you have passed the number 6 to it in error. Your testing methodology should have prevented you from getting that far.

And don't forget that since we are Scripting you can always instrument code as part of the testing regimen automagically.

escargo 2005-09-20: Reading a configuration file and parsing arguments are related, but not identical. If you are implementing a program that is to be run as a command via a command shell (Unix, Linux, or the Microsoft Windows command prompt), you have to take what they give you and make sense out of it; you can't just read a file of array set commands (or a dict value that would be the equivalent). There are security issues with reading and executing code as you have it as well.
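For the configuration-file case, one cautious option is to treat the file contents purely as data and reject anything unexpected before it reaches the application. The sketch below assumes Tcl 8.5 or later; parse_config and the option names in the usage comment are hypothetical.

```tcl
# Hypothetical: validate configuration text as a flat key/value list
# without evaluating any of it, and return it as a dictionary.
proc parse_config {text allowed} {
    # The text must be a well-formed list with an even number of
    # elements; llength throws an error on unbalanced braces or quotes.
    if {[catch {llength $text} n] || $n % 2 != 0} {
        error "malformed configuration"
    }
    set config [dict create]
    foreach {key value} $text {
        if {$key ni $allowed} {
            error "unknown configuration option \"$key\""
        }
        dict set config $key $value
    }
    return $config
}

# Usage, assuming a file named configfile exists:
#   set f [open configfile r]
#   set config [parse_config [read $f] {colour size}]
#   close $f
```

Nothing in the file is ever executed here, in contrast to reading it with source.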

IL 2005-09-19: One critical need for argument parsing is easy options, as in the following:

# written in 2k4
proc dofoo { { hasbar "false" } } {
}

# needed functionality for 2k5
proc dofoo { { hasbar "false" } baztype } {
}

Ah, we're in a quandary. Our 2k4 proc has only one argument, and it was used everywhere. I could do string replacement, but it's risky. I could change all calling locations, but that will take time. Had the proc invocations been:

dofoo -hasbar "true" 
# instead of
dofoo "true"

I could simply add my new parameter in locations as I see them, and not worry about compatibility. The same criticism is true of any language where the order of the parameter definition affects how it's used.

# examples
dofoo $baztype -hasbar "true"
dofoo -hasbar "true"
dofoo $baztype

PWQ 2005-09-22:

Both of the above arguments are fallacious. Firstly the 2k5 proc should be

proc dofoo { { hasbar "false" } {baztype {}} } ...

Then the existing code would not need to be modified.

Secondly if you had used :

proc dofoo args {
    # ...
    array set options $args
    # ... existence checking ...
    puts $options(hasbar)
}

Then again no modifications would have been needed. I accept that Tk and Tcl normally use the syntax:

cmd -option val -novaloption arg arg ...

In reality this is just because people are lazy and do not want to have to name command arguments, i.e.:

open -file xxx -mode r

Which would make all commands consistent and argument parsing a non issue.
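With the elided existence checking filled in, the array set approach above might look like the following sketch. The default values and the error messages are assumptions, and the caller names every argument without a leading dash, as in the open example.

```tcl
proc dofoo args {
    # Defaults first; the caller's name/value pairs then override them.
    array set options {hasbar false baztype {}}
    if {[llength $args] % 2 != 0} {
        error "expected name/value pairs"
    }
    foreach {name value} $args {
        # Existence checking: only names with a default are accepted.
        if {![info exists options($name)]} {
            error "unknown option \"$name\""
        }
        set options($name) $value
    }
    return [list $options(hasbar) $options(baztype)]
}

dofoo hasbar true               ;# baztype keeps its default
dofoo baztype x hasbar true     ;# order does not matter
```

Adding a new option later means adding one default, and existing callers are unaffected.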

NEM: looks forward to such gems as:

proc -name foo -params {a b c} -body {
   set -varname jim -value [set -varname a]
}

That'd certainly weed out those lazy so-and-so's who ruin programming by insisting on relying on some level of implicitness. But wait, we should be using arrays for this, so let us remove those pesky visual clues like the leading -:

proc name foo params {a b c} body {
    set varname jim value [set varname a]
}

Hmm... something's still not right. How does the interpreter know that "proc" is the command name? It's not relying on position is it, when we could easily provide an explicit name? Arggh... let's try again:

command name proc arguments {
    name foo
    params {a b c}
    body {
        command name set arguments {
            varname jim
            value [command name set arguments { varname a }]
        }
    }
}

Nirvana! Now, if I can just write that in XML, all will be good with the world...

IL 2005-09-22: I'm just following what has been standard Tcl syntax; I don't see why the language gets to have all the fun. The named parameters function well for options and enumerated types, but the implicit aspect reinforces the purpose of the proc. I would argue simply making all values optional isn't an option because by the proc's definition, that parameter should be required, whether or not you can get around it with the array syntax. Maybe some people don't view a defined required parameter as a required parameter but I certainly do. I can accept a different viewpoint on the issue, but it'd be nice to look like the language we're trying to enhance. At the very least don't call me lazy!

PWQ 2005-09-23: NEM: very witty :). Taken to conclusion a command would be:

-args ... -body ... -name ... -command proc

In any order.

This is not far from how commands are represented internally. Of course we cannot ignore the fact that in Tcl, there are no options. This is just a convention, which you can test by trying something like:

puts "Im an text" -nonewline

The point then: if the convention is so ingrained in the language constructs and followed by most programmers, should it become part of the language?

We always have to compromise idealism for practicality. Should argument parsing be part of the language, or should we just have proc {args} for maximum flexibility? Currently we opt for the middle ground and have a bit of both, which suits no one, so it is at least fair.

Larry Smith: To me, most named argument systems are way over-specified and not tclish in their world view. The very idea of demanding certain types for arguments, for example, is not tclish. Sure, sometimes we have a proc that needs an integer in some place - but that fact, and the validation that it requires of the proc, is best left to some explicit check in the proc code, not buried away in the argument-parsing system and only accessed with a convention of some sort. This is why init is specified so simply and why it can work so orthogonally. In one page of code it provides for 90% of all your argument-parsing needs. The rest should be handled as the exceptions that they are.

The 90% rule is applicable everywhere. It's well-known that the first 90% of a project takes 90% of the effort, and the remaining 10% takes the other 90%. What is not as well known is that that extra 10% is something we can often easily do without.

IL 2005-09-22: Am I understanding correctly that people wouldn't be so happy with a system that essentially provides you with the capability to emulate Tcl's command definition format?

lsearch -index 0 $data $value
regexp -nocase -all $regex $data

and instead prefer to choose one of the following?

lsearch -index 0 -data $data -value $value
proc regexp { regex data isall isnocase } {
   ...
}

If so, do you generally view Tcl's commands as somehow incomplete or improper?

Lars H 2005-09-23: I think it would be fair to say that many of the older built-in commands have slightly corny syntaxes. regexp and switch for example require an -- option to signal end of options in the cases where data beginning with a hyphen may follow. I believe recently created commands more often have options at the end.

On the other hand, judging from the discussions on the tcl-core list, the TCT doesn't seem very interested in the matter of command syntaxes; what originally gets TIPed is often the syntax that goes into the language. It seems string repeat/lrepeat will forever be a monument to this lack of thorough linguistic analysis before voting.


See Also

Tcl_ParseArgsObjv minimal example