Version 34 of gets

Updated 2013-11-13 08:35:12 by suchenwi

gets - Read a single line from a channel http://www.tcl.tk/man/tcl8.5/TclCmd/gets.htm

gets channelId
gets channelId variable

Reads a single line from the specified channel. In the first form, the characters of the line (with the exception of the end-of-line character) are returned as the result of the command. In the second form the characters of the line are written into the variable and the length of the line is returned instead.

When applied to a blocking channel, the command blocks until a line is complete or EOF is encountered. When applied to a non-blocking channel that cannot supply a complete line, the first form returns an empty string; the second form returns -1 and sets the variable to an empty string. Use fblocked and eof to tell an incomplete line apart from end of file.
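As a minimal sketch of the non-blocking case described above, the helper below (the name try_gets and its return values are made up for illustration) uses the second form of gets together with eof and fblocked to report what happened on one read attempt:

```tcl
# Hypothetical helper: classify one read attempt on a non-blocking channel.
proc try_gets {chan varName} {
    upvar 1 $varName line
    if {[gets $chan line] >= 0} {
        return line      ;# a complete line was stored in the variable
    } elseif {[eof $chan]} {
        return eof       ;# end of file, nothing more will arrive
    } elseif {[fblocked $chan]} {
        return partial   ;# some data is buffered but no end-of-line yet
    }
    return unknown
}
```

A readable fileevent handler could call try_gets once per event and simply return when the result is partial.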

Do not use this command when working with binary data. It will try to recognize end-of-line characters no matter what, even inside of packets.

If you're using gets in a loop, and want to stop when you reach the end of the file, use the following structure:

while {[gets $filestream line] >= 0} {
    # do what you like here
}

When structuring your code this way (and if your channel is blocking – the default) you do not need to use the eof command to detect when you've read all the data.
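For illustration, here is the same loop wrapped in a small self-contained procedure. The name read_lines is hypothetical, and the sketch assumes a readable text file on a blocking channel:

```tcl
# Hypothetical helper: return all lines of the file at $path as a list.
proc read_lines {path} {
    set lines {}
    set f [open $path r]
    # gets returns the line length, so an empty line yields 0 and is kept;
    # only end of file yields -1 and ends the loop.
    while {[gets $f line] >= 0} {
        lappend lines $line
    }
    close $f
    return $lines
}
```

Testing for >= 0 rather than > 0 matters here: with > 0 the loop would stop at the first blank line.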

George Peter Staplin: Using gets with a socket is a BAD IDEA. tclhttpd uses gets (as do some of the modules), and it is trivial to make it panic on a Unix-like system. On a Windows system, which may not have a ulimit on the memory tclhttpd can allocate, it may be even worse. Sadly, this has been known since 2001 (see below).

20040721 CMcC In defence of tclhttpd: tcl core module http and tcllib modules comm, ftpd, ftp, irc, nntp, pop3, pop3d, smtpd and ident all seem to suffer from precisely the same problem.

20060730 CMcC was thinking about why we don't see this problem in the wild, and remembered that all of the above protocols have per-transaction timeouts. For example, tclhttpd expects a completed header within a defined period after a connection occurs. It is this timeout, not the available address space, which limits the length of a line an attacker can send in most cases. This is not to say that gets shouldn't be fixed, but there is a simple preventative.


Note that a channelId is typically obtained in one of two ways: either the predefined Tcl channel stdin is used, or one saves the result of a command that opens a file, socket, etc. in a variable and then passes that variable's value as the argument to gets. Thus, one will nearly always see gets invoked as:

 gets $someVar 

or

 gets $someVar variable

From the Tcl'ers chat on Oct 24, 2001:

dgp: I reported long ago that tclhttpd was vulnerable to a DoS due to gets slurping up data until it sees a newline. I guess that weakness in gets has never been addressed.

bbh: is it a weakness in gets or a weakness in an app using gets instead of read ?

dgp: Well, Brent Welch replied and said that the solution he would have to implement would be effectively writing his own safe gets in terms of read.

From the Tcl'ers Wiki Sep 20, 2002:

GPS: This bug with gets could be solved I suspect by adding a -maxchars flag to gets. For example:

 set res [gets -maxchars 100 $chan data]

If more than 100 chars are read then gets should return -1 or something like that. This would only be for the usage of gets with the optional variable argument.
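Since gets has no such flag, a "safe gets in terms of read" along the lines mentioned in the chat above might be sketched as follows. The name gets_limited is hypothetical, and the sketch assumes a blocking channel with the default end-of-line translation, so that every line ending arrives as "\n". Unlike the proposal, it raises an error on an overlong line so that the case cannot be confused with EOF:

```tcl
# Hypothetical helper: like [gets $chan varName], but gives up as soon as
# the line grows beyond $maxChars characters.
proc gets_limited {chan varName maxChars} {
    upvar 1 $varName line
    set line ""
    while {1} {
        set c [read $chan 1]
        if {$c eq ""} {
            # On a blocking channel an empty read means EOF.
            if {$line eq ""} {return -1}
            return [string length $line]
        }
        if {$c eq "\n"} {
            return [string length $line]
        }
        if {[string length $line] >= $maxChars} {
            error "line exceeds $maxChars characters"
        }
        append line $c
    }
}
```

After the error the rest of the overlong line is still unread on the channel, so a server would normally close the connection at that point.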


MC 29 Oct 2006: I've proposed a [chan available] command (TIP #287) to give programmers a tool they can use to introspect the amount of buffered (but as yet unread) data on a channel. This would give applications enough introspection capability to implement their own policy for handling excessively long input lines, while still retaining the same [gets] semantics. (In a readable fileevent callback, where one should be testing for fblocked already, you could check whether [chan available $sock] > $limit and take appropriate action if it is.)
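As far as I know, this TIP ended up in Tcl 8.5 as [chan pending input] rather than [chan available]. Under that assumption, the check described above could be sketched like this; the name check_line and its return values are hypothetical, and the socket is assumed to be non-blocking:

```tcl
# Hypothetical helper, meant to be called from a readable fileevent handler.
# Returns line, eof, overlong or partial; $limit is a number of bytes.
proc check_line {sock limit varName} {
    upvar 1 $varName line
    if {[gets $sock line] >= 0} {
        return line
    }
    if {[eof $sock]} {
        return eof
    }
    if {[fblocked $sock] && [chan pending input $sock] > $limit} {
        # Too much buffered data without an end-of-line: let the caller
        # apply its policy, typically by closing the socket.
        return overlong
    }
    return partial
}
```

The handler would process the line on line, close the socket on eof or overlong, and just return on partial to wait for the next readable event.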


Theo Verelst: This seems to go back to unwritten rules or ideas about Unix file access, at least as far as I can trace it: you either assume that everything fits in a line, or the data gets broken up into pieces of known length. The latter runs you into two problems: first, you don't know what to do when you read a line that is exactly the length of the buffer, and second, you don't know what happens when a file server or socket has a buffer size different from the one you assume or set.

(19 jun 03) Come to think about it a bit more, the idea is of course that most or all lines fit in the assigned buffers, so that they can be transferred as a whole as soon as they have been fully buffered.

Otherwise, breaking them up at least lets buffer space be used optimally at every buffering point, and the newline can (but doesn't necessarily) act as an in-band separator and as a synchronisation point or trigger, for instance to update the line on a terminal.

The reason for taking a line and not just the smallest data element, a character, is of course efficiency: every time a transfer from one buffer to another takes place, either between processes on the same machine or between some communication core process and a user process, the process swapper needs to kick in. That takes a certain fraction of the heartbeat interrupt interval (1 ms, for instance), because the processor and process status of the process about to receive running time must be loaded, and the previous process's state and processor register values stored. That is not a negligible amount of time when transferring something like a megabyte per second.

For single characters this could work, but otherwise it pays not to run a pipeline such as

  $ cat somefile | grep whatever | more

on a character-by-character basis.

The delimiter character is needed to know when to flush the buffers instead of simply letting them fill and empty at their natural rate.


RS 2005-08-25: Here's how to temporarily disable echoing of the characters typed in response to gets. You need stty, which is part of Linux and Cygwin, so it works even on Windows: (thanks MNO for the stty tip!)

 proc userpasswd _arr {
    # Link the caller's array to a local array whose name is the empty string
    upvar 1 $_arr ""
    if {![info exists (-user)]} {set (-user) [prompt "username:"]}
    if {![info exists (-passwd)]} {
        exec stty -echo    ;# stop echoing typed characters
        set (-passwd) [prompt "password:"]
        exec stty echo     ;# restore echo
        puts ""
    }
 }
 proc prompt string {
    puts -nonewline "$string "
    flush stdout
    gets stdin
 }

More concentrated, here's the "gets with no echo" functionality by itself:

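 proc gets'noecho {} {
     exec stty -echo    ;# stop echoing typed characters
     gets stdin line
     exec stty echo     ;# restore echo
     puts ""            ;# the user's newline was not echoed, so emit one
     set line           ;# return the line that was read
 }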

MHo: I think it should be possible to handle the echo state on Windows without installing Cygwin....


See gets workaround for a solution when [gets stdin] won't work, e.g. on W95 and PocketPC.


From comp.lang.tcl, thanks to Alex, a drop-in replacement for gets with an extra timeout argument:

 proc gets_timeout {ch vline timeout} {
    upvar 1 $vline line
    # The timer sets the flag to 1; a readable event sets it to 2.
    set id [after $timeout [list set ::_gt($ch) 1]]
    set blo [fconfigure $ch -blocking]
    fconfigure $ch -blocking 0
    fileevent $ch readable [list set ::_gt($ch) 2]
    set err NONE
    while {1} {
        vwait ::_gt($ch)
        if {$::_gt($ch) == 1} {
            set err TIMEOUT
            break
        }
        set n [gets $ch line]
        if {$n < 0} {
            # Incomplete line: wait for more data or for the timeout.
            if {[fblocked $ch]} continue
            set err EOF
        }
        after cancel $id
        break
    }
    # Remove the handler and restore the original blocking mode.
    fileevent $ch readable {}
    unset -nocomplain ::_gt($ch)
    fconfigure $ch -blocking $blo
    switch $err {
        NONE {return $n}
        TIMEOUT {error TIMEOUT}
        EOF {return -1}
    }
 }

AMG: I'd like to see an option added to [gets] to override the end-of-line characters. When this option is in use, the delimiter character probably should be retained in the output so the program can tell which delimiter was read, or if a delimiter was read at all before hitting EOF. I guess it would work a bit like getdelim().

Here's some code I use right now that comes close.

# Read from $chan until one of the characters in $delims is encountered.
proc read_delim {chan delims} {
    set result ""
    while {1} {
        set char [read $chan 1]
        if {$char eq ""} {
            error EOF
        } elseif {[string first $char $delims] == -1} {
            append result $char
        } else {
            return $result
        }
    }
}
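A variant that keeps the delimiter in the result, as wished for above, might look like the following sketch. The name read_delim_keep is hypothetical, and a blocking channel is assumed, so that an empty read means EOF:

```tcl
# Hypothetical variant of read_delim: the delimiter, if one was read, stays
# at the end of the result. A result that does not end in a delimiter
# (including an empty result) means EOF was reached first.
proc read_delim_keep {chan delims} {
    set result ""
    while {1} {
        set char [read $chan 1]
        if {$char eq ""} {
            return $result
        }
        append result $char
        if {[string first $char $delims] != -1} {
            return $result
        }
    }
}
```

The caller can inspect the last character of the result to see which delimiter ended the record, which is roughly what getdelim() offers in C.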

Pie in the sky: Allow the definition of a "line" to be specified as a regular expression. However, I doubt the Tcl RE code is flexible enough to operate on a stream of data as well as a random-access buffer whose size is known in advance.