sed

Stream Editor

A program that reads files or standard input, transforms the text according to a given set of rules, mostly based on regular expressions, and writes the result to standard output. Unbelievably fast.

Eric Pement keeps a good collection of information and links on this page: http://www.student.northpark.edu/pemente/sed/ A spirited introduction to sed programming appears here [L1 ].

There is a page dedicated to Tcl's implementation of Regular Expressions on this wiki. Tcl uses Advanced Regular Expressions (AREs); sed uses Basic and Extended Regular Expressions only.
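To make the difference concrete, here is a small sketch of a few ARE features that work in Tcl's regsub but that POSIX Basic and Extended Regular Expressions do not define. The sample strings are made up for illustration only.

```tcl
# Non-greedy quantifier (ARE only): match up to the first ">" rather than the last.
puts [regsub {<.*?>} "<a><b>" ""]

# \d shorthand for a digit (ARE only); POSIX syntax would use [0-9] or [[:digit:]].
puts [regsub -all {\d+} "abc123def45" "#"]

# Groups use bare parentheses in Tcl; a sed Basic RE would write \( and \).
puts [regsub {(foo)(bar)} "foobar" {\2\1}]
```

When porting a sed script to Tcl, the escaping of parentheses, braces and the + and ? operators is usually the first thing to check.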

One reason Tcl'ers encounter sed is that autoconf and autoconf-generated build scripts make heavy use of sed to mangle text.


 $ sed s/foo/bar/g < infile > outfile

can be replaced in Tcl either with regsub or, when no regular expression features are needed (as in this case), with string map:

 % > outfile [regsub -all foo [< infile] bar]

or

 % > outfile [string map {foo bar} [< infile]]

where

 proc > {filename str} {set f [open $filename w]; puts -nonewline $f $str; close $f}
 proc < {filename}     {set f [open $filename]; return [read $f][close $f]}
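Note that sed works on one line at a time, and that without the g flag its s command replaces only the first match on each line. A minimal sketch of that behaviour in Tcl follows; the proc name substFirstPerLine is hypothetical and not part of the original page.

```tcl
# Hypothetical helper: apply a substitution to each line separately,
# replacing only the first match per line, as sed's s command does
# when no g flag is given.
proc substFirstPerLine {from to text} {
    set out {}
    foreach line [split $text \n] {
        # regsub without -all replaces only the first match in $line
        lappend out [regsub -- $from $line $to]
    }
    return [join $out \n]
}

puts [substFirstPerLine foo bar "foo foo\nfoo"]
```

Here the first line keeps its second "foo", while the second line is still processed, because each line is handled independently.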

RS 2006-09-05: Playing around, waiting for a longish run to complete:

 proc sed {script input} {
    set sep [string index $script 1]
    foreach {cmd from to flag} [split $script $sep] break
    if {$cmd ne "s"} {error "not yet implemented"}
    set cmd regsub
    if {$flag eq "g"} {lappend cmd -all}
    lappend cmd $from $input $to
    eval $cmd
 }

Testing:

 % sed s-foo-bar "A foolish idea for fools"
 A barlish idea for fools
 % sed s/foo/bar/g "A foolish idea for fools"
 A barlish idea for barls

Add to it to make it come closer to the real thing:^)


EF: Here is a somewhat more capable implementation. When the flag contains an integer, only that particular match is substituted; note that the index is zero-based here, so 1 selects the second match, whereas real sed counts from 1. It also supports the GNU extension for case-insensitive matching (the i flag). In addition it implements "y", although as a whole-string replacement via string map rather than sed's character-by-character transliteration, and an extension of mine that I called "e" (as in extract), which extracts a particular (sub)group, or the whole text matched by the regular expression when no index is given.

proc ::sed {script input} {
    set sep [string index $script 1]
    foreach {cmd from to flag} [split $script $sep] break
    switch -- $cmd {
        "s" {
            set cmd regsub
            if {[string first "g" $flag]>=0} {
                lappend cmd -all
            }
            if {[string first "i" [string tolower $flag]]>=0} {
                lappend cmd -nocase
            }
            set idx [regsub -all -- {[a-zA-Z]} $flag ""]
            if { [string is integer -strict $idx] } {
                set cmd [lreplace $cmd 0 0 regexp]
                lappend cmd -inline -indices -all -- $from $input
                set res [eval $cmd]
                set which [lindex $res $idx]
                return [string replace $input [lindex $which 0] [lindex $which 1] $to]
            }
            # Most generic case
            lappend cmd -- $from $input $to
            return [eval $cmd]
        }
        "e" {
            set cmd regexp
            if { $to eq "" } { set to 0 }
            if {![string is integer -strict $to]} {
                return -code error "No proper group identifier specified for extraction"
            }
            lappend cmd -inline -- $from $input
            return [lindex [eval $cmd] $to]
        }
        "y" {
            return [string map [list $from $to] $input]
        }
    }
    return -code error "not yet implemented"
}

Try the following examples:

 % sed s/FOO/bar/1i "A foolish idea for fools"
 A foolish idea for barls
 % sed y/foo/bar "A foolish idea for fools"
 A barlish idea for barls
 % sed e/(o*)ls/1 "A foolish idea for fools"
 oo
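
Two points where the proc above differs from real sed are that "y" replaces the whole string foo instead of transliterating character by character, and that the match index in the "s" flag counts from 0. The following is only a sketch of how both could be brought closer to sed; the proc names translit and substNth are hypothetical, and substNth assumes a pattern without capturing groups and treats the replacement as literal text.

```tcl
# Hypothetical helper: character-by-character transliteration in the
# spirit of sed's y command. Each character of $from is mapped to the
# character at the same position in $to.
proc translit {from to input} {
    if {[string length $from] != [string length $to]} {
        error "source and destination must have the same length"
    }
    set map {}
    foreach a [split $from ""] b [split $to ""] {
        lappend map $a $b
    }
    return [string map $map $input]
}

# Hypothetical helper: replace only the n-th match, counting from 1 as
# sed does. Assumes $from has no capturing groups, because
# regexp -all -inline -indices would then also report the subgroups.
proc substNth {from to n input} {
    set matches [regexp -all -inline -indices -- $from $input]
    set which [lindex $matches [expr {$n - 1}]]
    if {$which eq ""} { return $input }   ;# fewer than n matches
    return [string replace $input [lindex $which 0] [lindex $which 1] $to]
}

puts [translit abc xyz "aabbcc cab"]
puts [substNth foo bar 1 "A foolish idea for fools"]
puts [substNth foo bar 2 "A foolish idea for fools"]
```

With these helpers, an index of 1 changes "foolish" and an index of 2 changes "fools", matching sed's numbering.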