Here's the package to read and parse RFC 822-like (or RFC 2822-like) headers, which are used in a variety of applications nowadays (e-mail, news and Debian Packages.gz files come to mind). When developing the package, the ability to re-produce a header just read is important. E.g. the following code should produce a '''header''' file containing the part of the '''message''' file until (and including) the first empty line: ## get header from `message': array unset header set chan [ open "message" r ] set order [ read-parse-header $chan header ] close $chan ## write header to `header': set chan [ open "header" w ] write-header $chan $order header puts $chan "" close $chan Please note, however, that in mailbox files the header is often prepended with '''From ''' (note the absence of colon) line, which isn't part of the RFC 822 syntax, so the '''read-parse-header''' procedure will signal an error on seeing this sort of input. One approach to this problem can be seen in [Mbox to MH directory conversion tool]. ---- Namespace: ::rfc-822 The '''proc'''s below belong to this namespace, exported and ready to use. Function: header-fields-order ''arrayName'' Returns the order of fields in header, which is currently stored in array element with key ''':order'''. Empty list is returned if no such element exists. Procedure: header-field-finish ''arrayName'' This procedure is called to signal that the header processing is done. To allow header field continuation, an implementation of '''header-parse-line''' must preserve some information between the calls. The information is currently stored in array elements with keys beginning with a couple of colons. This procedure flushes such a temporary information (if any), enforcing the array to actually represent the header read. This procedure is automagically called by '''header-parse-line''' when it sees an empty line, which is a common convention of terminating the header. Function: header-parse-line ''arrayName'' ''line'' Parses the ''line'' given, modifying the array ''arrayName'' accordingly. Given an empty line calls '''header-field-finish''' and returns '''1''', otherwise returns '''0'''. See the description of the '''read-parse-header''' for more information. Function: read-parse-header ?''channelId''? ''arrayName'' Reads and parses RFC 822 header fields from ''channelId'' (defaulting to '''stdin''' if omitted), filling the array specified by ''arrayName''. Returns the order of fields read (as a list). Header fields are keyed by header name and stored as lists, with each occurence of the header field resulting in exactly one element in the list. Array isn't cleared before processing, it's left to the caller. Leading whitespace is ''not'' removed. Header values continued over several lines result in newlines being embedded in the list elements. Returned order of fields will contain several occurences of the same header name, in case there were multiple occurences of the header. Procedure: write-header ?''channelId''? ''order'' ''arrayName'' Writes header fields specified by ''order'' argument to the given ''channelId'' (defaulting to '''stdout''' if omitted), obtaining values from the array specified by ''arrayName''. NB: Header fields not included in ''order'' will ''not'' be output. This can be used to strip away unwanted headers. ---- ### rfc822h.tcl --- RFC 822 (and alike) headers -*- Tcl -*- ## $Id: 15254,v 1.5 2006-01-16 07:00:23 jcw Exp $ ### Copyright (C) 2005, 2006 Ivan Shmakov ## This library is free software; you can redistribute it and/or modify ## it under the terms of the GNU Lesser General Public License as ## published by the Free Software Foundation; either version 2.1 of the ## License, or (at your option) any later version. ## This library is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## Lesser General Public License for more details. ## You should have received a copy of the GNU General Public License ## along with this program; if not, write to the Free Software ## Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 ## USA ### Code: namespace eval ::rfc-822 { variable re-header-name "\[\\041-\\071\\073-\\177\]+" variable re-header-del ":" namespace export \ header-fields-order \ header-field-finish \ header-parse-line \ read-parse-header \ write-header } ## In the code below, header is an array keyed by header field name, ## and whose values are corresponding header field values. Because RFC ## 822 header field name cannot contain colons, the convention is to ## put additional information (such as the order of fields) into array ## elements with keys beginning with one or more colons. Eg. the order ## of fields is stored in the element with key ":colon". The keys of ## the private fields of the package begin with two colons. ## : Get the value of a variable, or default if it's not existent proc ::rfc-822::safe-get { varn { default "" } } { upvar 1 $varn var ## . expr { [ info exists var ] ? [ set var ] : $default } } ## : Get the order of fields in the header proc ::rfc-822::header-fields-order { arrayName } { ## . safe-get header(:order) [ list ] } ## : Store an accumulated header line (if any) in the header proc ::rfc-822::header-field-finish { arrayName } { upvar 1 $arrayName header set varnf header(::last-field) set varnv header(::last-value) if { ! [ info exists $varnf ] } { ## no header field to finish ## . return } set field [ set $varnf ] lappend header($field) [ set $varnv ] lappend header(:order) $field unset $varnf $varnv ## . return } ## : Parse just one given header line, modifying the header proc ::rfc-822::header-parse-line { arrayName line } { ## => 0 | 1 (done) variable re-header-name variable re-header-del upvar 1 $arrayName header if { [ string length $line ] == 0 } { ## end of header, store accumulated field, if any header-field-finish header return 1 } elseif { [ string match "\[ \t\]*" $line ] } { ## accumulate continuation of a header field if { [ info exists header(::last-field) ] == 0 } { error "no header to continue" } append header(::last-value) "\n" $line } elseif { [ regexp -- \ "^(${re-header-name})${re-header-del}(.*)\$" \ $line \ dummy n v ] } { ## new header field header-field-finish header set header(::last-field) $n set header(::last-value) $v } else { error "cannot parse header line" } ## . return 0 } ## : Read lines from the channel, parsing them as the RFC 822 header proc ::rfc-822::read-parse-header { { channelId stdin } arrayName } { upvar 1 $arrayName header set done 0 if { ! [ info exists header(:order) ] } { set header(:order) [ list ] } while { ! $done } { if { [ gets $channelId line ] < 0 } { error "eof or no data while reading header" } if { [ header-parse-line header $line ] } { set done 1 } } ## . set header(:order) } ## : Write lines to the given channel, formatting them as the header proc ::rfc-822::write-header { { channelId stdout } order arrayName } { ## FIXME: should it write header fields not in ORDER? upvar 1 $arrayName header array unset is foreach hf $order { if { ! [ info exists is($hf) ] } { set is($hf) 0 } set i $is($hf) set l $header($hf) if { $i >= [ llength $l ] } { ## silently ignore this case continue } puts $channelId \ [ format "%s:%s" $hf [ lindex $l $i ] ] incr is($hf) } ## . return } package provide rfc822::headers 0.3 ### Emacs stuff ## Local variables: ## fill-column: 72 ## indent-tabs-mode: nil ## ispell-local-dictionary: "english" ## mode: outline-minor ## outline-regexp: "###\\|proc" ## End: ## LocalWords: ### rfc822h.tcl ends here ---- [Category Internet] | [Category Package]