Version 0 of Reading and parsing RFC 822 headers

Updated 2006-01-10 19:39:31

Here's the package to read and parse RFC 822-like (or RFC 2822-like) headers, which are used in variety of applications nowadays (e-mail, news and Debian Packages.gz files come to mind).

When developing the package, ability to re-produce header just read was considered significant. Eg. the following code should produce header file containing part of message file 'till (and including) first empty line:

 ## get header from `message':
 array unset header
 set chan  [ open "message" r ]
 set order [ read-parse-header $chan header ]
 close $chan

 ## write header to `header':
 set chan  [ open "header" w ]
 write-header $chan $order header
 puts $chan ""
 close $chan

Please note, however, that in mailbox files the header is oftenly prepended with From (note the absence of colon) line, which isn't part of the RFC 822 syntax, so read-parse-header procedure will signal an error on seeing this sort of input.


-- Namespace: ::rfc-822

The `proc's below belong to this namespace, exported and ready to use.

-- Function: read-parse-header ?channelId? arrayName

Reads and parses RFC 822 header fields from channelId (defaulting to stdin if omitted), filling the array specified by arrayName. Returns the order of fields read (as a list).

Header fields are keyed by header name and stored as lists, with each occurence of the header field resulting in exactly one element in the list. Array isn't cleared before processing, it's left to the caller. Leading whitespace is not removed. Header values continued over several lines result in newlines being embedded in the list elements.

Returned order of fields will contain several occurences of the same header name, in case there were multiple occurences of the header.

-- Procedure: write-header ?channelId? order arrayName

Writes header fields specified by order argument to the given channelId (defaulting to stdout if omitted), obtaining values from the array specified by arrayName.

NB: Header fields not included in order will not be output. This can be used to strip away unwanted headers.


The code below is under GNU Lesser General Public License.


 ### rfc822h.tcl --- RFC 822 (and alike) headers  -*- Tcl -*-
 ## $Id: 15254,v 1.1 2006-01-11 07:01:22 jcw Exp $

 ### Copyright (C) 2005, 2006 Ivan Shmakov

 ### Code:

 namespace eval ::rfc-822 {
     variable re-header-name "\[\\041-\\071\\073-\\177\]+"
     variable re-header-del  ":"
     namespace export \
         read-parse-header \
         write-header
 }

 proc ::rfc-822::read-parse-header { { channelId stdin } arrayName } {
     variable re-header-name
     variable re-header-del
     upvar 1 $arrayName header

     set order [ list ]
     set done 0
     set hf ""
     set hv ""
     while { ! $done } {
         if { [ gets $channelId line ] < 0 } {
             error "eof or no data while reading header"
         }
         if { [ string length $line ] == 0 } {
             ## end of header
             if { [ string length $hf ] != 0 } {
                 lappend header($hf) $hv
                 lappend order $hf
             }
             set done 1
         } elseif { [ string match "\[ \t\]*" $line ] } {
             ## continuation of a header field
             if { [ string length $hf ] == 0 } {
                 error "no header to continue"
             }
             append hv "\n" $line
         } elseif { [ regexp -- \
                          "^(${re-header-name})${re-header-del}(.*)\$" \
                          $line \
                          dummy n v ] } {
             ## new header field
             if { [ string length $hf ] != 0 } {
                 lappend header($hf) $hv
                 lappend order $hf
             }
             set hf $n
             set hv $v
         } else {
             error "cannot parse header line: $line"
         }
     }
     ## .
     set order
 }

 proc ::rfc-822::write-header { { channelId stdout } order arrayName } {
     ## FIXME: should it write header fields not in ORDER?
     upvar 1 $arrayName header
     array unset is
     foreach hf $order {
         if { ! [ info exists is($hf) ] } {
             set is($hf) 0
         }
         set i $is($hf)
         set l $header($hf)
         if { $i >= [ llength $l ] } {
             ## silently ignore this case
             continue
         }
         puts $channelId \
             [ format "%s:%s" $hf [ lindex $l $i ] ]
         incr is($hf)
     }
     ## .
     return
 }

 package provide rfc822::headers 0.2

 ### Emacs stuff
 ## Local variables:
 ## fill-column: 72
 ## indent-tabs-mode: nil
 ## ispell-local-dictionary: "english"
 ## mode: outline-minor
 ## outline-regexp: "###\\|proc"
 ## End:
 ## LocalWords:  
 ### rfc822h.tcl ends here

Category Internet Category Package