One of the applications of the code, described in [Reading and parsing RFC 822 headers], is to manage a news archive. Messages in the newsgroups are formatted according to the [RFC] 1036 [http://www.faqs.org/rfc/rfc1036.txt], which is a variant of [RFC 822], but somewhat more restrictive. The code below is to provide some sanity checks (e. g., multiple instances of the same header are disallowed) and parsing for the values of a couple of RFC 1036 header fields ('''Message-ID''' and '''Xref'''.) Not much, but still better than nothing. Namespace: ::rfc-1036 The '''proc'''s below belong to this namespace, exported and ready to use. Function: trim ''string'' Returns ''string'' with any leading spaces and tabs removed. Function: get-field ''arrayName'' ''field'' Returns the value of the header field ''field'', as stored in array ''arrayName'', with any leading spaces and tabs removed. If there're no, or multiple, instances of the field, an error is signaled. Function: parse-Message-ID ''value'' ?''checkOnlyP''? Parses ''value'' as a '''Message-ID''' and returns the list of two elements: the unique part and the host part. If ''value'' cannot be parsed as a '''Message-ID''', an error is signaled. If ''checkOnlyP'' is given and is a [true] value, an empty string is returned, but the error checking is nevertheless performed. Function: parse-Xref ''value'' ?''checkOnlyP''? Parses ''value'' as an '''Xref''' and returns the list of two values: the host part and the list of newsgroup-article number pairs, suitable for a later '''foreach''': foreach { group number } $pairs { ... } If ''value'' cannot be parsed as an '''Xref''', an error is signaled. If ''checkOnlyP'' is given and is a [true] value, an empty string is returned, but the error checking is nevertheless performed. '''TODO''': implement parsing of more RFC 1036 header fields; '''Newsgroups''', '''References''' probably will be most useful, and '''Path''' is probably easiest to implement. ---- ### rfc1036p.tcl --- Parsing RFC 1036 headers -*- Tcl -*- ## $Id: 16304,v 1.1 2006-08-21 18:00:33 jcw Exp $ ### Copyright (C) 2006 Ivan Shmakov ## This library is free software; you can redistribute it and/or modify ## it under the terms of the GNU Lesser General Public License as ## published by the Free Software Foundation; either version 2.1 of the ## License, or (at your option) any later version. ## This library is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## Lesser General Public License for more details. ## You should have received a copy of the GNU Lesser General Public ## License along with this program; if not, write to the Free Software ## Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 ## USA ### Code: namespace eval ::rfc-1036 { ## basic REs ## FIXME: the RFC says nothing about these two variable group-re "\[0-9a-zA-Z_.+-\]+" variable host-re "\[0-9a-zA-Z.-\]+" ## header value REs variable message-id-re \ "^<(\[^<@> \t\]+)@(\[^<@> \t\]+)>\$" variable xref-pair-re "(${group-re}):(\[0-9\]+)" variable xref-re \ "^(${host-re})((?:\[ \t\]+${xref-pair-re})+)\$" ## exports namespace export \ trim \ get-field \ parse-Message-ID \ parse-Xref } ## : Return STR with any leading spaces and tabs removed proc ::rfc-1036::trim { str } { ## . string trimleft $str " \t" } ## : Return the value of FIELD of the header proc ::rfc-1036::get-field { arrayName field } { upvar 1 $arrayName header set varn header($field) if { ! [ info exists $varn ] } { error "no `$field' header field" } set lst [ set $varn ] ## NB: RFC 1036 says nothing about this case; ## at least my INN rejects articles with duplicate headers if { [ llength $lst ] != 1 } { error "multiple `$field' header fields" } ## . trim [ lindex $lst 0 ] } ## : Return the unique part and the host part of VALUE proc ::rfc-1036::parse-Message-ID { value { check-only? 0 } } { variable message-id-re if { ! [ regexp -- ${message-id-re} $value \ dummy unique host ] } { error "`Message-ID' does not match the pattern" } if { ${check-only?} } { ## . return } ## . list $unique $host } ## : Return the host part and the list of locations of VALUE proc ::rfc-1036::parse-Xref { value { check-only? 0 } } { variable xref-pair-re variable xref-re if { ! [ regexp -- ${xref-re} $value \ dummy host rest ] } { error "`Xref' does not match the pattern" } if { ${check-only?} } { ## . return } set pairs [ list ] foreach s [ split $rest " \t" ] { if { ! [ string length $s ] } { continue } if { ! [ regexp -- ${xref-pair-re} $s \ dummy group number ] } { error "unreachable" } lappend pairs $group $number } ## . list $host $pairs } package provide rfc1036::parse 0.1.1 ### Emacs stuff ## Local variables: ## fill-column: 72 ## indent-tabs-mode: nil ## ispell-local-dictionary: "english" ## mode: outline-minor ## outline-regexp: "###\\|proc" ## End: ## LocalWords: ### rfc1036p.tcl ends here ---- !!!!!! %| [Category Internet] | [Category Package] | [Category Parsing] |% !!!!!!