KBK 2009-08-01 -- Emiliano Gavilán raised a question on the Tcl'ers Chat about parsing RFC2822-formatted dates and times (such as the ones that appear in mail headers). His question made me realize that I had never Wikified the code that I use for the purpose. It's a wrapper around [clock scan] that matches the format from a limited set, and then uses clock scan to extract the data.
if {![package vsatisfies [info tclversion] 8.5]} { package require newclock } namespace eval rfc2822 { namespace export parseDate variable datepats {} } # AddDatePat -- # # Internal procedure that adds a date pattern to the pattern list # # Parameters: # wpat - Regexp pattern that matches the weekday # wgrp - Format group that matches the weekday # ypat - Regexp pattern that matches the year # ygrp - Format group that matches the year # mdpat - Regexp pattern that matches month and day # mdgrp - Format group that matches month and day # spat - Regexp pattern that matches the seconds of the minute # sgrp - Format group that matches the seconds of the minute # zpat - Regexp pattern that matches the time zone # zgrp - Format group that matches the time zone # # Results: # None # # Side effects: # Adds a complete regexp and a complete [clock scan] pattern to # 'datepats' proc rfc2822::AddDatePat { wpat wgrp ypat ygrp mdpat mdgrp spat sgrp zpat zgrp } { variable datepats set regexp {^[[:space:]]*} set pat {} append regexp $wpat $mdpat {[[:space:]]+} $ypat append pat $wgrp $mdgrp $ygrp append regexp {[[:space:]]+\d\d?:\d\d} $spat append pat { %H:%M} $sgrp append regexp $zpat append pat $zgrp append regexp {[[:space:]]*$} lappend datepats $regexp $pat return } # InitDatePats -- # # Internal rocedure that initializes the set of date patterns allowed in # an RFC2822 date # # Parameters: # permissible - 1 if erroneous (but common) time zones are to be # allowed, 0 if they are to be rejected # # Results: # None. # # Side effects: proc rfc2822::InitDatePats { permissible } { # Produce formats for the observed variants of ISO2822 dates. Permissible # variants come first in the list; impermissible ones come later. # The month and day may be "%b %d" or "%d %b" foreach mdpat {{[[:alpha:]]+[[:space:]]+\d\d?} {\d\d?[[:space:]]+[[:alpha:]]+}} \ mdgrp {{%b %d} {%d %b}} \ mdperm {0 1} { # The year may be two digits, or four. Four digit year is done # first. foreach ypat {{\d\d\d\d} {\d\d}} ygrp {{ %Y} { %y}} { # The seconds of the minute may be provided, or omitted. foreach spat {{:\d\d} {}} sgrp {:%S {}} { # The weekday may be provided or omitted. It is common but # impermissible to omit the comma after the weekday name. foreach wpat { {(?:Mon|T(?:ue|hu)|Wed|Fri|S(?:at|un)),[[:space:]]+} {(?:Mon|T(?:ue|hu)|Wed|Fri|S(?:at|un))[[:space:]]+} {} } wgrp { {%a, } {%a } {} } wperm { 1 0 1 } { # Time zone is defined as +/- hhmm, or as a # named time zone. Other common but buggy # formats are GMT+-hh:mm, a time zone name in # quotation marks, and complete omission of # the time zone. foreach zpat { {[[:space:]]+(?:[-+]\d\d\d\d|[[:alpha:]]+)} {[[:space:]]+GMT[-+]\d\d:?\d\d} {[[:space:]]+"[[:alpha:]]+"} {} } zgrp { { %Z} { GMT%Z} { "%Z"} {} } zperm { 1 0 0 0 } { if { ($zperm && $wperm && $mdperm) == $permissible } { AddDatePat $wpat $wgrp $ypat $ygrp \ $mdpat $mdgrp \ $spat $sgrp $zpat $zgrp } } } } } } return } # Initialize the date patterns namespace eval rfc2822 { InitDatePats 1 InitDatePats 0 rename AddDatePat {} rename InitDatePats {} puts [join $datepats \n] } # rfc2822::parseDate -- # # Parses a date expressed in RFC2822 format # # Parameters: # date - The date to parse # # Results: # Returns the date expressed in seconds from the Epoch, or throws # an error if the date could not be parsed. proc rfc2822::parseDate { date } { variable datepats # Strip comments and excess whitespace from the date field regsub -all -expanded { \( # open parenthesis (:? [^()[.\.]] # character other than ()\ |\\. # or backslash escape )* # any number of times \) # close paren } $date {} date set date [string trim $date] # Match the patterns in order of preference, returning the first success foreach {regexp pat} $datepats { if { [regexp -nocase $regexp $date] } { return [clock scan $date -format $pat] } } return -code error -errorcode {RFC2822 BADDATE} \ "expected an RFC2822 date, got \"$date\"" } # Usage example if {![info exists ::argv0] || [info script] ne $::argv0} return puts [clock format \ [rfc2822::parseDate {Mon(day), 23 Aug(ust) 2004 01:23:45 UT}]] puts [clock format \ [rfc2822::parseDate "Tue, Jul 21 2009 19:37:47 GMT-0400"]]
See also: Parsing ISO8601 dates and times
AK - 2009-08-04 12:59:32
Note that Tcllib's mime package also has a parsedatetime function, albeit it references RFC 822, i.e. the predecessor to RFC 2822. It uses, afaik, a mixture of 8.4 clock scan, scan, and regexes.
EG - 2024-04-12
Added a space in front of both %Y and %y. For some reason the following code
clock scan {Tue, 9 Apr 2024 12:34:12 +0200} -format {%a, %d %b%Y %H:%M:%S %Z}
worked until merge of TIP 688 . Now it works with both clock implementations.