Reworking the clock command

Purpose: Kevin Kenny is contemplating starting a project to rework the clock command. The purpose of this page is to present some of the issues with the current clock command, and collect feedback about how best to fix it.

I wish I had time to more than merely applaud Kevin's work. This is long overdue! -- CLN

The ideas explored in this page have been brought one step closer to appearing in the Tcl core; TIP #173 [1] is now (2004-03-12) being discussed.


The current (Tcl 8.4a3) clock command has served us well since Tcl 7.5, but it's starting to show cracks at all levels. It appears that nothing less than a complete rework may suffice to fix all the issues. (In that case, the existing [clock] command will be preserved, but deprecated - see Free-format clock scan)

1. The measurement of time.

Mickey's little hand is on the six, and his big hand is on the nine.

At present, Tcl's concept of absolute time is the count of seconds from a fixed epoch or zero point. The count is expressed as a signed 32-bit integer. This representation has a number of drawbacks.

  • Range. The representation of time cannot represent dates before 1902 or after 2038. Within a few years, banking applications that deal with 30-year loans, for instance, will run into the end of the permissible range. My Municipal Bond data server with 40-year terms already reaches this limit; I'm dropping data on the floor. -- JBR
  • Precision. The granularity of a second is not adequate for disambiguating timestamps in a number of applications. Now that NTP is a near-universal part of the networking infrastructure, it is routine to keep different systems' clocks synchronized to within a few milliseconds to tens of milliseconds, and one would want to time-stamp actions with matching precision.
  • Epoch differences. The time returned by [clock seconds] is relative to the time epoch of the underlying operating system, and the choice of epoch varies. This should be hidden from the user. Would that we could take Byron's advice: Think'st thou existence doth depend on time? / It doth; but actions are our epochs.
  • Non-uniform representation of time intervals. Most systems return absolute time as a count of nominal seconds of UTC from some epoch, ignoring leap seconds; two times that are a day apart will show precisely 86400 seconds (never 86399 nor 86401) between them, even if a leap second has occurred in between. A few systems (at least certain configurations of FreeBSD) return, instead, a count of seconds of TAI, and on these systems a day will be a second long or short if a leap second occurs.

Proposal: Add to [clock] a [clock milliseconds] command that returns the nominal time from the Julian epoch expressed in milliseconds as a double-precision floating-point number.

Double-precision floating point is chosen as a representation because it is available on all the platforms, unlike 64-bit integers, which are the most obvious alternative. The granularity of milliseconds, rather than seconds or days, ensures that any integer number of milliseconds will be represented exactly.

The Julian epoch is chosen because many existing calendar algorithms, such as those published in the second edition of Reingold [2], use Julian day number as their internal representation. The Julian day can be obtained from the millisecond count simply by dividing by 86400000 and casting to an integer.
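A minimal sketch of that arithmetic, for illustration only. The proc and variable names are made up here, and the offset assumes the astronomical convention that JD 2440587.5 is the Posix epoch (1970-01-01 00:00 UTC); the proposal itself does not say whether the day should change at noon or at midnight. A double holds every integer up to 2^53 exactly, which at millisecond granularity covers roughly 285,000 years.

```tcl
# Assumption: JD 2440587.5 corresponds to 1970-01-01 00:00 UTC,
# i.e. 2440587.5 * 86400000 ms separate the two epochs.
set posixEpochJulianMs 210866760000000.0
set msPerDay 86400000.0

# Convert milliseconds from the Posix epoch to milliseconds from the Julian epoch.
proc posixToJulianMs {posixMs} {
    return [expr {$posixMs + $::posixEpochJulianMs}]
}

# Whole Julian day: divide by the length of a day and drop the fraction.
proc julianDay {julianMs} {
    return [expr {int(floor($julianMs / $::msPerDay))}]
}

puts [julianDay [posixToJulianMs 0]]          ;# midnight at the Posix epoch
puts [julianDay [posixToJulianMs 43200000]]   ;# noon of the same day
```

With this convention the day number steps at noon UTC, so the two calls above give consecutive day numbers.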

Since leap seconds cannot be predicted in advance, it is useful to assume that the day has a fixed length of 86400 seconds. For this reason, I propose that the Tcl clock be Smoothed Universal Time (UTS) [3]. The code in TclpGetTime on the Unix platform (where the kernel clock is corrected with adjtime) and on the Windows platform (where the Tcl clock is derived from a separate 'performance counter' that is disciplined with a phase-locked loop to the system clock) already comes close to the desired behavior.

Anyone wishing to see the pain involved in combining the range of Tcl's per-second clock with the accuracy of its -millisecond support can check out: http://expect.nist.gov/stopwatch - Don Libes

Couldn't we, during initialization, just capture the millisecond value when the seconds value changes, and use that as an offset to extract the millisecond values? Like:

    set curtime [clock seconds]
    while {$curtime == [clock seconds]} {
        # Wait here until the seconds count changes
    }
    set milli_offset [clock clicks -milliseconds]
    set milli_offset [expr {$milli_offset % 1000}]

Then, every time we want seconds and milliseconds, we could use this:

    set insec  [clock seconds]
    set inclicks [clock clicks -milliseconds]
    set intime [format "%d.%03d" $insec [expr {($inclicks - $milli_offset) % 1000}]]

This would set intime to a time in seconds.milliseconds. Even the wraparound case (where $inclicks is less than $milli_offset) works because the modulo result is positive. This would work assuming that the execution time in the first block between the while and the following set is roughly equivalent to the execution time in the second block between the two set statements.
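For comparison, a sketch assuming Tcl 8.5 or later, where [clock milliseconds] (mentioned further down this page) returns the millisecond count directly. No calibration loop is needed, and the seconds and milliseconds come from a single reading. The proc name formatMillis is made up for this example.

```tcl
# Format a millisecond count as seconds.milliseconds.
# The seconds part is inserted as a plain string, so it does not depend on
# how [format %d] treats values beyond the 32-bit range.
proc formatMillis {ms} {
    set seconds [expr {$ms / 1000}]
    set millis  [expr {$ms % 1000}]
    return "$seconds.[format %03d $millis]"
}

puts [formatMillis [clock milliseconds]]
```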


2. Calendar

I've been on a calendar, but never on time. - Marilyn Monroe

The handling of the calendar is lacking some points that various users have requested.

  • There is no input of dates in the ISO calendar, e.g., 2001-W45-3 for "the third day of fiscal week 45 in 2001." Even the output of ISO dates has errors, since the calendar year and the fiscal year usually differ in either the first week of January or the last week of December.
  • There is no handling of "week of month" ("the fourth week of November"), needed for handling of certain daylight saving time locales.
  • There is no provision whereby the output could be localized to a non-Gregorian calendar. (Several locales still use traditional Hebraic or Islamic calendars as the civil calendar!)
  • There is no representation for multiple eras (BC/AD; the Japanese "Year of the Emperor" (Things Japanese); the Islamic "anno Hegirae"; the Hebraic "anno mundi"; etc.)
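As an illustration of the first point, here is how ISO week dates can be handled with the reworked [clock] that shipped in Tcl 8.5 as a result of TIP #173. This is only a sketch; it assumes the %G, %V and %u format groups and the -timezone option of that implementation.

```tcl
# Input of an ISO week date: third day of week 45 of 2001.
set t [clock scan 2001-W45-3 -format %G-W%V-%u -timezone :UTC]
puts [clock format $t -format %Y-%m-%d -timezone :UTC]

# Output near a year boundary: 1 January 2005 belongs to ISO year 2004.
set newYear [clock scan 2005-01-01 -format %Y-%m-%d -timezone :UTC]
puts [clock format $newYear -format %G-W%V-%u -timezone :UTC]   ;# 2004-W53-6
```

The second call shows the calendar year and the ISO year disagreeing in the first days of January, which is the case the old output got wrong.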

For what it's worth, there's information on various calendars at http://www.copi.org/craig/events/calendar.html . -- CLN


3. Time zone

At the back of the Daylight Saving scheme I detect the bony, blue-fingered hand of Puritanism, eager to push people into bed earlier, and get them up earlier, to make them healthy, wealthy and wise in spite of themselves. - Robertson Davies

Conversion of times between local and UTC (mislabelled 'GMT' in the Tcl documentation) depends on the calculations of the underlying system.

  • These calculations are often of poor quality (witness the famous 'April Fool' bug in Windows [4], which is known to have caused a costly failure of at least one Tcl application).
  • Many of the underlying systems are capable of handling only a single rule for daylight saving time conversion. If the rules change, they fail to work. (In the pre-Tcl days, a number of applications failed in 1987 when the US Congress changed the dates on which daylight saving time began and ended.)
  • The time zone calculations cannot be localized for locales with unequal hours (several locales, particularly in Arab countries, still maintain solar hours as the civil time).
  • The system allows for only two time zones in an application, local time and UTC. Consider the case of a television station application wishing to report three times: local, UTC, and the time at network headquarters in New York, particularly if the station's locale observes different daylight saving time rules from New York's.

(Or consider time clocks on Hoover Dam: one end of the dam is in Arizona which never observes DST, the other is in Nevada which does observe DST! -- CLN)

  • Choosing a time zone other than that of the local system is perilous; the handling of the TZ environment variable is capricious.
  • Input and output of time zones is by time zone name, which differs not only among languages and locales but also between operating environments. (I've seen my local time zone reported as EST, EST5EDT, -05, -0500, R, Eastern Standard Time, and America/New_York, among others.) This inconsistency causes severe difficulty attempting to parse the time zone on the way back into Tcl. There is no provision for output of the time zone in numeric form; [clock format] cannot produce, for instance, a date in RFC2822 form.
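A sketch of what the last two points ask for, assuming the Tcl 8.5 [clock format] with its -timezone option and %z format group. The proc name rfc2822 is made up for this example.

```tcl
# Format a time in RFC 2822 style with a numeric zone, for any zone the
# caller names, independent of the process's TZ setting.
proc rfc2822 {seconds zone} {
    return [clock format $seconds \
        -format {%a, %d %b %Y %H:%M:%S %z} -timezone $zone]
}

puts [rfc2822 1187180620 :UTC]   ;# Wed, 15 Aug 2007 12:23:40 +0000
```

Because the zone is an argument, one script can report the same instant in several zones; for example, passing :America/New_York should give the New York wall-clock time, provided the time zone database is available to the interpreter.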

4. Localization and input/output issues

...ad calendas græcas reponere.

Tcl's handling of time ignores the locale. In many cases, it simply cannot be localized easily. Moreover, Tcl's input conversion for times suffers from an attempt to be excessively general; it succeeds only in having peculiar bugs.

  • [clock scan] and [clock format] are not localized; they always use the defaults for the 'C' locale. They need to specify at least the standard language_country_variant locales, plus two more: system for the system's default locale, and user for the user's default.
  • [clock scan] also cannot be localized to specify, for instance, whether a given locale expresses dates as mm-dd or dd-mm.
  • The implementation of [clock scan] as a YACC parser without any indication on the command line of the expected format is dreadful. Attempting to deduce from context whether '2000' represents a year, eight o'clock in the evening, a hypothetical time zone (there is no time zone twenty hours east of Greenwich, but the syntax allows for it), or a numeric quantity (2000 seconds, for instance) results in a parser that satisfies nobody.
  • Use of [clock scan] for date and time arithmetic is one reason that it attempts to be a universal tool. It would be nice to have date arithmetic analogous to the Calendar classes in Java [6] or ICU [5]. There is no clear alternative to code that does things like
    set nextMonth [clock scan {+1 month} -base $today]

AK: See http://www.purl.org/net/akupries/soft/pool/f_base_date.tcl.html
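For reference, the [clock] rework that reached Tcl 8.5 added a dedicated arithmetic subcommand, [clock add]. A small sketch, assuming that version; the date is chosen to show what happens when the target day does not exist.

```tcl
# One month after 31 January: the nonexistent 31 February is pulled back
# to the last valid day of the month.
set today     [clock scan 2001-01-31 -format %Y-%m-%d -timezone :UTC]
set nextMonth [clock add $today 1 month -timezone :UTC]
puts [clock format $nextMonth -format %Y-%m-%d -timezone :UTC]   ;# 2001-02-28
```

The arithmetic is done on calendar fields in the named zone, so it no longer depends on the free-form parser.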


5. Related work.

Non est, crede mihi, sapientis dicere, Vivam: Sera nimis vita est crastina: vive hodie. - Martial

There are a number of related date-and-time packages out there. Alas, none of them seems both to do the whole job and to be suitable for incorporating wholesale into the Tcl core, for various reasons.

Time zone manipulations: The Olson codes at [8] and [7] are comprehensive and widely used. The chief difficulty with them is that they are based on the 32-bit Posix clock and therefore die after 2037. Certainly, the time zone data sets should be considered for any implementation that we do of time zone conversions.

Java and ICU: The Java classes, Calendar, DateFormat, and so on [11], and the corresponding classes in ICU [10] and [9] are comprehensive. They use a format scheme that is incompatible with strftime and strptime, but the two are fairly easily interconverted (such a conversion would be necessary in any case, in light of the fact that the system locale is specified with one scheme in Unix and the other in Windows). The key drawbacks to adopting these codes wholesale are their deficiencies in time zone handling and the fact that no C implementation exists. The C API provided in ICU is a wrapper layer around a C++ implementation; the Tcl core does not presume the existence of a C++ compiler on the target platform.

The Reingold/Dershowitz codes: The standard references on computer calendrical calculations are a series of books and papers by Reingold and Dershowitz [12]. Their Web site has a number of interesting reference implementations that could be good starting points for some of the needed calculations. Again, there is a programming-language barrier: the provided codes are in Common Lisp and C++, not in C or Tcl.

The Pool library: Andreas Kupries provides one fine starting point for many date calculations in Tcl at [13].

The Hall codes: Mike Hall has codes [14] for calculations with Julian Day Number; again, these may prove a fine starting point.


6. Some contemplations

Tempora mutantur et nos in illis

Arjen Markus I have a number of remarks about the previous sections:

  • Why not introduce an option -milliseconds to the [clock seconds] command? Have it return the number of seconds as an integer number if this option is not present and otherwise as a double value.
   Indeed, [clock seconds -milliseconds] might well be an alternative
   syntax.  I just didn't like it that much because it seemed to be
   contradictory: seconds are not milliseconds.  --KBK

   We might use "-keepmillis" instead --AM

   I would like to introduce a -fmt option : 
      default being %d i.e. truncated to int, 
      alternatively %f or %.nf, i.e. %.3f for millisecond resolution --UK
  • Even more adventurous: if we let [clock seconds] always return a double value, would that break any existing scripts? Only those that somehow explicitly assume the time to be an integer number?
   I'm pretty sure that I have scripts that would break if
   presented with something that's not an integer.  --KBK
  • If we leave the [clock seconds] as it is and introduce a new command [clock milliseconds] that uses Julian epoch, how do we know the relationship between the two? That suggests the base for [clock] to be Julian in all cases.
   That would be ideal: alas, that would require 64-bit integers,
   or else another format for [clock seconds].  The difficulty here
   is that seconds from the Julian epoch overflows a 32-bit word. --KBK
  • If we change the base for the [clock] command, then what other commands need reworking? Surely [file stat] is one: we want any date/time as represented by a single number to be comparable, in my opinion.
   I'd like to make the necessary documentation changes to indicate
   that [file stat] and [clock seconds] track the Posix epoch on
   all platforms (incidentally, fixing any that don't -- but I don't
   think there are any left). --KBK
  • How does the conversion from actual time to integer seconds work? Is that a rounding-off or a truncation? Probably the underlying system is responsible.
   The underlying system provides TclpGetTime (soon to be renamed
   Tcl_GetTime) which returns time in a structure comprising a 32-bit
   count of seconds from the Posix epoch and a count of microseconds
   within the second.  The time is ignorant of leap seconds; inserting
   or deleting a second results in the clock's being sped up or slowed
   down by a factor of 1.001 until it once again agrees with its
   external reference.  The smoothed time is truncated to give the
   count of seconds.  (This is how it's implemented on Unix, Windows
   and MacOSX, just not explicitly documented.)  Note that TclpGetTime
   on Windows provides significantly better precision than the system
   clock.  --KBK
  • If we have [clock seconds] return a double value, then [clock format] and [clock scan] need revision as well. There is a clear advantage over the introduction of a separate Julian date and time: the result of any [clock] command would be compatible with any other time.
   You have a good point there, and I'd consider using the Posix
   epoch instead.  In any case, there are going to be codes that
   want Posix, ones that want Reingold's 'absolute day', and ones
   that want JD or MJD; adding or subtracting an offset or
   multiplying by a constant factor isn't too much of a burden if
   the data are provided. --KBK
  • One limitation of the current [clock scan] command is that to parse non-English names of the months you have to build your own scanner. Though English is ubiquitous, it is not the only language found on computers.
  • Another one is the order in which months, days and years are presented and the separating characters. I am well aware of the fact that this leads to considerable ambiguities, but, well, these may be solved conveniently with a Java-like Calendar package. For instance, a date like 2001/05/01 would be written as "1 mei 2001" in Dutch and "1. Mai 2001" in German, or, numerically, as "1/5/2001" and "1.5.2001" respectively.
   Indeed.  L10n of [clock scan] is the issue on which [clock scan]
   founders.  What I want to do is break it into layers:

   + a low-level layer like the Calendar class in Java or ICU,
     which accepts dates and times in numeric format with the fields
     identified.

   + an intermediate layer like SimpleDateFormat in Java or ICU,
     or like strptime in POSIX, which accepts string dates in a format
     specified by the caller and takes them apart into the numeric
     fields.

   + a high-level layer that accepts a looser definition such as
     'this is an RFC2822 date' or 'this is an ISO8601 date' and
     identifies the precise format in use to pass to the intermediate
     layer.  This layer would include a simple-minded scanner
     generator so that l10n could happen fairly easily.  I've been
     playing with some implementation ideas here.  --KBK
  • Fairly arbitrary arithmetic is possible via Julian date/time: Mike Hall has a nice beginning of such a package at http://www.enteract.com/~mghall . What is lacking there is the manipulation via explicit procedures of hours, minutes and seconds.
   I've started a section on 'Related Work' above.  --KBK

Arjen Markus In answer to questions by Brett Schwartz:

You are probably doing that already, but manipulating dates internally via Julian dates (in fact just doubles expressing date/time as the number of days and fractions thereof) makes life quite a bit easier.

As for further suggestions: why not make the parser "programmable", that is add a feature that allows dates to be passed in in ways the user prefers, rather than try to do all of this yourself (something akin to Java's DateFormat class).

That way you shift the burden of understanding all formats from yourself onto the shoulders of the users. This also solves ambiguities like:

    1-2-2002

which in most European countries would be interpreted as 1 February 2002, but in others and in the United States, as January 2, 2002.

I have no clear idea on how to do that in a simple way (or perhaps, yes, I do: use the format codes that clock already provides and reuse them in a parser/scanner).
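That idea, reusing the format codes for input, can be sketched as follows. It assumes the -format option that [clock scan] gained in Tcl 8.5; the variable names are made up for this example. The caller states the field order, so the same string parses unambiguously either way.

```tcl
# The same text read under two explicitly stated conventions.
set asEuropean [clock scan 1-2-2002 -format %d-%m-%Y -timezone :UTC]
set asAmerican [clock scan 1-2-2002 -format %m-%d-%Y -timezone :UTC]

puts [clock format $asEuropean -format %Y-%m-%d -timezone :UTC]   ;# 2002-02-01
puts [clock format $asAmerican -format %Y-%m-%d -timezone :UTC]   ;# 2002-01-02
```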


ZB

    ambiguities like: 1-2-2002

I have a feeling that the simplest way would be to infer the order of the date fields from the separators used. Have a look at http://www.postgresql.org/docs/7.3/interactive/datatype-datetime.html (Table 5-10. Date Input). In short:

Consider the date 01 February 2003. Such a date can be written as:

  • 03-02-01 ("International" notation, ISO-8601)
  • 02/01/03 ("Anglo-saxon" way)
  • 01.02.03 ("German")

...and without abbreviating the year:

  • 1-II-2003 ("classic" notation when using "roman digits" for month)
  • 20030201 (ISO-8601 year, month, day)

Note that the year can also be abbreviated in these forms, as in the earlier examples ("1-II-03" and "030201"), and there is still no ambiguity about which field is which.

I'm not sure why that table carries a remark about "ambiguities" when a slash is used as the date separator. A slash clearly suggests the order MM/DD/YY (or MM/DD/YYYY).

If the rules given above are followed, there will be no problem with ambiguities.

adavis (22nd August 2007): In the UK the common date format is DD/MM/YY or DD/MM/YYYY. And I would also say - We are the "original" Anglo-Saxons!!

ZB OK, so just one ambiguity remains, and can be easily resolved looking at system locale - either it's en_US (MM/DD/YY) or "rest of the world" (DD/MM/YY).
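A rough sketch of the separator rule, purely to make the idea concrete. The proc name scanBySeparator is made up, the code assumes the Tcl 8.5 [clock scan -format], and the slash is hard-wired to the en_US order; as adavis points out, a real implementation would have to consult the locale for that case.

```tcl
# Pick the field order from the separator, then hand an explicit format
# to [clock scan].  A four-digit year field selects %Y instead of %y.
proc scanBySeparator {date} {
    # separator -> implied field order (the slash entry is the en_US assumption)
    set orders {- {y m d} / {m d y} . {d m y}}
    foreach {sep order} $orders {
        set fields [split $date $sep]
        if {[llength $fields] != 3} continue
        set parts {}
        foreach field $fields code $order {
            if {$code eq "y" && [string length $field] == 4} {
                set code Y
            }
            lappend parts %$code
        }
        return [clock scan $date -format [join $parts $sep] -timezone :UTC]
    }
    error "unrecognised date separator in \"$date\""
}

foreach d {03-02-01 02/01/03 01.02.03} {
    puts [clock format [scanBySeparator $d] -format %Y-%m-%d -timezone :UTC]
}
```

All three spellings from the list above come out as the same day, 1 February 2003.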


LV I seem to recall that this package has a LOT of nice time/date/calendar functionality, most of which is in C. However, perhaps there are ideas in it that could be reused:

 What: Remind
 Where: http://www.roaringpenguin.com/products/remind/
        http://www.roaringpenguin.com/products/remind/remind-03.00.22.tar.gz 
 Description: Remind is an alarm/calendar program which handles Roman
        and Hebrew calendars, sunrise, sunset and moon phases,
        is multilingual, does complicated date calculations (handling
        holidays properly), alarms, includes a WWW calendar server, and produces
        PostScript output.  Uses Tk for an X front end.
        Available for UNIX, MS-DOS, OS/2 and other platforms.
 Updated: 03/2001
 Contact: David F. Skoll

Related work, of potential interest: Pythonic normalDate [17], astrolabe [16], and the widely-lauded mxDateTime [15].


nl More related work, of potential interest: Hebrew/Jewish calendar algorithm can be found as "Chelm.org's algorithms of the Jewish calendar" [18].


escargo 12 Mar 2004 - Isn't there really an issue of Julian dates that use noon as the time for a new day, versus midnight (customary calendar)? For that matter, some calendars use sunrise or sunset to determine the start of a new day. How would those issues be factored in?

 You're right that the Julian Day Number technically changes
 at noon. Tcl's "Julian Day" is actually the Julian Day Number
 beginning at noon on the given date. (It's convenient for
 astronomers to change the date at noon, they're asleep then
 anyway!)

 I could foresee the Hebraic calendar either being implemented with
 l10n for latitude and longitude or with the nominal date being
 that beginning at sundown on the previous day.  The Hijri calendar
 needs solar time in any case, so there has to be some astronomy
 in any implementation of it.  (It also has to include a disclaimer,
 since in many jurisdictions, the month does not begin until a
 mullah has actually observed the Moon.)

The TLA TAI was used above. Is that Temps Atomique International?

 Yes. Tcl's time model, however, is UTS [http://www.cl.cam.ac.uk/~mgk25/uts.txt].

jmn 2006-06-24 So presumably this means that Tcl is adjusting the output of [clock seconds], [clock milliseconds] etc with a slight offset for a period of 1000 seconds prior to each leap second?

Is this then on top of a separate skewing that may already be occurring as an NTP client adjusts the system clock to keep in sync with the leap second? I've heard, for example, that some Windows NTP clients will start skewing the clock an hour before the leap second.

Do we really have two separate adjustments being made around this time? (I assume the NTP one to the actual system clock, the Tcl 'adjustment' merely being to reported output?)

How on earth then would one compare timestamps generated on a system that uses TAI?

This is a complicated issue - and I think my understanding of it is pretty limited, but what are the possibilities say of extending [clock seconds] etc so that we know exactly what timestamps we're actually dealing with?

e.g.

 [clock seconds UTS] - presumably the Tcl skewed implementation that exists now?
 [clock seconds UTC] - tracks the system's notion of UTC, including ambiguities around leap seconds
 [clock seconds TAI] - I guess it would require leap-second lookup tables if the system clock isn't already directly in TAI

Anyone care to comment on the feasibility and/or desirability of this?
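To give a feel for the lookup-table part, here is a sketch. The proc and variable names are made up, and the table is deliberately partial: it lists only the TAI-UTC offsets in force from 2006 onward, keyed by the Posix time at which each took effect. It says nothing about how the smoothing interval around a leap second should be handled, which is the harder question raised above.

```tcl
# Partial table (assumption: entries before 2006 are omitted).
# Each pair is: Posix time the offset took effect, TAI-UTC in seconds.
set leapTable {
    1136073600 33
    1230768000 34
    1341100800 35
    1435708800 36
    1483228800 37
}

# Return TAI-UTC, in seconds, for a Posix timestamp covered by the table.
proc taiMinusUtc {posixSeconds} {
    set offset {}
    foreach {start seconds} $::leapTable {
        if {$posixSeconds >= $start} {
            set offset $seconds
        }
    }
    if {$offset eq ""} {
        error "time precedes the first table entry"
    }
    return $offset
}

puts [taiMinusUtc 1187180620]
```

Any such table has to be updated whenever a new leap second is announced, which is the maintenance cost of offering a TAI view.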


dzach 2007-8-15 While I was trying to solve a timing issue, MJ (Tcl Chatroom nick: mjanssen) suggested using tcl::clock::milliseconds as the fastest command available (in tcl8.5) to retrieve time with millisecond granularity. So on my system, running tcl 8.5a6:

 % time {tcl::clock::milliseconds} 1000
 0.962 microseconds per iteration

which is much better than

 % time {clock milliseconds} 1000
 1.523 microseconds per iteration

but when trying to acquire a fractional unix epoch

 % time {expr {[clock milliseconds]/1000.0}} 1000
 1.618 microseconds per iteration

its performance becomes slower. This little C extension, providing the tcl command fraclock, restores performance to the .9 usec range:

 #include <tcl.h>

 /* fraclock: return the current time as fractional seconds since the Posix epoch. */
 static int
 fraclock_Cmd(ClientData cdata, Tcl_Interp *interp, int objc, Tcl_Obj *CONST objv[])
 {
         Tcl_Time t;

         /* Tcl_GetTime fills in whole seconds and microseconds within the second. */
         Tcl_GetTime(&t);
         Tcl_SetObjResult(interp, Tcl_NewDoubleObj(t.sec + t.usec / 1000000.0));
         return TCL_OK;
 }

 int DLLEXPORT
 Fraclock_Init(Tcl_Interp *interp)
 {
         /* Tcl_InitStubs returns NULL on failure. */
         if (Tcl_InitStubs(interp, TCL_VERSION, 0) == NULL) {
                 return TCL_ERROR;
         }
         Tcl_CreateObjCommand(interp, "fraclock", fraclock_Cmd, NULL, NULL);
         Tcl_PkgProvide(interp, "fraclock", "1.0");
         return TCL_OK;
 }

Example use:

 % fraclock
 1187180620.302878
 % time {fraclock} 1000
 0.935 microseconds per iteration

Kevin Kenny's original proposal was for a double precision floating point epoch value. The current tcl8.5 clock milliseconds implementation returns an integer value, not a floating point one. Proposal: Although it might be late for that to be in tcl8.5, wouldn't a [clock] (no arguments) format, with the minimum possible overhead, be a possible solution, without breaking existing code, like:

 % clock
 1187180620.302878

KBK Wouldn't our time be more profitably spent on bytecoding ensemble dispatch (which would achieve the same performance gain on all ensembles, not just [clock])? Moreover, wouldn't it be more profitably spent addressing other performance "hot spots," some of which are even hotter?