68

I'm wondering if there is a standard or "normal" means of interpreting time interval data end points with respect to inclusiveness/exclusiveness of the value defining the end point. Note however that I am asking what the standard (or most common) convention is (if there is one), not for a dissertation on your personal preference. If you really want to provide a dissertation, please attach it to a reference to someone's published standard or a standard text on the matter. Open standards (that I don't have to pay to read) are greatly preferred unless they are fundamentally flawed :).

Of course there are 4 possibilities for a time interval from A to B:

  1. (A, B) - Both ends are exclusive.
  2. [A, B] - Both ends are inclusive.
  3. [A, B) - Start is inclusive and end is exclusive
  4. (A, B] - Start is exclusive and end is inclusive

Each of these has different characteristics (as I see it, feel free to point out more):

The [A, B] convention has the seemingly inconvenient property that B is contained within both the interval [A, B] and the interval [B, C]. This is particularly inconvenient if B is meant to represent the midnight boundary and you are trying to determine which day it falls on, for example. It also makes the duration of the interval slightly irritating to calculate: [A, B] where A = B should have a length of 1, and therefore the duration of [A, B] is (B - A) + 1.

Similarly, the (A, B) convention has the difficulty that B falls within neither (A, B) nor (B, C). Continuing the analogy with day boundaries, midnight would be part of neither day. This is also logically inconvenient because (A, B) where A = B is a nonsensical interval with a duration less than zero, yet reversing A and B does not make it a valid interval.
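
To make the arithmetic above concrete, here is a minimal sketch that counts the discrete ticks covered by each convention, assuming time is modeled as whole-number ticks. The class and method names are made up for illustration and are not taken from any library.

```java
/** Hypothetical helper: counts the integer ticks covered by an interval from a to b. */
final class IntervalTickCounts {
    static long closed(long a, long b)        { return b - a + 1; } // [a, b]
    static long open(long a, long b)          { return b - a - 1; } // (a, b)
    static long halfOpenRight(long a, long b) { return b - a; }     // [a, b)
    static long halfOpenLeft(long a, long b)  { return b - a; }     // (a, b]

    public static void main(String[] args) {
        System.out.println(closed(5, 5));        // 1: [5, 5] still covers one tick
        System.out.println(open(5, 5));          // -1: (5, 5) comes out negative
        System.out.println(halfOpenRight(5, 5)); // 0: [5, 5) is simply empty
    }
}
```

Only the two half-open forms give a length of exactly b - a, which is why the choice narrows to those two.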

So I think I want either [A, B), or (A, B] and I can't figure out how to decide between them.

So if someone has a link to a standards document, a reference to a standard text, or similar that clarifies the convention, that would be great. Alternatively, if you can link a variety of standards documents and/or references that more or less completely fail to agree, then I can just pick one that seems to have sufficient authority to CMA and be done with it :).

Finally, I will be working in Java, so I am particularly susceptible to answers that work well in Java.

5
  • 1
    What do you mean by "inclusive"? Time is continuous, we just happen to discretise it for the purposes of computation. If we're working in hours, then today runs from 00 to 23 (inclusive). But that's wrong as soon as we increase the resolution; today is from 00:00 to 23:59 (inclusive), and so on... Commented Mar 20, 2012 at 21:43
  • 1
    On every project I have ever worked on, this has been determined by the business conventions of the application domain, not abstract conventions of programming.
    – Affe
    Commented Mar 20, 2012 at 21:44
  • 1
    @Oli Exactly the point. Obviously I must use a discrete representation, and that's unnatural. The question centers around the best way to deal with the mismatch.
    – Gus
    Commented Mar 20, 2012 at 21:44
  • 1
    @Gus: Then this is not unique to time. This is the same problem for any quantity that we need to discretise. Commented Mar 20, 2012 at 21:46
  • @Affe I happen to be in a situation where I am asked to propose something and I don't have good reason to believe that there is a well established business convention. If I turn out to be wrong, of course, I'll change.
    – Gus
    Commented Mar 20, 2012 at 21:47

6 Answers

81

In the general case, [A, B) (inclusive start, exclusive end) has a lot going for it and I don't see any reason why the same wouldn't be true for time intervals.

Dijkstra wrote a nice article about it, "Why numbering should start at zero", which (despite the name) deals mostly with exactly this.

Short summary of the advantages:

  • end - start equals the number of items in the list
  • upper bound of preceding interval is the lower bound of the next
  • allows indexing an interval starting from 0 with unsigned numbers [1]

Personally, I find the second point extremely useful for lots of problems; consider a pretty standard recursive function (in pseudo-Python):

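def foo(start, end):
    # Process the half-open range [start, end).
    if end - start <= 1:
        return  # base case: zero or one element left
    middle = start + (end - start) // 2  # integer midpoint
    foo(start, middle)  # left half:  [start, middle)
    foo(middle, end)    # right half: [middle, end)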

Writing the same with an inclusive upper bound invites lots of off-by-one errors.

[1] That's the advantage compared to (A, B]: an interval starting from 0 is MUCH more common than an interval ending in MAX_VAL. Note that this also relates to one additional problem: using two inclusive bounds means we can denote a sequence whose length cannot be expressed in a type of the same size.
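
Since the question asks about Java, the recursive pattern in this answer might look like the following there. This is only a sketch: the class name HalfOpenSplit and the choice of summing an int array are illustrative assumptions, not part of the original answer.

```java
/** Hypothetical example: divide-and-conquer over a half-open index range. */
final class HalfOpenSplit {
    /** Sums values[start..end), with start inclusive and end exclusive. */
    static long sum(int[] values, int start, int end) {
        if (end - start == 0) return 0;             // empty range
        if (end - start == 1) return values[start]; // base case: one element
        // Midpoint written this way avoids the overflow risk of (start + end) / 2.
        int middle = start + (end - start) / 2;
        // Both halves share `middle` as a bound, yet no element is visited twice.
        return sum(values, start, middle) + sum(values, middle, end);
    }

    public static void main(String[] args) {
        int[] data = {3, 1, 4, 1, 5, 9};
        System.out.println(sum(data, 0, data.length)); // 23
    }
}
```

Note that the initial call passes data.length directly as the exclusive end, with no "- 1" adjustment anywhere.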

1
  • 3
    I completely agree with [A,B) being the best, especially with regard to the midnight problem. 2014-01-01 00:00:00 - 2014-01-02 00:00:00, when interpreted using [A,B), includes the entirety of January 1st. This properly excludes 2014-01-02 00:00:00, which falls on January 2nd. Commented Apr 22, 2014 at 21:01
12

tl;dr

  3. [A, B) - Start is inclusive and end is exclusive

That's the one, known as Half-Open.

Examples:

  • School lunch starts when the clock strikes noon. Class resumes when the clock reaches 1 PM.
  • A day starts at the first moment of the day, running up to, but not including, the first moment of the next day.
  • A month starts on the first, running up to, but not including, the first of the following month.

java.time & Half-Open

Both the java.time classes (which supplant the troublesome legacy date-time classes) and the Joda-Time project define a span of time using the Half-Open approach [), where the beginning is inclusive and the ending is exclusive.

For date-time values with a fractional second, this eliminates the problem of trying to capture the last moment. The infinitely divisible last second must be resolved to some granularity, but various systems use various granularities such as milliseconds, microseconds, nanoseconds, or something else. With Half-Open, a day, for example, starts at the first moment of the day and runs up to, but does not include, the first moment of the following day. Problem solved: no need to wrestle with the last moment of the day and its fractional second.

I have come to see the benefits of using this approach consistently throughout all my date-time handling code. A week, for example, starting on a Monday runs up to, but does not include, the following Monday. A month starts on the 1st and runs up to, but does not include, the first of the following month, thereby ignoring the challenge of determining the number of the last day of the month, including the Feb 28/29 leap-year case.
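
As a rough illustration of the day and month examples above, here is a sketch using only standard java.time calls. The class name HalfOpenSpans, its helper methods, and the choice of UTC as the zone are assumptions made for this example.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.YearMonth;
import java.time.ZoneId;
import java.time.temporal.ChronoUnit;

/** Hypothetical helpers showing Half-Open spans with java.time. */
final class HalfOpenSpans {
    /** True if moment falls in [start, end): start inclusive, end exclusive. */
    static boolean contains(Instant start, Instant end, Instant moment) {
        return !moment.isBefore(start) && moment.isBefore(end);
    }

    /** Days in a month, measured from the 1st up to the 1st of the next month. */
    static long daysIn(YearMonth month) {
        LocalDate start = month.atDay(1);
        LocalDate endExclusive = month.plusMonths(1).atDay(1);
        return ChronoUnit.DAYS.between(start, endExclusive);
    }

    public static void main(String[] args) {
        ZoneId zone = ZoneId.of("UTC"); // assumed zone; substitute your own
        LocalDate day = LocalDate.of(2016, 2, 29);
        Instant start = day.atStartOfDay(zone).toInstant();
        Instant end = day.plusDays(1).atStartOfDay(zone).toInstant();

        System.out.println(contains(start, end, start));   // true
        System.out.println(contains(start, end, end));     // false
        System.out.println(daysIn(YearMonth.of(2016, 2))); // 29
    }
}
```

The code never needs to know the last representable moment of the day or the last day number of February.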

Another benefit of consistent use of Half-Open [) is easing the cognitive load every time I have to detect, decipher, and verify a piece of code’s span-of-time approach. In my own programming, I simply glance for a mention of Half-Open in a comment at the top and I instantly know how to read that code.

A result of consistent use of Half-Open is reducing the chance of bugs in my code as my thinking and writing style are uniform with no chance of getting confused over inclusive-exclusive.

By the way, note that Half-Open [) means avoiding the SQL BETWEEN operator, as that is always fully closed [].

As for the business thinking of the customers I serve, where appropriate I try to convince them to use Half-Open consistently as well. I've seen many situations where various business people were making incorrect assumptions about the periods of time covered in reports. Consistent use of Half-Open avoids these unfortunate ambiguities. But if the customer insists, I note this in my code and adjust inputs/outputs so as to use Half-Open within my own logic. For example, my logic uses a week of Monday-Monday, but on a report I subtract a day to show Sunday.
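
The Monday-Monday versus Sunday adjustment could be sketched roughly as follows. ReportRange and displayEnd are hypothetical names chosen for this example; only LocalDate and its methods come from java.time.

```java
import java.time.LocalDate;

/** Hypothetical helper: keep Half-Open internally, show an inclusive end on reports. */
final class ReportRange {
    /** Converts an exclusive end date to the inclusive date a reader expects to see. */
    static LocalDate displayEnd(LocalDate endExclusive) {
        return endExclusive.minusDays(1);
    }

    public static void main(String[] args) {
        LocalDate start = LocalDate.of(2017, 7, 31);  // a Monday
        LocalDate endExclusive = start.plusWeeks(1);  // the following Monday
        LocalDate shown = displayEnd(endExclusive);
        // Prints: 2017-07-31 to 2017-08-06 (SUNDAY)
        System.out.println(start + " to " + shown + " (" + shown.getDayOfWeek() + ")");
    }
}
```

The conversion happens only at the output boundary, so the rest of the logic never mixes the two conventions.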

For even more classes representing spans of time with the Half-Open approach [), see the ThreeTen-Extra project for its Interval class (a pair of Instant objects) and the LocalDateRange class (a pair of LocalDate objects).

Tip: When printing/displaying reports for business, include a footer that describes the query logic, including whether the beginning and ending are inclusive or exclusive. I have seen way too much confusion on this in the workplace, with readers making incorrect assumptions about the date ranges (and other criteria).


About java.time

The java.time framework is built into Java 8 and later. These classes supplant the troublesome old legacy date-time classes such as java.util.Date, Calendar, & SimpleDateFormat.

To learn more, see the Oracle Tutorial. And search Stack Overflow for many examples and explanations. Specification is JSR 310.

The Joda-Time project, now in maintenance mode, advises migration to the java.time classes.

You may exchange java.time objects directly with your database. Use a JDBC driver compliant with JDBC 4.2 or later. No need for strings, no need for java.sql.* classes. Hibernate 5 & JPA 2.2 support java.time.

Where to obtain the java.time classes?

5
  • Excellent discussion and update WRT newer libraries :)
    – Gus
    Commented Aug 5, 2017 at 12:51
  • Representing time as half-open feels like the simplest way to reason about time. I'm finding it hard to find any source documentation, but what does ISO 8601 say about interval boundaries? Does it specify that they are right-open?
    – mkobit
    Commented Apr 3, 2019 at 15:07
  • @mkobit I have not read the ISO 8601 standard, as it is copyrighted, rather expensive, and not easily obtained. If you look at the Time Intervals portion of the Wikipedia page, it shows multiple examples that seem to use Half-Open semantics. I suspect the standard does not explicitly state Half-Open, though; it is simply natural to say that a half-hour meeting at 3 PM ends at 15:30 rather than 15:29:59.999999999. Commented May 21, 2021 at 5:24
  • 1
    Thanks @BasilBourque - when I was trying to look into this a bit more, I too ran into the troubles in looking at ISO8601 you described. I decided to move forward with half-open with right-openness, and it made the system we built simple to reason about!
    – mkobit
    Commented May 21, 2021 at 18:54
  • This is similar to using UTC under the hood and presenting zoned date-times to the user. You also have to consider that when you write down a range whose unit is a date or larger (week, month, year), you must transform the exclusive part into inclusive (-1s). Because when someone says working days are from "Monday to Friday", 5 days are meant, thus including Friday. That holds true for anything except time: "10:00 to 10:30" means until or up to, and thus excluding 10:30 itself. When the clock shows 10:30 it's already past that indicated time range.
    – Yeti
    Commented Feb 19 at 8:23
6

I'll provide what I wrote for our team as an answer using Voo's link until such time as Voo adds an answer, then I'll give him credit instead. Here's what I decided for our case:

Time intervals in our applications will be represented as a pair of instantaneous times with the convention that the start time is inclusive and the end time is exclusive. This convention is mathematically convenient in that the difference of the bounds is equal to the length of the interval, and is also numerically consistent with the way arrays and lists are subscripted in Java programs (see http://www.cs.utexas.edu/~EWD/ewd08xx/EWD831.PDF). The practical upshot of this is that the interval 2012-03-17T00:00:00.000Z – 2012-03-18T00:00:00.000Z denotes the entirety of St. Patrick’s Day: every date-time beginning with 2012-03-17 will be identified as included in St. Patrick’s Day, but 2012-03-18T00:00:00.000Z will not be included, and St. Patrick’s Day will include exactly 24*60*60*1000 milliseconds.
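
The St. Patrick's Day claims can be checked with a short sketch like this one, using java.time.Instant and Duration. The class name and the contains helper are illustrative assumptions, not code from our application.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical check of the [start, end) convention for a single UTC day. */
final class StPatricksDay {
    static final Instant START = Instant.parse("2012-03-17T00:00:00.000Z"); // inclusive
    static final Instant END   = Instant.parse("2012-03-18T00:00:00.000Z"); // exclusive

    /** True if t falls in [START, END). */
    static boolean contains(Instant t) {
        return !t.isBefore(START) && t.isBefore(END);
    }

    public static void main(String[] args) {
        System.out.println(Duration.between(START, END).toMillis());             // 86400000
        System.out.println(contains(Instant.parse("2012-03-17T23:59:59.999Z"))); // true
        System.out.println(contains(END));                                       // false
    }
}
```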

5

Despite this thread focusing more on Java, I thought it'd be quite interesting to see other adopted conventions, especially given that the pandas Python library is ubiquitous for data analysis these days, and the fact that this StackOverflow page is one of the top search results when looking for conventions on the inclusivity/exclusivity of time ranges.

Quoting this page:

The start and end dates are strictly inclusive. So it will not generate any dates outside of those dates if specified.

Also, this is not limited to generating date ranges. The same convention is adopted when indexing into time-series data. Here's a simple test on a data frame with a DatetimeIndex:

>>> import pandas as pd
>>> pd.__version__
'0.20.2'
>>> df = pd.DataFrame(list(range(20)))
>>> df.index = pd.date_range(start="2017-07-01", periods=20)
>>> df["2017-07-01":"2017-07-05"]
            0
2017-07-01  0
2017-07-02  1
2017-07-03  2
2017-07-04  3
2017-07-05  4
4

I can't say for certain, but I doubt a standard or convention exists. Whether or not you include the start or end instant would depend on your use case, so consider whether they are important to you. If the decision is arbitrary, pick one, note that the choice is arbitrary and move on.

As for what is supported in Java, the Joda Time library implements Intervals that include the start time but not the end time.

5
  • +1: Just use Joda Time :). May also help to see what a digital clock does: times are labelled according to (and including) the start of the interval. Commented Mar 20, 2012 at 21:49
  • 1
    While I might possibly accept Joda Time's convention if there are no other better answers, I'm not actually looking to add libraries to the project unless I actually have to... :)
    – Gus
    Commented Mar 20, 2012 at 21:57
  • If this is the first library that your project has needed, adding it can be a pain (depending on your development and deployment setups) so I can understand your hesitation. If you already use any libraries, Joda Time is well worth using. The API for dealing with dates is far better than the java.util package. Libraries let you concentrate on your application by implementing the common or difficult stuff for you.
    – sgmorrison
    Commented Mar 20, 2012 at 22:05
  • 4
    I don't see why time intervals should be handled any differently than other intervals. In the general case [A, B) has a LOT going for it - Dijkstra wrote a nice paper summarizing the different approaches and why inclusive, exclusive is the best. Also his penmanship is really excellent
    – Voo
    Commented Mar 20, 2012 at 22:14
  • @Voo Nice, this link is precisely the sort of thing I was looking for. A clear exposition on a fundamental basis by a person or group of known expertise. Clarifies my thinking, and covers my butt :), care to turn it into an answer so I can give you credit?
    – Gus
    Commented Mar 21, 2012 at 14:28
3

I have just been through this exact same thought process and I think it is very important that this is standardised in some way, or at least clarified by means of these types of Q&A posts!

In our case the date ranges in question are used as inputs and outputs to / from a microservice; one that, in the short-term at least, will be called by an existing monolithic application (it's a monolith decomposition project). Therefore, I think that the comment above relating to the decision being driven by business requirements is, in our case, less relevant (because the direct "users" of the software we're building are really technical people). If we were handling the input from a date picker that might be a different story!

My recommendation was that all start dates are inclusive and all end dates are exclusive - so [A,B) in your notation. This was for the following reasons:

  1. We had previously agreed that any incoming dates containing time parts would be rejected (even if the JSON value was "2018-01-01T00:00:00") and that we'd output all dates without times. Therefore, if the end date were inclusive, then as soon as the string is deserialized into a .NET DateTime object (which lands on midnight at the start of that day), it would be a day out.

  2. I like the idea that the length of a date range (which in our case should always yield whole days) can always be calculated by simply doing dateRange = (endDateExcl - startDateIncl).TotalDays. No need to add 1 everywhere!

  3. Much of the business validation performed by the service is checking that multiple date ranges are flush against each other without gaps. This is easy to check by eye when using [A,B) because each A should match the preceding B. If we go with [A,B] then we (devs, testers, support engineers) would often be asking ourselves "How many days are in March again?" (e.g. [2018-03-01,2018-03-30],[2018-04-01,2018-04-30]) or "Does 2016 have a leap day?" (e.g. [2016-02-01,2016-02-28],[2016-03-01,2016-03-30]).
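
Our service is .NET, but since the question is about Java, points 2 and 3 might be sketched there roughly as follows. The DateRanges and Range types are hypothetical names for this example; only LocalDate and ChronoUnit come from the standard library.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: length and gap checks on [startIncl, endExcl) date ranges. */
final class DateRanges {
    static final class Range {
        final LocalDate startIncl; // inclusive
        final LocalDate endExcl;   // exclusive
        Range(LocalDate startIncl, LocalDate endExcl) {
            this.startIncl = startIncl;
            this.endExcl = endExcl;
        }
        /** Whole days covered; no "+ 1" needed with an exclusive end. */
        long days() { return ChronoUnit.DAYS.between(startIncl, endExcl); }
    }

    /** True if every range starts exactly where the previous one ends. */
    static boolean isContiguous(List<Range> ranges) {
        for (int i = 1; i < ranges.size(); i++) {
            if (!ranges.get(i).startIncl.equals(ranges.get(i - 1).endExcl)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Range march = new Range(LocalDate.of(2018, 3, 1), LocalDate.of(2018, 4, 1));
        Range april = new Range(LocalDate.of(2018, 4, 1), LocalDate.of(2018, 5, 1));
        System.out.println(march.days());                              // 31
        System.out.println(isContiguous(Arrays.asList(march, april))); // true
    }
}
```

The gap check is a plain equality test, so nobody has to remember how many days March has.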

Just to add, I strongly recommend that everyone, regardless of which convention they choose, explicitly suffix all attribute names, variables, methods or otherwise with "Incl" or "Excl" so that it is clear to everyone without having to hunt out documentation!

We have also recommended that all dates should come in in ISO format and that anything with a "Z" on the end should also be rejected (because the understanding is that we're working in whole days and we don't want a date to be deserialized into a DateTime object with a rogue hour (or 23!) because of daylight saving).

Footnote: I would probably have posted this as a comment on Voo's answer, but I've just (belatedly!) joined SO and need to earn my kudos before I can do that! ;-)

Happy dating x

2
  • 1
    If you are working in whole days, then the standard ISO 8601 format would be YYYY-MM-DD with no time-of-day at all, so your mentioned issue of Z at the end should be moot. But keep in mind that dates are affected by time zone as well. For any given moment, the date varies around the globe by zone. A new day dawns earlier in the east. So if it is important to your business problem to mark the precise beginning/ending of the day, then you should be using a date-time either with an offset/zone or adjusted into UTC and marked with a Z. Commented Nov 3, 2017 at 21:12
  • And while noon UTC seems like it would be the same date everywhere, that's not true; there are in fact 27 timezones, all the way out to UTC+14. Granted, Pacific islands are not often a major use case, but it is very good to be aware of. One just can't escape the need for specifying timezones unless some error cases can be tolerated.
    – Gus
    Commented Apr 25, 2018 at 21:48
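
The point that one moment falls on different dates in different zones can be seen in a short java.time sketch. The class name DateByZone and the sample moment are assumptions for illustration; Pacific/Kiritimati is used because it sits at UTC+14.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;

/** Hypothetical demo: the calendar date of a moment depends on the zone. */
final class DateByZone {
    static LocalDate dateIn(Instant moment, String zoneId) {
        return moment.atZone(ZoneId.of(zoneId)).toLocalDate();
    }

    public static void main(String[] args) {
        Instant noonUtc = Instant.parse("2018-04-25T12:00:00Z");
        System.out.println(dateIn(noonUtc, "UTC"));                // 2018-04-25
        System.out.println(dateIn(noonUtc, "Pacific/Kiritimati")); // 2018-04-26
    }
}
```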
