18

I am looking for a nice way of flattening (splitting) a list of potentially-overlapping numeric ranges. The problem is very similar to that of this question: Fastest way to split overlapping date ranges, and many others.

However, the ranges are not only integers, and I am looking for a decent algorithm that can be easily implemented in Javascript or Python, etc.

Example Data: Example data

Example Solution: enter image description here

Apologies if this is a duplicate, but I am yet to find a solution.

4
  • How do you determine that green is on top of blue, but under yellow and orange? Are the color ranges applied in order? If that's the case, the algorithm seems obvious; just ...erm, apply the color ranges in order. Commented May 29, 2014 at 3:04
  • 1
    Yes, they are applied in order. But that's the problem—how would you 'apply' the ranges?
    – Jollywatt
    Commented May 29, 2014 at 3:16
  • 1
    Do you often add/remove colors, or do you need to optimize for query speed? How many "ranges" will you usually have? 3? 3000?
    – Telastyn
    Commented May 29, 2014 at 20:52
  • Won't be adding/removing colours very frequently, and there'll be anywhere between 10-20 ranges, with 4+ digit precision. That's why the set method isn't quite suitable, because the sets will have to be 1000+ items long. The method I've gone with is the one I posted in Python.
    – Jollywatt
    Commented May 30, 2014 at 0:22

5 Answers 5

10

Walk from left to right, using a stack to keep track of what color you're on. Instead of a discrete map, use the 10 numbers in your dataset as the break-points.

Starting with an empty stack, and setting start to 0, loop until we reach the end:

  • If stack is empty:
    • Look for the first color starting at or after start, and push it and all lower-ranked colors onto the stack. In your flattened list, mark the beginning of that color.
  • else (If not empty):
    • Find next beginning point for any higher-ranked color at or after start, and find the end of the current color
      • If the next color starts first, push it and anything else on the way to it onto the stack. Update the end of the current color as the beginning of this one, and add the start of this color to the flattened list.
      • If there are none and the current color ends first, set start to the end of this color, pop it off of the stack, and check the next-highest ranked color
        • If start is within the next color's range, add this color to the flattened list, starting at start.
        • If the stack empties, just continue the loop (go back to the first bullet point).

This is a mental run-through given your example data:

# Initial data.
flattened = []
stack = []
start = 0
# Stack is empty.  Look for the next starting point at 0 or later: "b", 0 - Push it and all lower levels onto stack
flattened = [ (b, 0, ?) ]
stack = [ r, b ]
start = 0
# End of "b" is 5.4, next higher-colored start is "g" at 2 - Delimit and continue
flattened = [ (b, 0, 2), (g, 2, ?) ]
stack = [ r, b, g ]
start = 2
# End of "g" is 12, next higher-colored start is "y" at 3.5 - Delimit and continue
flattened = [ (b, 0, 2), (g, 2, 3.5), (y, 3.5, ?) ]
stack = [ r, b, g, y ]
start = 3.5
# End of "y" is 6.7, next higher-colored start is "o" at 6.7 - Delimit and continue
flattened = [ (b, 0, 2), (g, 2, 3.5), (y, 3.5, 6.7), (o, 6.7, ?) ]
stack = [ r, b, g, y, o ]
start = 6.7
# End of "o" is 10, and there is nothing starting at 12 or later in a higher color.  Next off stack, "y", has already ended.  Next off stack, "g", has not ended.  Delimit and continue.
flattened = [ (b, 0, 2), (g, 2, 3.5), (y, 3.5, 6.7), (o, 6.7, 10), (g, 10, ?) ]
stack = [ r, b, g ]
start = 10
# End of "g" is 12, there is nothing starting at 12 or later in a higher color.  Next off stack, "b", is out of range (already ended).  Next off stack, "r", is out of range (not started).  Mark end of current color:
flattened = [ (b, 0, 2), (g, 2, 3.5), (y, 3.5, 6.7), (o, 6.7, 10), (g, 10, 12) ]
stack = []
start = 12
# Stack is empty.  Look for the next starting point at 12 or later: "r", 12.5 - Push onto stack
flattened = [ (b, 0, 2), (g, 2, 3.5), (y, 3.5, 6.7), (o, 6.7, 10), (g, 10, 12), (r, 12.5, ?) ]
stack = [ r ]
start = 12
# End of "r" is 13.8, and there is nothing starting at 12 or higher in a higher color.  Mark end and pop off stack.
flattened = [ (b, 0, 2), (g, 2, 3.5), (y, 3.5, 6.7), (o, 6.7, 10), (g, 10, 12), (r, 12.5, 13.8) ]
stack = []
start = 13.8
# Stack is empty and nothing is past 13.8 - We're done.
4
  • what do you mean by "anything else on the way to it onto the stack" ? Commented Jan 15, 2019 at 12:17
  • 1
    @Guillaume07 Anything of ranks between the current and the chosen next start. The sample data doesn't show it, but imagine the yellow was shifted to start before green - you have to push both green and yellow onto the stack so that when yellow ends, the end of green is still at the right place in the stack so it still shows up in the final result
    – Izkata
    Commented Jan 15, 2019 at 15:18
  • Another think I don't understand, please, is why you tell firstly "If stack is empty: Look for the first color starting at or before start," then in the code sample you comment "# Stack is empty. Look for the next starting point at 0 or later". So once it is before and once it is later Commented Jan 15, 2019 at 15:36
  • 1
    @Guillaume07 Yep, a typo, the correct version is in the code block twice (the second being the comment near the bottom that starts "Stack is empty."). I've edited that bullet point.
    – Izkata
    Commented Jan 16, 2019 at 4:38
3

This solution seems the simplest. (Or at least, the easiest to grasp)

All that is needed is a function to subtract two ranges. In other words, something that will give this:

A ------               A     ------           A    ----
B    -------    and    B ------        and    B ---------
=       ----           = ----                 = ---    --

Which is simple enough. Then you can simply iterate through each of the ranges, starting from the lowest, and for each one, subtract from it all the ranges above it, in turn. And there you have it.


Here is an implementation of the range subtractor in Python:

def subtractRanges((As, Ae), (Bs, Be)):
    '''SUBTRACTS A FROM B'''
    # e.g, A =    ------
    #      B =  -----------
    # result =  --      ---
    # Returns list of new range(s)

    if As > Be or Bs > Ae: # All of B visible
        return [[Bs, Be]]
    result = []
    if As > Bs: # Beginning of B visible
        result.append([Bs, As])
    if Ae < Be: # End of B visible
        result.append([Ae, Be])
    return result

Using this function, the rest can be done like this: (A 'span' means a range, as 'range' is a Python keyword)

spans = [["red", [12.5, 13.8]],
["blue", [0.0, 5.4]],
["green", [2.0, 12.0]],
["yellow", [3.5, 6.7]],
["orange", [6.7, 10.0]]]

i = 0 # Start at lowest span
while i < len(spans):
    for superior in spans[i+1:]: # Iterate through all spans above
        result = subtractRanges(superior[1], spans[i][1])
        if not result:      # If span is completely covered
            del spans[i]    # Remove it from list
            i -= 1          # Compensate for list shifting
            break           # Skip to next span
        else:   # If there is at least one resulting span
            spans[i][1] = result[0]
            if len(result) > 1: # If there are two resulting spans
                # Insert another span with the same name
                spans.insert(i+1, [spans[i][0], result[1]])
    i += 1

print spans

This gives [['red', [12.5, 13.8]], ['blue', [0.0, 2.0]], ['green', [2.0, 3.5]], ['green', [10.0, 12.0]], ['yellow', [3.5, 6.7]], ['orange', [6.7, 10.0]]], which is correct.

2
  • Your output at the end doesn't match expected output in the question...
    – Izkata
    Commented Dec 23, 2016 at 3:43
  • @Izkata Gosh, I was careless. That must have been the output from another test. Fixed now, thanks
    – Jollywatt
    Commented Dec 23, 2016 at 4:12
2

If the data really is similar in scope to your sample data, you could create a map like this:

map = [0 .. 150]

for each color:
    for loc range start * 10 to range finish * 10:
        map[loc] = color

Then just walk through this map to generate the ranges

curcolor = none
for loc in map:
    if map[loc] != curcolor:
        if curcolor:
            rangeend = loc / 10
        make new range
        rangecolor = map[loc]
        rangestart = loc / 10

To work, the values have to be in a relatively small range as in your sample data.

Edit: to work with true floats, use the map to generate a high level mapping and then refer to the original data to create the boundaries.

map = [0 .. 15]

for each color:
   for loc round(range start) to round(range finish):
        map[loc] = color

curcolor = none
for loc in map
    if map[loc] != curcolor:

        make new range
        if loc = round(range[map[loc]].start)  
             rangestart = range[map[loc]].start
        else
             rangestart = previous rangeend
        rangecolor = map[loc]
        if curcolor:
             if map[loc] == none:
                 last rangeend = range[map[loc]].end
             else
                 last rangeend = rangestart
        curcolor = rangecolor
2
  • This is a very nice solution, I have come across it before. However, I am looking for an more generic solution that can manage any arbitrary float ranges... (this wouldn't be the best for something like 563.807 - 770.100)
    – Jollywatt
    Commented May 29, 2014 at 2:52
  • 1
    I think you could generalize it by rounding the values, and generating the map, but marking a location on the edges as having two colors. Then when you see a location with two colors, go back to the original data to determine the boundary.
    – user53141
    Commented May 29, 2014 at 3:10
2

Here's a relatively straightforward solution in Scala. It shouldn't be too difficult to port to another language.

case class Range(name: String, left: Double, right: Double) {
  def overlapsLeft(other: Range) =
    other.left < left && left < other.right

  def overlapsRight(other: Range) =
    other.left < right && right < other.right

  def overlapsCompletely(other: Range) =
    left <= other.left && right >= other.right

  def splitLeft(other: Range) = 
    Range(other.name, other.left, left)

  def splitRight(other: Range) = 
    Range(other.name, right, other.right)
}

def apply(ranges: Set[Range], newRange: Range) = {
  val left     = ranges.filter(newRange.overlapsLeft)
  val right    = ranges.filter(newRange.overlapsRight)
  val overlaps = ranges.filter(newRange.overlapsCompletely)

  val leftSplit  =  left.map(newRange.splitLeft)
  val rightSplit = right.map(newRange.splitRight)

  ranges -- left -- right -- overlaps ++ leftSplit ++ rightSplit + newRange
}

val ranges = Vector(
  Range("red",   12.5, 13.8),
  Range("blue",   0.0,  5.4),
  Range("green",  2.0, 12.0),
  Range("yellow", 3.5,  6.7),
  Range("orange", 6.7, 10.0))

val flattened = ranges.foldLeft(Set.empty[Range])(apply)
val sorted = flattened.toSeq.sortBy(_.left)
sorted foreach println

apply takes in a Set of all the ranges that have already been applied, finds the overlaps, then returns a new set minus the overlaps and plus the new range and the newly split ranges. foldLeft repeatedly calls apply with each input range.

0

Just keep a set of ranges sorted by start. Add range that covers everything (-oo..+oo). To add a range r:

let pre = last range that starts before r starts

let post = earliest range that starts before r ends

now iterate from pre to post: split ranges that overlap, remove ranges that are covered, then add r

Not the answer you're looking for? Browse other questions tagged or ask your own question.