8

Suppose I have a dictionary

{1:5, 2:5, 4:5}

Is there a data structure such that if I add the key-value pair 3:5, to have it entered in the dictionary so that the keys are in sorted order? i.e.

{1:5, 2:5, 3:5, 4:5}

I am aware of the collections.OrderedDict(), but this only keeps the keys in the order in which they were added (which isn't sufficient for me currently).

I don't want to have to use a normal dictionary dic = {}, then have to use sorted(dic)[0] to grab the smallest key. I'd rather have sorted_dict[0] type function.
The reason for this is that if I use a normal dictionary, I will have to call the sort multiple times, as I am continuously adding pairs to my dictionary.

EDIT: I should have mentioned, it's not only the smallest and largest keys I care about, I will also need to print this dictionary at regular intervals as well...

13
  • with my level of skills, my own class would be even slower than sorted(dict)!! :(
    – Tom Jones
    Commented Mar 19, 2013 at 5:17
  • 2
    What is the use case for which you need this (the actual problem you are trying to solve). Commented Mar 19, 2013 at 5:20
  • a financial market application, the keys will be price, the values will be the volume at that price. as new data comes in i have to update my dictionary, and i want to find out the best and worst price and their volumes multiple times in my script.
    – Tom Jones
    Commented Mar 19, 2013 at 5:24
  • 2
    @RedBaron: That's going to be O(N) each time he needs the smallest key, and O(N) each time he needs the largest, and of course O(N log N) each time he needs to print the whole thing. He explicitly says in the question that this is not acceptable.
    – abarnert
    Commented Mar 19, 2013 at 5:49
  • 3
    specifically, Alex Martelli has a nice answer to this question
    – wim
    Commented Mar 19, 2013 at 5:59

3 Answers 3

5

If you plan to add and remove keys from the dictionary continuously, you really want something that uses an appropriate data structure for the problem—not a hash table (or a hash table plus a list, as with the SortedOrderedDict-type recipes), but a balanced tree (or an equivalent, like a skip list).

If you look around at PyPI, you'll find a number of options. My recommendation would be blist. Even though its data structure may not be quite as optimal as some of the others (because a B+Tree is much broader than a binary tree), it's probably good enough for almost any use case you will run into it. And it has a complete and well-tested interface, including well-tested performance guarantees. And it's used quite a bit by other serious projects.

If you're dealing with one of the rare cases where the tree performance really is critical, you should probably look at the various red-black tree, splay tree, skiplist, etc. implementations. I've used bintrees before, which has a great interface (e.g., you can access the keys and values by index, and even slice the tree, as well as treating it like a dict, and the author has thought through and avoided all the potential ambiguities), but I haven't seriously performance tested it.

Or, if your keys and values really are all smallish integers, you might want to consider using Cython to wrap a C++ map<int, int> in a Pythonic interface. (It's not quite possible to provide a complete interface on top of C++ map, but you often don't need that anyway.) Or, alternatively, modify one of the implementations like bintrees.FastRBTree to store and compare long instead of PyObject*.

On the other hand, if you're just going to create the dictionary all at once and then use it, there's a much simpler answer. Sort it, and stick it in an OrderedDict. Then you don't need anything outside the stdlib.

sorted_dict = collections.OrderedDict(sorted(d.iteritems()))

From a comment on another answer, you say "i don't have permissions to install new modules..."

First, make sure that's really true. You probably do have permission to install modules in a user site-packages directory. Or, if virtualenv is installed and/or you're using 3.3 with built-in venv, even better, you probably have permission to create a venv and install modules into that.

But if so, what you need to do is copy the files from blist/bintrees/whatever into your project.

The problem you might run into is that most of these packages contain C extension modules, which means you have to be able to build them (well, build_ext -i them). If your system doesn't have the Python dev files and a compiler tool chain installed, you can't do that. In that case, you're looking for the best pure-Python solution. bintrees comes with a pure-Python implementation that's identical to the normal C-extension implementation, except slower. It's still O(log N) of course, just the constant factor is a lot higher. If N is big enough, it's still a huge win; if not, it may not be.

If any part of this sounds reasonable, but you need help setting up a per-user site-packages or virtual env, or copying a module into your project in-place, or building extensions in-place, etc., you should probably search for existing questions, and ask a new one if you can't find one (if for no other reason than because the kinds of people who are experts at installation issues aren't necessarily experts at data structures, and may not even be reading this question).

3
  • 2
    yeah, I need to add and remove items in the dictionary continuously, so I'd rather not have to use this sort function every single time (n log n). Will look into your blist solution
    – Tom Jones
    Commented Mar 19, 2013 at 5:34
  • PS, if you use blist for a serious project, email Daniel the author; he likes to keep tabs on how it's being used it so he can push for its including in the stdlib one day (which would obviously make your life easier, as you wouldn't have to install it from PyPI anymore).
    – abarnert
    Commented Mar 19, 2013 at 5:43
  • Wasn't aware of the blist package. After checking it out, definitely a +1.
    – root
    Commented Mar 19, 2013 at 6:02
3

Try this recipe — http://code.activestate.com/recipes/576998-sorted-dictionary/

It keeps keys sorted using stdlib bisect module.

6
  • 1
    This works, but it's not going to be as efficient as using the right data structure for the job.
    – abarnert
    Commented Mar 19, 2013 at 5:41
  • also had a look at the link on this page: stutzbachenterprises.com/blist/sorteddict.html. Good, though I'm at work, and I can't use this module there...
    – Tom Jones
    Commented Mar 19, 2013 at 5:49
  • @XinLiang: That's the same blist I recommended. Why can't you use this module? (If you mean you can't use the recipe from ActiveState because of the ambiguous licensing… I've run into that problem before with some employers' legal departments, even though I'm pretty sure that the explicit MIT license overrides the AS license. But anyway, blist and bintrees are BSD-licensed and MIT-licensed, respectively, so that's not a problem.
    – abarnert
    Commented Mar 19, 2013 at 5:50
  • i don't have permissions to install new modules... Also, fwiw, I'm quite new to python (3 weeks experience)
    – Tom Jones
    Commented Mar 19, 2013 at 5:56
  • @XinLiang: I've edited my answer to deal with that. But I think we're getting off-topic on comments for this answer.
    – abarnert
    Commented Mar 19, 2013 at 6:20
1

More than a year late to the party but I wanted to suggest the sortedcontainers module. Like blist and bintrees, it provides a SortedDict data type that maintains the keys in sorted order. Unlike those modules it's written in pure-Python and is actually faster. SortedDict also supports indexing. Looking up the min/max actually happens in O(1) time.

Since it's pure-Python, installation with pip should be a breeze:

pip install sortedcontainers

Then you can simply import the SortedDict

In [1]: from sortedcontainers import SortedDict

In [2]: d = SortedDict({1:5, 2:5, 4:5})

In [3]: d
Out[3]: SortedDict({1: 5, 2: 5, 4: 5})

In [4]: d[3] = 5

In [5]: d
Out[5]: SortedDict({1: 5, 2: 5, 3: 5, 4: 5})

If you have difficulty installing things using pip or can't copy files that'll need compiling then you can just pull the sortedlist.py and sorteddict.py files out of the depot. All the code is open source on github.

The sortedcontainers module also provides a performance comparison with the most popular suggestions benchmarked against one another.

Not the answer you're looking for? Browse other questions tagged or ask your own question.