Hacker News

This is an amazing contribution to the language. A mixture of speed and convenience, probably made by volunteers. As for people criticizing a change to what used to be a non-deterministic ordering of dict iteration: I don't know what to say to them other than, are you serious? There are people out there working for us, for free, and they did some heavy lifting to give us this. They might read what you wrote and think, "Why bother? Maybe I should spend my weekends playing with my kids instead."


> As for people criticizing a change to what used to be a non-deterministic ordering of dict iteration: I don't know what to say to them other than, are you serious? There are people out there working for us, for free, and they did some heavy lifting to give us this. They might read what you wrote and think, "Why bother? Maybe I should spend my weekends playing with my kids instead."

I don't agree with the complaints about ordering dicts, but this is a ludicrous response. Appreciation of volunteers' work on Python isn't diminished by criticism of language features or changes. In fact, it's enhanced by it: if people have opinions about the project you work on, it's a good sign that it's important, significant work. I've contributed to OSS projects before, and the ones in use by more than just my friends and me were naturally the ones that felt the most meaningful.


As is widely known, more often than not "criticism" of open source software quickly devolves into hate and toxicity. Helpful criticism is great, but be careful with defending the "criticism culture" around FOSS; it's often angry, unhappy people who want it all for free, on a golden platter, and yesterday.


Sure, hate and toxicity themselves should be called out. But that's not what's happening here, and not what GP comment was referring to. It doesn't make any sense to throw out the baby of legitimate criticism with the bathwater of toxicity.


That does not matter, nothing is above criticism.


It does matter. If people get hate for doing work on open source they will stop doing work on open source. Yes you can be critical of everything, but the hate that some people spew toward certain open projects is not only nauseating but actively damaging to the community.


As far as I know, it was implemented by Raymond Hettinger. This is a very interesting talk about the new tech:

https://www.youtube.com/watch?v=p33CVV29OG8&t=1202s


Indeed, here is the mail from Raymond Hettinger explaining the new design on the python-dev list: https://mail.python.org/pipermail/python-dev/2012-December/1... Quite amazing that it is faster, more memory-efficient, and more convenient for the end user.


Raymond came up with the idea, PyPy implemented it, and then INADA Naoki implemented it for CPython.


If I remember his talk correctly, he went to CPython first and it was rejected, but PyPy welcomed it.


It was proposed by Raymond Hettinger, but was actually first implemented by PHP and PyPy.


Other languages don't generally have special order guarantees about standard maps. This seems very idiosyncratic.


For almost all intents and purposes, object keys have been ordered in JavaScript since ES2015. Map has always been ordered.

https://www.stefanjudis.com/today-i-learned/property-order-i...


Users of JSON, probably the most common data interchange format on the planet, frequently have implicit requirements about key ordering. It is highly convenient to be able to parse a JSON string into a native Python data structure, add a field, emit it back, and preserve the ordering.
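For instance, this round trip works directly with the standard json module in Python 3.7+ (a minimal sketch; the keys are illustrative):

```python
import json

# json.loads builds a plain dict, which (since Python 3.7) preserves
# insertion order, so the original key order survives a round trip.
doc = json.loads('{"name": "Ada", "role": "admin"}')
doc["active"] = True  # appended at the end; existing order untouched
out = json.dumps(doc)
# out == '{"name": "Ada", "role": "admin", "active": true}'
```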


In what reasonable use case would the order of the properties on an object matter? I can't think of one.


when you are diffing the serialized output?


From what I recall of the JSON standard itself, there's no guarantee about key ordering being significant. If you're diffing serialized output to compare two JSON objects you need to be serializing it in a consistent format, otherwise even whitespace is going to throw you off.


It's significant to a human that wants to know what has changed.


If a human is inspecting serialized JSON using pen and paper, the human is presumably clever enough to match up key for key regardless of ordering.

If the human is using a computer to compare two JSON payloads (as the use of a diffing algorithm suggests), the human and computer should be clever enough as a team to realize that they could just deserialize and reserialize each JSON payload such that the keys were lexicographically sorted and the data was pretty-printed in the exact same way before running it through the diffing algorithm. `jq -Sc` would do the trick.
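For anyone not working in a shell pipeline, the same normalization is easy to sketch in Python (the function name is mine):

```python
import json

def canonical(payload: str) -> str:
    # Deserialize and re-serialize with sorted keys and fixed
    # separators -- the Python analogue of `jq -Sc`. Two semantically
    # equal payloads now produce byte-identical strings fit for diffing.
    return json.dumps(json.loads(payload), sort_keys=True,
                      separators=(",", ":"))

canonical('{"b": 1, "a": 2}') == canonical('{ "a": 2,  "b": 1 }')  # True
```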


Most diff programs don't have what you describe. And in a lot of cases you don't have the easy ability to "do stuff" before running the input through a diffing algorithm.


> Most diff programs don't have what you describe.

That’s why pipes were invented.

> And in a lot of cases you don't have the easy ability to "do stuff" before running the input through a diffing algorithm.

Well, in that case you’re not going to be able to meaningfully compare two JSON payloads, because neither order nor whitespace has any semantic meaning in JSON. I’m really curious what you’re talking about, though, since if you’re working in the shell you can easily use jq to do as I describe.


So you can sort the keys at serialization time rather than paying a performance penalty all the time?


I tracked down a bug once where someone was hashing state that happened to include Python dicts serialized to JSON, which caused different hash values for the same entity. I know, I know, insert tears-of-blood emoji, and any engineer worth his salt wouldn't be so sloppy. But you don't always get to choose to work only with top talent that understands why you should design things elegantly as a baseline[1]. In these cases, removing implicit gotchas and guaranteeing that the same code has the same behavior (i.e. iterating over the same dict yields the same order) is valuable.

[1] Well you _can_ choose to do so, and I recently have, but it's not without its costs.
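A minimal sketch of that class of bug, and the usual fix, assuming the state is hashed via its JSON serialization (names and data are illustrative):

```python
import hashlib
import json

state_a = {"user": 1, "flags": ["x"]}
state_b = {"flags": ["x"], "user": 1}  # same data, different insertion order

def naive_hash(state):
    # Bug: the digest depends on dict insertion order (and, pre-3.6,
    # on per-process hash randomization), so "equal" states can hash
    # differently.
    return hashlib.sha256(json.dumps(state).encode()).hexdigest()

def stable_hash(state):
    # Fix: canonicalize with sort_keys so equal states hash equally.
    return hashlib.sha256(
        json.dumps(state, sort_keys=True).encode()).hexdigest()

naive_hash(state_a) == naive_hash(state_b)    # False
stable_hash(state_a) == stable_hash(state_b)  # True
```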


I can't speak for all "other" languages but Ruby made the change years ago from 1.8 to 1.9. Ruby calls them hashes but they're standard maps. Obviously Ruby and Python are very similar but I think the point stands.


Golang map iteration is returned in random order specifically to make sure that people don't rely on the order.

I think this feature says a lot about the philosophy of Python vs Go.


Sounds like a fast and idiomatic way to shuffle a deck of cards is then to convert to a map and back.


Convert a deck of cards ([]Card?) to a map (map[Card]bool?) and back just to shuffle? That's unlikely to be faster or more idiomatic than a straightforward implementation of the Fisher-Yates shuffle[1]. Try writing the code to do it both ways and compare.

[1]: https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle
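For reference, the straightforward implementation is only a few lines; here is a minimal Python sketch (the Go version is structurally identical):

```python
import random

def fisher_yates(deck):
    # Walk from the last index down, swapping each card with a
    # uniformly chosen card at or before it. Every permutation is
    # equally likely, in O(n) time with no auxiliary structures.
    deck = list(deck)
    for i in range(len(deck) - 1, 0, -1):
        j = random.randint(0, i)  # inclusive on both ends
        deck[i], deck[j] = deck[j], deck[i]
    return deck

shuffled = fisher_yates(range(52))
```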


I think "idiomatic" would be using rand.Perm, which implements the Fisher–Yates shuffle. But aside from whether it's idiomatic, converting to a map isn't random enough.


With Golang 1.11, I get orderings like this. S is spades, H is hearts, D is diamonds, and C is clubs, because HN eats the Unicode characters I was actually using.

     S9  SJ  SK  HQ  D9  S4  S5 S10  HJ  S8  H6  DA  D2
     D4  DQ  C6  C8  CJ  H3  H9  DK  C3  CQ  SA  S2  S3
     S6  SQ  H4 H10  D8 D10  C4  CK  S7  H7  D5  D6  D7
     C7  H2  HK  D3  DJ  C2  C5  C9  HA  H5  H8  CA C10
On average you would expect about one card in a shuffled deck to be (cyclically) followed in the deck by its succeeding card, and on about one out of 50 shuffles, that card would be followed by its successor. Here we have S4 S5, DA D2, SA S2 S3, and D5 D6 D7.

This is very strong evidence of bias.

I'm interested to hear if my Golang can be made more idiomatic, or if there are bugs in it: http://canonical.org/~kragen/sw/dev3/mapshuffle.go

In particular it seems like there ought to be a less verbose way to express the equivalent of the Python list({card: True for card in deck}) in Golang.
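The expected-value claim above ("about one card... followed by its succeeding card") is easy to sanity-check against an unbiased shuffle with a quick Monte Carlo; a Python sketch:

```python
import random

def successor_pairs(deck):
    # Count cards that are (cyclically, by position) followed
    # immediately by their succeeding card, e.g. S4 right before S5.
    n = len(deck)
    return sum(deck[(i + 1) % n] == deck[i] + 1 for i in range(n))

trials = 20_000
deck = list(range(52))
total = 0
for _ in range(trials):
    random.shuffle(deck)  # an unbiased shuffle for comparison
    total += successor_pairs(deck)
mean = total / trials  # expected value is 51 pairs * 1/51 = exactly 1.0
```

With an unbiased shuffle the mean settles very close to 1, so orderings that regularly show three or four runs of consecutive cards per deck stand out.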


No, it isn't random enough for that.


IIRC it used to just start iteration at a pseudorandom index and then iterate normally. I looked at it a couple of years ago; I don't know if they've changed it.


It seems to do that, but I think it also tweaks the hash function for each newly created map, because in the code linked from https://news.ycombinator.com/item?id=22278753 I'm not getting rotations of the same iteration order when I generate two maps. There doesn't seem to be any randomness in mapiternext() itself.

You could imagine that the hash function itself might do an adequate job of randomizing the order of the cards, though, especially if salted with a per-map salt. SipHash, for example, would probably not have any detectable biases in the distribution of the permutations thus produced. But whatever hash function Golang is using for my structs has an easily visible bias, as described in that comment.


What does it say? That Python is willing to spend more computation time for simplicity, without waiting for the programmer to ask for it? That Python will make semantic changes in minor version bumps of the language?


The new-ish ordered dictionary is actually faster. It is also completely backward compatible for all but the most contrived examples.

Now wasting computation on randomizing... that is a problem.


The ordering property is a side effect of the new and more cpu/memory efficient data structure. It would be very surprising to say no to a 2x performance jump to avoid the (sometimes useful) ordering.


Tcl's dicts are ordered.


Java added LinkedHashMap a while ago. I tend to default to it except for small temporary use cases where order absolutely won't matter or have an outside impact...


LinkedHashMap is not Java's "standard map", though. If you tend to use it by default, that's really a personal idiosyncrasy, especially as LHM tends to be larger and slower than a regular HashMap due to having to maintain a doubly linked list.

Python has had one such in the standard library for a decade or so.


LHM is not slower for iteration (it's faster actually).

LHM indeed pays two references per node, but they are well worth it, as it has deterministic ordering/iteration. I have witnessed numerous cases with HashMap that show up in production only due to iteration order (especially after rehashing). The code is broken, but the cases did not reproduce during testing...

Now, if the two extra references are an issue, consider that HashMap is quite space inefficient by having a dedicated node per entry - that's the main price. The (simple) node memory footprint is 36 bytes on heaps smaller than 32 GB, i.e. with compressed pointers. The extra references add another 8 bytes for keeping an insertion or access order.

If the goal is a compact, low memory footprint, HashMap is not fit for purpose. Overall it's the jack of all trades, and it even got reworked (Java 8) to support tree-based collision resolution for keys that implement Comparable.

A couple of years back I wrote CompactHashMap (under CC0), which has an average cost of 10 bytes per entry and is extremely compact for small sizes, having only two fields in its own right, so even small/empty maps are tiny. In microbenchmarks (the same ones used in OpenJDK) it handily (2x) beats java.util.HashMap on "traverse key or value", get/put have similar performance, and "search absent" is worse.

The point is: LHM should be the go-to hashmap for Java, as being a node-based hashmap is justified for it (unlike java.util.HashMap).


Yep. Also, LinkedHashMap maintains ordering based on insertion order. So, if you insert your data ordered how you want it, it behaves as an ordered map. If you want a true, self-reordering map, what you want is a TreeMap. The beauty of that one is that you have full control over defining a custom ordering function, because we're not always indexing a map by primitives or autoboxed types. I try to use it sparingly, though, as you're paying log(n) on pretty much all map operations.

While we're here, another tidbit that's often overlooked in the Java collections: if you really care about iteration performance, your data is without nulls, your data is already ordered how you like it (or you don't care about ordering), you have at least ~10 items, and you don't need random access, then ArrayDeque is gonna be your horse, because of how much better it co-locates its contents in memory and how much less overhead is required to maintain it during each operation compared to all the other List implementations, including ArrayList and LinkedList.


Regarding the sibling replies: does it feel to anyone else like everything (except Golang) is converging toward PHP's array()?


Interestingly enough, here it was sort of, but not quite, the other way around: historically PHP used a closed-addressing hash map and threaded a doubly linked list through it to maintain insertion order.

But the dict TFA talks about doesn't use a doubly linked list or closed addressing; its ordering is a side effect of its implementation, not originally a core goal (memory savings and iteration speed were). It had probably been proposed by others before, but it came to wider attention after Raymond Hettinger (a core dev) proposed it a few years back[0]. PHP actually released it first[1], closely followed by PyPy[2]. CPython only got around to it some time later[3].

[0] https://mail.python.org/pipermail/python-dev/2012-December/1...

[1] https://nikic.github.io/2014/12/22/PHPs-new-hashtable-implem...

[2] https://morepypy.blogspot.com/2015/01/faster-more-memory-eff...

[3] https://docs.python.org/3/whatsnew/3.6.html?highlight=3.6#wh...
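As a rough illustration of the compact layout (a toy sketch, not CPython's or PHP's actual implementation): a sparse index table points into a dense, insertion-ordered entries array, so ordered iteration falls out for free.

```python
class CompactDict:
    """Toy compact dict: sparse indices + dense, ordered entries.
    Resizing and deletion are omitted for brevity."""

    def __init__(self, size=8):
        self.indices = [None] * size  # sparse: probe slot -> entry index
        self.entries = []             # dense: (hash, key, value), in order

    def _slot(self, key):
        # Linear probing; real implementations also perturb the probe.
        i = hash(key) % len(self.indices)
        while True:
            idx = self.indices[i]
            if idx is None or self.entries[idx][1] == key:
                return i
            i = (i + 1) % len(self.indices)

    def __setitem__(self, key, value):
        i = self._slot(key)
        if self.indices[i] is None:
            self.indices[i] = len(self.entries)  # append: keeps order
            self.entries.append((hash(key), key, value))
        else:
            idx = self.indices[i]
            self.entries[idx] = (hash(key), key, value)  # in-place update

    def __getitem__(self, key):
        idx = self.indices[self._slot(key)]
        if idx is None:
            raise KeyError(key)
        return self.entries[idx][2]

    def __iter__(self):
        # Iteration walks the dense array: insertion order, cache-friendly.
        return (key for _, key, _ in self.entries)
```

CPython additionally sizes each index entry to 1, 2, 4, or 8 bytes depending on the table size, which is where much of the memory saving comes from.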


JavaScript Maps are iterated over in insertion order.


std::map in C++ stores keys in order. You have to use std::unordered_map to not get that behavior.


Yes, but that’s a different kind of order. Python dicts’ order is insertion order, while std::map’s is key order.
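The difference is easy to see in Python, where both orders are available:

```python
d = {}
d["banana"] = 1
d["apple"] = 2
d["cherry"] = 3

list(d)    # ['banana', 'apple', 'cherry'] -- insertion order (dict)
sorted(d)  # ['apple', 'banana', 'cherry'] -- key order (like std::map)
```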


People who bother to complain are those who actually care about your thing. People who do not care simply leave without ever telling you why. Your complainers are often your most dedicated and invested users.


Though in this specific case, my sense has been that your complainers are often your most dedicated and invested users of a version of the language that was officially retired 38 days ago.

Python is just bizarrely political. I suspect that it's now cursed for all time to have every future PEP become a battle in a proxy war over PEP 3000.


Sometimes they're just concern trolls though.


Especially given the (lack of a) change to sets, I'm interested in benchmarking some regular things before and after the change.

    set(dict.keys())
    set(dict.values())
    thing1 = dict(...)
    thing2 = copy.deepcopy(thing1)
I feel like there could be some other testcases. I'm wholly in support of the change (regardless of benchmarks) but depending on the results I could see some arguments against.
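A minimal timeit harness for micro-benchmarks like these might look as follows (sizes and repetition counts are arbitrary):

```python
import copy
import timeit

d = {i: str(i) for i in range(10_000)}

for label, stmt in [
    ("set(d.keys())", lambda: set(d.keys())),
    ("set(d.values())", lambda: set(d.values())),
    ("copy.deepcopy(d)", lambda: copy.deepcopy(d)),
]:
    # timeit temporarily disables GC and returns total seconds
    # for `number` calls of the statement.
    secs = timeit.timeit(stmt, number=100)
    print(f"{label:20s} {secs:.3f}s")
```

Run once on the old interpreter and once on the new one to compare.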


1. sets don't use the new dict's implementation

2. IIRC the new dicts are no(t significantly) slower than the old dicts, however they use less memory, and iterate faster

The iteration order is actually a side-effect of implementation details, the original goals were a more compact representation and a faster iteration: https://mail.python.org/pipermail/python-dev/2012-December/1...


> 1. sets don't use the new dict's implementation

Yes, of course. That's why I'm wondering if turning dict keys into a set is slower now.


I was under the impression python dict ordering was deterministic, just not in an order recognisable by humans, i.e. ordered by hashes. Was this not the case?


From the docs: "CPython implementation detail: Keys and values are listed in an arbitrary order which is non-random, varies across Python implementations, and depends on the dictionary’s history of insertions and deletions."

So I would claim that equates to non-deterministic.


I read that as: if two dicts are constructed in the same way and use the same python implementation then they'd be the same. If you change the way the dicts were generated or the underlying implementation then they're not the same.

To me, nondeterministic is when the input does not change but the output does. The docs essentially say to not rely on the order, not that the result is not reproducible.


CPython hashes are intentionally non-deterministic, so even the same exact Python install will produce different hash orderings on the same code, if you run it twice (although you can make it deterministic by using PYTHONHASHSEED).
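This is easy to verify by spawning child interpreters with a pinned seed (a small sketch; the hashed string is arbitrary):

```python
import os
import subprocess
import sys

def str_hash(seed):
    # Run a child interpreter with PYTHONHASHSEED pinned; string hashes
    # are randomized per process unless the seed is fixed (0 disables
    # randomization entirely).
    env = {**os.environ, "PYTHONHASHSEED": str(seed)}
    out = subprocess.check_output(
        [sys.executable, "-c", "print(hash('spam'))"], env=env)
    return int(out)

str_hash(0) == str_hash(0)  # True: same seed reproduces the same hash
```

Without pinning the seed, two runs of the same script will almost certainly see different string hashes, and hence (pre-3.6) different dict iteration orders.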


I’d call it undependable, but maybe it’s the same thing in practice?


That's generally what people mean when they say "nondeterministic" in the context of computing. Yeah, in philosophy it generally means something like "the future is not completely determined by the past," but in computing it means something closer to "the programmer cannot reasonably determine the behavior and thus should not depend on specific behavior."


In computing it means that a given set of inputs leads to a given set of outputs. It has nothing to do with how difficult it is for a programmer to reason about. Deterministic builds, deterministic tests, etc.


But what counts as "input" will vary based on who you ask and under what context.


> Was this not the case?

From the origin to Python 3.2, iteration order was arbitrary but deterministic (in CPython; it was not guaranteed to be deterministic in other implementations)

From Python 3.3 to Python 3.5, iteration order was non-deterministic across runs (it was deterministic per process instance)

In Python 3.6 it's unspecified but deterministic and follows insertion order (in CPython)

In Python 3.7, Python 3.6's behaviour was specified and documented
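The Python 3.7 guarantee is easy to demonstrate (note that deleting and re-inserting a key moves it to the end):

```python
# Under the Python 3.7 guarantee, iteration follows insertion order.
d = {}
d["first"] = 1
d["second"] = 2
d["third"] = 3
assert list(d) == ["first", "second", "third"]

# Re-inserting a deleted key appends it, like a fresh insertion.
del d["second"]
d["second"] = 2
assert list(d) == ["first", "third", "second"]
```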


I don't have any criticism if the performance of this new dict stays the same as before, and that's a huge if. Considering Python 3 is already a crawling language compared to peers like Java and PHP, I think the devs should focus more on that before handing out features.


> I don't have any criticism if the performance of this new dict stays the same as before, and that's a huge if.

1. it's been there since 2016, the new dict was released as part of Python 3.6 (though the ordering was only an implementation detail until 3.7)

2. the new dict is much more compact (20~25%) and should have much faster iteration speed, those were actually the original goals, ordered iteration was a side-effect

3. the new dict should not benchmark significantly slower across the board, it wouldn't have been merged if it were a regression



