From one of the linked threads: [1] > It might be good to remind the readers that...

Znafon · on Aug 18, 2023

Yes, this is something that was trivially protected by the GIL. Go has the same issue, for example: mutating the same map concurrently can crash the program with a fatal "concurrent map writes" error.

PEP 703 goes over this in the "Container Thread-Safety" section (I think "container" here refers to objects that hold references to other objects; these are the things CPython already special-cases so that the garbage collector tracks them):

> This PEP proposes using per-object locks to provide many of the same protections that the GIL provides. For example, every list, dictionary, and set will have an associated lightweight lock. All operations that modify the object must hold the object’s lock. Most operations that read from the object should acquire the object’s lock as well; the few read operations that can proceed without holding a lock are described below.

More information at https://peps.python.org/pep-0703/#container-thread-safety
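For a concrete picture of the per-object locking in the quoted passage, here is a toy Python model. It is only a sketch under loud assumptions: PEP 703 implements these locks in C inside the interpreter (and lets a few reads skip the lock), whereas this hypothetical `LockedList` class simply wraps a plain list and takes a `threading.Lock` on every operation.

```python
import threading


class LockedList:
    """Toy model of a list guarded by its own lock (illustrative only)."""

    def __init__(self, items=()):
        self._lock = threading.Lock()  # one lock per object, not one global lock
        self._items = list(items)

    def append(self, item):
        with self._lock:  # every mutation holds the object's lock
            self._items.append(item)

    def pop(self):
        with self._lock:
            return self._items.pop()

    def __getitem__(self, index):
        with self._lock:  # most reads take the lock as well
            return self._items[index]

    def __len__(self):
        with self._lock:
            return len(self._items)


def fill(lst, n):
    for i in range(n):
        lst.append(i)


shared = LockedList()
workers = [threading.Thread(target=fill, args=(shared, 1000)) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(len(shared))  # 4000
```

Two threads touching two different lists never contend here, which is the point of replacing one interpreter-wide lock with many small ones.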

miohtama · on Aug 18, 2023

Note that container concurrency is not specific to Python in any way.

For example, Java implements different versions of containers for single-threaded and multithreaded usage, because multithreaded containers have an obvious performance penalty.

https://docs.oracle.com/javase/8/docs/api/java/util/concurre...

nayuki · on Aug 18, 2023

But also wrappers like: https://docs.oracle.com/javase/8/docs/api/java/util/Collecti...

dwaite · on Aug 20, 2023

> For example, Java implements different versions of containers for single-threaded and multithreaded usage, because multithreaded containers have an obvious performance penalty.

Very few codebases in Java are single-threaded; specialty frameworks like Netty are the exception not the rule.

Likewise, there are not different containers for single threaded and multithreaded usage; there are containers that have different strategies for dealing with multi-threaded usage.

Hashtable is the oldest, and is notable for still being present and still being fundamentally flawed. It locks individual reads and writes, but does not describe a way to lock for a transaction, e.g. changes based on data you just read. As such, it fundamentally has race conditions you can't protect against.
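The race is easy to reproduce deterministically. In this sketch, `LockedDict` is a hypothetical Hashtable-style class whose every `get`/`put` holds a lock, and two `threading.Event`s force the unlucky interleaving: both threads read 0 before either writes, so one increment is lost even though no single operation ever ran unsynchronized.

```python
import threading


class LockedDict:
    """Hashtable-style dict: each individual operation takes a private lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def get(self, key, default=None):
        with self._lock:
            return self._data.get(key, default)

    def put(self, key, value):
        with self._lock:
            self._data[key] = value


def lost_update_demo():
    d = LockedDict()
    d.put("hits", 0)
    a_has_read = threading.Event()
    b_has_written = threading.Event()

    def thread_a():
        value = d.get("hits")      # A reads 0...
        a_has_read.set()
        b_has_written.wait()       # ...B runs its whole read-then-write...
        d.put("hits", value + 1)   # ...then A writes its stale 0 + 1

    def thread_b():
        a_has_read.wait()
        d.put("hits", d.get("hits") + 1)  # B reads 0, writes 1
        b_has_written.set()

    ta = threading.Thread(target=thread_a)
    tb = threading.Thread(target=thread_b)
    ta.start()
    tb.start()
    ta.join()
    tb.join()
    return d.get("hits")


print(lost_update_demo())  # 1, not 2: one increment was lost
```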

HashMap and the rest of the Java 1.2 collections API set a slightly better pattern: they don't try to maintain safety internally, but provide mechanisms like Collections.synchronizedMap() that let you hold the monitor for the length of your transaction.
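A rough Python analogue of that wrapper pattern, assuming a hypothetical `SynchronizedDict` class: every method locks, and the lock is exposed so callers can hold it across a whole transaction (the synchronizedMap() Javadoc likewise asks callers to synchronize on the returned map while iterating). An `RLock` stands in for Java's reentrant monitor, so a caller already holding it can still call the methods.

```python
import threading


class SynchronizedDict:
    """Hypothetical analogue of Collections.synchronizedMap()."""

    def __init__(self):
        # Reentrant, like a Java monitor: a caller that already holds it
        # can still call the methods below without deadlocking.
        self.lock = threading.RLock()
        self._data = {}

    def get(self, key, default=None):
        with self.lock:
            return self._data.get(key, default)

    def put(self, key, value):
        with self.lock:
            self._data[key] = value

    def remove(self, key):
        with self.lock:
            self._data.pop(key, None)

    def keys(self):
        with self.lock:
            return list(self._data)  # snapshot, safe to iterate


def increment(d, key):
    # Transaction: hold the wrapper's lock across the read and the write.
    with d.lock:
        d.put(key, d.get(key, 0) + 1)


def prune_negative(d):
    # Transaction: iterate and remove with no writers interleaving.
    with d.lock:
        for key in d.keys():
            if d.get(key) < 0:
                d.remove(key)


def hammer(d, n):
    for _ in range(n):
        increment(d, "hits")


counts = SynchronizedDict()
threads = [threading.Thread(target=hammer, args=(counts, 1000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
counts.put("stale", -1)
prune_negative(counts)
print(counts.get("hits"), counts.keys())  # 4000 ['hits']
```

The cost is that every call pays for the lock, including calls made inside a transaction that already holds it.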

However, this could only be so good, because monitors in Java are pretty fundamentally broken as well. A monitor is both part of an object's public API (e.g. "synchronized (foo) { ... }") and part of its implementation (e.g. "public synchronized void bar() { ... }"). This means that external code can affect your internal operation if you rely on the monitor that you get by default through your "this" instance.
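Python has no built-in monitors, but a rough analogue shows the hazard. Assume a hypothetical `PublicLockCounter` that, like a Java class with synchronized methods, guards itself with a lock any caller can reach. Outside code that grabs that lock for its own purposes stalls the object's internal operations (the timeout exists only so the demo terminates).

```python
import threading


class PublicLockCounter:
    """Hypothetical class whose internal lock is reachable by anyone,
    much like a Java object whose synchronized methods lock on `this`."""

    def __init__(self):
        self.lock = threading.Lock()
        self.value = 0

    def try_increment(self, timeout):
        # Give up (return False) if the lock isn't free within `timeout` seconds.
        if not self.lock.acquire(timeout=timeout):
            return False
        try:
            self.value += 1
            return True
        finally:
            self.lock.release()


counter = PublicLockCounter()
results = []
with counter.lock:  # unrelated outside code "synchronizes" on the object...
    worker = threading.Thread(
        target=lambda: results.append(counter.try_increment(timeout=0.1))
    )
    worker.start()
    worker.join()  # ...and the object's own method is locked out meanwhile
results.append(counter.try_increment(timeout=0.1))  # lock is free again
print(results, counter.value)  # [False, True] 1
```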

As such, a synchronized set involves three monitors:

1. The monitor on the interface-implementing collection type itself, e.g. on the HashMap. This is likely never used.

2. The monitor on the object returned by the 'synchronizedXXX' wrapping method. This is used to protect transactional access, such as iterating through the collection while removing items.

3. The monitor used as a mutex inside the object returned by the 'synchronizedXXX' wrapping method, which protects the integrity of the collection data structure when it is used by multithreaded code that does not hold monitor #2. That code may have a race condition, but it won't put the collection itself into an inconsistent structural state.

The 'synchronizedXXX'-returned wrapper objects are pretty expensive, and if you can, you should just internalize those collections into a business object that does any needed synchronization itself.
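As a sketch of that advice, assume a hypothetical `Accounts` class: the dict is internal, the lock is private (so, unlike a monitor on "this", outside code can't grab it), and the public methods are whole business operations, so the check-then-update inside `transfer` can't race.

```python
import threading


class Accounts:
    """Hypothetical business object that owns both its data and its lock."""

    def __init__(self, balances):
        self._lock = threading.Lock()  # private: callers never hold it
        self._balances = dict(balances)

    def transfer(self, src, dst, amount):
        # Read, check, and write happen in one critical section.
        with self._lock:
            if self._balances[src] < amount:
                return False
            self._balances[src] -= amount
            self._balances[dst] += amount
            return True

    def total(self):
        with self._lock:
            return sum(self._balances.values())


accounts = Accounts({"alice": 100, "bob": 100})


def shuffle(n):
    for i in range(n):
        if i % 2:
            accounts.transfer("alice", "bob", 7)
        else:
            accounts.transfer("bob", "alice", 5)


threads = [threading.Thread(target=shuffle, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(accounts.total())  # 200: money is neither created nor destroyed
```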

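ConcurrentHashMap and the like are largely lock-free (reads don't block, and updates use compare-and-swap or fine-grained per-bin locking rather than one lock for the whole map), and are built with the idea that you can perform the changes you need through atomic operations such as putIfAbsent(), compute() and merge() rather than through transactions. This isn't always true, but often is.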
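Python exposes no compare-and-swap primitive, so this sketch does not reproduce ConcurrentHashMap's lock-free reads. It swaps in plain lock striping instead (a fixed pool of locks selected by key hash, similar in spirit to the segments of the pre-Java 8 ConcurrentHashMap) to show the API idea: callers express an update as one atomic `compute()` or `merge()` call on a hypothetical `StripedDict`, rather than holding a lock around their own read-modify-write.

```python
import threading


class StripedDict:
    """Hypothetical map split into independently locked stripes."""

    def __init__(self, stripes=16):
        self._locks = [threading.Lock() for _ in range(stripes)]
        self._maps = [{} for _ in range(stripes)]

    def _stripe(self, key):
        # Keys with different hashes usually land on different locks,
        # so unrelated updates rarely wait for each other.
        return hash(key) % len(self._locks)

    def get(self, key, default=None):
        i = self._stripe(key)
        with self._locks[i]:
            return self._maps[i].get(key, default)

    def compute(self, key, fn):
        # Atomically replace the value with fn(old), where old is None if absent.
        i = self._stripe(key)
        with self._locks[i]:
            new = fn(self._maps[i].get(key))
            self._maps[i][key] = new
            return new

    def merge(self, key, value, fn):
        # Loosely modeled on Java's Map.merge(): store value if absent,
        # otherwise store fn(old, value).
        return self.compute(key, lambda old: value if old is None else fn(old, value))


counts = StripedDict()


def count_words(words):
    for word in words:
        counts.merge(word, 1, lambda old, new: old + new)


threads = [
    threading.Thread(target=count_words, args=(["a", "b", "a", "c", "a", "b"],))
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counts.get("a"), counts.get("b"), counts.get("c"))  # 12 8 4
```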

For a collection that is only ever used by a single thread, the atomic-operation overhead may still cause a performance impact; after all, atomic operations are still processor-level synchronization points. It is also possible to beat ConcurrentHashMap with a regular HashMap in certain multithreaded usage patterns, when you are properly protecting access to the HashMap yourself.

It might be challenging to find scenarios where ConcurrentHashMap doesn't beat the 'synchronizedMap()' wrapper, simply because the wrapper's implementation is itself really expensive.

FartyMcFarter · on Aug 18, 2023

Thank you! That makes sense, and it also explains why removing the GIL has a negative performance impact as discussed in other comments. Taking a lock every time a container is accessed is significant overhead, which is why languages like C++ don't make basic containers thread-safe.

ska · on Aug 18, 2023

Effectively the GIL is incurring that overhead on every data structure whether you need it or not.

But with it removed, you'll have to think about it in your designs more than you do currently. History shows us this is not easy.

FartyMcFarter · on Aug 18, 2023

> Effectively the GIL is incurring that overhead on every data structure whether you need it or not.

Not really. The GIL is taken and released quite infrequently (only when the interpreter decides it's time to switch threads, or around blocking calls such as I/O), whilst the new locks for each data structure are taken and released every time you do a basic operation on those data structures.

Holding a lock that is rarely taken/released incurs very little overhead.
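To make the difference concrete, this micro-benchmark prints the GIL's switch interval (a real `sys` setting) and then times a plain `append` against one wrapped in a lock on every call. Treat the numbers as illustrative only: a Python-level `threading.Lock` costs far more than the lightweight C-level per-object locks PEP 703 describes, so this sketch exaggerates the gap.

```python
import sys
import threading
import timeit

# Target time slice before the interpreter asks the running thread to
# release the GIL (seconds; 0.005 unless changed via sys.setswitchinterval()).
print("switch interval:", sys.getswitchinterval())

lock = threading.Lock()
items = []


def append_plain():
    items.append(None)


def append_locked():
    with lock:  # acquire + release around every single append
        items.append(None)


N = 200_000
plain = timeit.timeit(append_plain, number=N)
locked = timeit.timeit(append_locked, number=N)
print(f"plain: {plain:.3f}s  locked: {locked:.3f}s  ratio: {locked / plain:.1f}x")
```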

ska · on Aug 18, 2023

Hm, that’s a fair point: it’s only part of the same overhead, because it’s not per operation.

blibble · on Aug 18, 2023

indeed

not sure there's much they can do about this, other than protecting all the built-in data structure operations with mutexes, like Java's original data structures (Hashtable, Vector, etc.)

(but then how do you get a non-synchronized [] if you want one?)

OskarS · on Aug 18, 2023

My impression was that that was exactly what they were going to do: replace the GIL with fine-grained locking on the objects themselves. I can't imagine they'd let multiple python threads manipulate python data-structures concurrently, the interpreter would segfault immediately.

> (but then how do you get a non-synchronized [] if you want one?)

You don't. This is one of the reasons why the GIL gives higher performance for single-threaded use cases: stuff like lists and dicts can be non-synchronized.

dwaite · on Aug 20, 2023

> not sure there's much they can do about this, other than protecting all the built-in data structure operations with mutexes, like java's original data structures (Hashtable, Vector, etc)

However, this is fundamentally the wrong approach, because Vector and Hashtable aren't protected from read-then-write race conditions.

Such internal locking guarantees that the collection stays structurally sound, but not that code accessing it is dealing with a single consistent state until it finishes.

Znafon · on Aug 18, 2023

I think you can make a non-thread-safe list by just making the same object without the lock described at https://peps.python.org/pep-0703/#container-thread-safety if you really want maximum single-threaded performance.

Maybe it could be part of a non_threadsafe_containers module on PyPI.

insanitybit · on Aug 18, 2023

You could fast path it by checking if the reference count of the list is `1` and avoid taking the lock in that case, I think.

smaddox · on Aug 18, 2023

Could you? What enforces only a single thread having access to a given reference? What about global variables?

insanitybit · on Aug 19, 2023

If there's a refcount of 1, you can mutate the value safely, because no other thread could be trying to read or write it. And the only thread that could hand it to others is the one doing the mutation, so the count can't suddenly change.

I'd assume that importing any sort of module-level variable would imply an increment of the counter, but I'm not sure.
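One way to check that assumption in CPython is `sys.getrefcount`. Its absolute value is generally one higher than you might expect (it counts the call's own argument), so this sketch only looks at the difference between two readings. The module-level name `SHARED` is hypothetical; the point is that binding a fresh list to a global adds a reference, so a refcount-of-1 fast path would stop applying once the object is globally reachable.

```python
import sys

SHARED = None  # hypothetical module-level name


def refcount_delta_from_global():
    global SHARED
    data = []                       # fresh list, referenced only by `data`
    before = sys.getrefcount(data)
    SHARED = data                   # binding a module-level name adds a reference
    after = sys.getrefcount(data)
    return after - before


print(refcount_delta_from_global())  # 1
```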

blibble · on Aug 18, 2023

that would work on x86 but not on an arch that can reorder loads (e.g. ARM)

insanitybit · on Aug 18, 2023

I'm assuming it would be a fenced operation

blibble · on Aug 18, 2023

but then there's still an advantage to using the fenceless version :)

(admittedly it's python so it's so slow it's probably not even measurable)

insanitybit · on Aug 18, 2023

Yeah, for Python I feel like the difference between fenced and unfenced doesn't matter. The primary cost is your L3 cache getting slammed with contended atomics, but your L3 is already absolutely fucked if you're using Python.

josefx · on Aug 18, 2023

List append would probably end up guarded by an instance specific lock. At least it is in some of the nogil concept code.

jerpint · on Aug 18, 2023

Wouldn’t a naive solution just be to have a flag that by default enables the GIL, and when disabled, a warning gets printed?

kortex · on Aug 18, 2023

Nope. Cause users would just ignore or silence the warning, continue on, get subtly incorrect, difficult to reproduce behavior, submit tickets/issues, and just cause a lot of dead weight overall.

It's not remotely a trivial problem, and I assure you just about every naive solution has been considered and rejected.