
Why is this faster than the stdlib? What does it do to achieve better performance?


It's in the README of the GitHub project.

> In short, the main reasons why MPIRE is faster are:
>
> - When fork is available we can make use of copy-on-write shared objects, which reduces the need to copy objects that need to be shared over child processes
> - Workers can hold state over multiple tasks. Therefore you can choose to load a big file or send resources over only once per worker
> - Automatic task chunking


COW can come back and bite you by causing hard-to-predict runtime behavior.

Your code goes down a rarely used branch and suddenly a large object gets copied.
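A minimal sketch of why: after `fork`, parent and child share memory pages copy-on-write, and the first write from either side silently duplicates the touched pages (POSIX-only, uses `os.fork` directly):

```python
import os

big = list(range(1_000_000))  # shared with the child via COW after fork

pid = os.fork()
if pid == 0:
    # First write in the child: the touched pages get copied for the child
    # only. In CPython even *reading* objects updates refcounts, which also
    # dirties pages -- that's the hard-to-predict part.
    big[0] = -1
    os._exit(0)

os.waitpid(pid, 0)
# The parent's view is unchanged; the copy happened behind the scenes.
assert big[0] == 0
```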


Isn’t this given “for free” by the fact that it’s fork, even in standard multiprocessing? What does the library do extra?


It doesn't do much extra I guess.

In standard multiprocessing, all arguments are pickled and pushed to a queue for processes in the pool to use.

To pass heavy arguments cheaply, the trick for exploiting CoW was to place them in global variables before creating the pool.

My understanding from Mpire is that they do the same thing, but expose a `shared_objects` parameter to make it less hacky than global variables.

I guess their benchmarks compare against pickling arguments, not against using global variables/CoW, which is why they can boast a performance increase.


Yeah, I am struggling to figure out what the secret sauce of this library is, and whether that sauce introduces footguns down the line.

The stdlib multiprocessing already uses fork on Linux. I once ran the same multiprocessing code on Linux and Windows, and there was a significant improvement in performance when running on Linux.


They're deprecating fork as the default in a version or two; one of the main issues with it is that it copies locks across processes, which can cause deadlocks.
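The lock problem can be seen directly: a lock held at fork time is copied into the child in its locked state, with no thread left to release it (a POSIX-only sketch using `os.fork`):

```python
import os
import threading

lock = threading.Lock()
lock.acquire()  # the parent holds the lock when it forks

pid = os.fork()
if pid == 0:
    # The child's copy of the lock is still "held", but the holding thread
    # doesn't exist here: calling lock.acquire() now would deadlock forever.
    os._exit(0 if lock.locked() else 1)

_, status = os.waitpid(pid, 0)
assert os.waitstatus_to_exitcode(status) == 0  # child saw a locked lock
lock.release()
```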



