
Why is this faster than the stdlib? What does it do to achieve better performance?


It's in the README of the GitHub project.

> In short, the main reasons why MPIRE is faster are:
>
> - When fork is available we can make use of copy-on-write shared objects, which reduces the need to copy objects that need to be shared over child processes
> - Workers can hold state over multiple tasks. Therefore you can choose to load a big file or send resources over only once per worker
> - Automatic task chunking


COW can come back and bite you by causing hard-to-predict runtime behavior.

Your code goes down a rarely used branch and suddenly a large object gets copied.
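A minimal sketch of why: after `fork`, parent and child share memory pages copy-on-write, and the first write from either side silently duplicates the touched pages (POSIX-only, uses `os.fork` directly):

```python
import os

big = list(range(1_000_000))  # shared with the child via COW after fork

pid = os.fork()
if pid == 0:
    # First write in the child: the touched pages get copied for the child
    # only. In CPython even *reading* objects updates refcounts, which also
    # dirties pages -- that's the hard-to-predict part.
    big[0] = -1
    os._exit(0)

os.waitpid(pid, 0)
# The parent's view is unchanged; the copy happened behind the scenes.
assert big[0] == 0
```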


Isn’t this given “for free” by the fact that it’s fork, even in standard multiprocessing? What does the library do extra?


It doesn't do much extra I guess.

In standard multiprocessing, all arguments are pickled and pushed to a queue for processes in the pool to use.

To pass heavy arguments cheaply, the trick for exploiting CoW was to place them in global variables before creating the pool.

My understanding from Mpire is that they do the same thing, but expose a `shared_objects` parameter to make it less hacky than global variables.

I guess their benchmarks compare against pickling arguments, not against using global variables/CoW, which is why they can boast a performance increase.


Yeah, I am struggling to figure out what the secret sauce of this library is, and whether that sauce introduces footguns down the line.

The stdlib multiprocessing already uses fork on Linux. I once ran the same multiprocessing code on Linux and Windows, and there was a significant improvement in performance when running on Linux.


They're deprecating fork as the default in a version or two; one of the main issues with it is that it copies locks across processes, which can cause deadlocks.
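The lock problem can be seen directly: a lock held at fork time is copied into the child in its locked state, with no thread left to release it (a POSIX-only sketch using `os.fork`):

```python
import os
import threading

lock = threading.Lock()
lock.acquire()  # the parent holds the lock when it forks

pid = os.fork()
if pid == 0:
    # The child's copy of the lock is still "held", but the holding thread
    # doesn't exist here: calling lock.acquire() now would deadlock forever.
    os._exit(0 if lock.locked() else 1)

_, status = os.waitpid(pid, 0)
assert os.waitstatus_to_exitcode(status) == 0  # child saw a locked lock
lock.release()
```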



