
Well, I once had an analytics/statistics tool that regularly chewed through a couple GB of CSV files. After enough features had been added, it took almost 5 minutes per run, which got really annoying.

It took me less than an hour to add multiprocessing to analyze each file in its own process and merge the results together at the end. The runtime dropped to a couple seconds on my 24 thread machine.

It really was much easier than expected. Rewriting it in C++ would have probably taken a week.
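The pattern described above (analyze each file in its own process, merge the results at the end) can be sketched roughly like this; `count_rows` is a hypothetical stand-in for the real per-file statistics:

```python
# Sketch: per-file analysis fanned out across a process pool,
# with the partial results merged at the end.
import csv
from collections import Counter
from multiprocessing import Pool

def count_rows(path):
    # Hypothetical per-file analysis: tally values in the first column.
    with open(path, newline="") as f:
        return Counter(row[0] for row in csv.reader(f))

def analyze(paths):
    with Pool() as pool:  # defaults to one worker per CPU core
        partials = pool.map(count_rows, paths)
    total = Counter()
    for c in partials:    # merge the per-file results
        total += c
    return total
```

Since CSV parsing is CPU-bound, this divides the runtime by roughly the number of cores, which is consistent with a 5-minute run dropping to seconds on a 24-thread machine.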



In F#, it would just be

    let results = files |> Array.Parallel.map processFile
Literally that easy.

Earlier this week, I used a ProcessPoolExecutor to run some things in their own processes. I needed a bare minimum of synchronization, so I needed a queue. Well, multiprocessing has its own queue, but that queue is not joinable. So I chose multiprocessing's JoinableQueue. Well, it turns out that that queue can't be passed into the executor's worker processes. For that, you need to get a queue from the launching process's Manager. That queue is the regular Python queue.
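The combination that works, as I understand the above, looks roughly like this (a sketch; `worker` and the item handling are hypothetical):

```python
# Sketch: a joinable queue shared with ProcessPoolExecutor workers.
# A Manager proxies a regular queue.Queue, so the proxy pickles cleanly
# into the worker processes and supports task_done()/join().
import queue
from concurrent.futures import ProcessPoolExecutor
from multiprocessing import Manager

def worker(q):
    # Hypothetical worker: drain items until the queue is empty.
    handled = 0
    while True:
        try:
            item = q.get_nowait()
        except queue.Empty:
            return handled
        # ... process item here ...
        q.task_done()
        handled += 1

def run(items, n_workers=2):
    with Manager() as manager:
        q = manager.Queue()  # proxy to a queue.Queue in the manager process
        for it in items:
            q.put(it)
        with ProcessPoolExecutor(max_workers=n_workers) as ex:
            futures = [ex.submit(worker, q) for _ in range(n_workers)]
        q.join()  # returns once every item was marked done
        return sum(f.result() for f in futures)
```

Note that a plain multiprocessing.Queue or JoinableQueue handed to `submit` would fail to pickle here; the Manager proxy is what makes this work.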

It is a gigantic mess. And yes, asyncio also has its own queue class. So in Python, you have half a dozen or so queue classes that are all incompatible, have different interfaces, and have different limitations that are rarely documented.

That's just one highlight of the mess between threading, asyncio, and multiprocessing.


Well, I'm not here to debate API cleanliness; I just wanted to point out to OP that Python can utilize multicore processors without threads ;)

Here is the part of multiprocessing I used:

  from multiprocessing import Pool

  with Pool() as p:
      results = p.map(calc_func, file_paths)
So, pretty easy too IMO.



