I fail to understand the use of python in a distributed environment while the language has such poor concurrency support (on top of the lack of a type system). You can make your application HA, but they are obviously not trying to squeeze out every CPU cycle.
> I fail to understand the use of python in a distributed environment while the language has such poor concurrency support (on top of the lack of a type system). You can make your application HA, but they are obviously not trying to squeeze out every CPU cycle.
You're conflating several things that are orthogonal imo. A system can be distributed without concurrency. A concurrent system need not be distributed. Either kind of thing can be built with or without a specific kind of type system. And CPU efficiency has nothing to do specifically with any of the previous things.
To expand on that: distributed systems are quite often constructed from simple single-threaded processes, and where concurrency is needed it is probably more often achieved through multi-processing than multi-threading. A single-threaded event-dispatched request-response service probably describes a big chunk of all the stuff running in distributed environments today. In a lot of these cases the workloads are i/o bound, and instructions per cycle is not even close to the top of the list of concerns. There are a lot of reasons why python fits into that world very well.
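To make that concrete, here is a rough sketch of that kind of single-threaded, event-dispatched request-response service (a hypothetical echo server, nothing more: one process, no threads, i/o-bound work only):

    import asyncio

    async def handle(reader, writer):
        data = await reader.readline()      # wait for one request line
        writer.write(b"echo: " + data)      # send the response
        await writer.drain()
        writer.close()
        await writer.wait_closed()

    async def main():
        server = await asyncio.start_server(handle, "127.0.0.1", 8080)
        async with server:
            await server.serve_forever()

    if __name__ == "__main__":
        asyncio.run(main())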
> A system can be distributed without concurrency. A concurrent system need not be distributed.
The only justification for such design I can think of is HA. I.e. you have a very light workload that does not saturate a single machine, but nonetheless you create a distributed system with two machines for HA. Such a light workload is probably not often the case at Netflix.
> The only justification for such design I can think of is HA.
I'm not trying to justify any particular design. I'm just saying that concurrency and distributed processing are different things. And I'm no expert, but I continue to be confused as to what saturation, whether of cpu or i/o - you don't specify - has to do with concurrency? If I have for example n http servers all running one thread and handling enough traffic that they are all at 100 percent CPU that is by some minimal definition a distributed system, but none of the processes in it are concurrent. If the http servers read and write a database then the database is almost certainly concurrent, so then you have a distributed system that has both concurrent and non-concurrent processes collaborating.
> If I have for example n http servers all running one thread and handling enough traffic that they are all at 100 percent CPU that is by some minimal definition a distributed system, but none of the processes in it are concurrent.
Concurrency is broader than just threads in a process. In your example, the whole problem is concurrent.
Yes, of course you're correct. Any system of that kind is concurrent across its architecture. But the context here is concurrency as a programming paradigm, which was established by the OP's observation about python's poor concurrency support, the point that sparked this exchange. In the comment you quoted above, the statement "none of the processes are concurrent" was meant to make that context clearer.
So if you need concurrency in the context of a single thread, then Python's GIL is a non-starter. But a distributed environment is not likely one of those.
Edit: I should amend "concurrency in a single thread" to "concurrency in a single thread that is compute-gated", since coroutines can give you pseudo-concurrency in a single thread provided your workload has blocking steps like IO or TCP calls.
In fact it's precisely Python's deficiency in multithreading that led to it having one of the best ecosystems for every other form of concurrency, like green threads and multiprocess applications.
If you’re doing (data analysis|simulations|Image processing) you can offload computation to numpy, which releases the GIL. This allows nice multicore speedups with python and threading.
The same holds for various CPU intensive standard library functions implemented in C.
The GIL issue is real, but posts like this one confused me for years. Please, don’t exaggerate GIL issues.
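For the curious, a minimal sketch of that pattern (array sizes and the thread count are arbitrary; the speedup you actually see depends on your BLAS build and core count):

    import threading
    import numpy as np

    # four independent chunks of numeric work; sizes are arbitrary
    arrays = [np.random.rand(2000, 2000) for _ in range(4)]

    def work(a):
        # the matrix multiply runs in C/BLAS with the GIL released
        return (a @ a).sum()

    threads = [threading.Thread(target=work, args=(a,)) for a in arrays]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # compare the wall-clock time against calling work(a) on each array serially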
Not everything is amenable to numpy and it's pretty easy to make performance worse by throwing numpy at every problem. For example, if your array contains Python objects that are part of an operation, you've likely just introduced a significant performance regression. Worse, there's no way to detect these regressions except to have performance tests. Please, don't understate GIL issues.
Ah. Pandas is the problem. Unfortunately, you need to understand how its features are implemented to use it well. Still, my main trouble with Pandas is unnecessary memory bloat, not compute inefficiency.
That caveat is somewhat true for all programming abstractions, but well-designed interfaces make the more efficient techniques more obvious and beautiful, while the inefficient or risky techniques are made esoteric and ugly.
Pandas isn't the problem, the problem is assuming that "$LIBRARY releases the GIL so things will be fast!". It's a penny-wise, pound-foolish approach to performance. Someone will write a function assuming the user is only going to pass in a list of ints and someone else will extend that function to take a list of Tuple[str, int] or something and all of a sudden your program has a difficult-to-debug performance regression.
In general, the "just rewrite the slow parts in C!" motto is terrible advice because it's unlikely that it will actually make your code appreciably faster, and if it does, that gain is very likely to be defeated unexpectedly as soon as requirements change. Using FFI to make things faster can work, but only if you've really considered your problem and you're quite sure you can safely predict relevant changes to requirements.
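To make that failure mode concrete, a toy sketch (Decimal stands in for whatever Python object creeps into the data; exact timings vary, but object-dtype arrays fall back to per-element Python calls):

    import numpy as np
    from decimal import Decimal

    ints = list(range(100_000))
    mixed = [Decimal(i) for i in range(100_000)]  # a "small" requirements change

    print(np.asarray(ints).dtype)   # an integer dtype: summed in a vectorized C loop
    print(np.asarray(mixed).dtype)  # object: numpy calls back into Python per element

    np.asarray(ints).sum()   # fast
    np.asarray(mixed).sum()  # conversion cost plus per-element dispatch;
                             # often no faster than plain sum(mixed), sometimes slower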
> Someone will write a function assuming the user is only going to pass in a list of ints and someone else will extend that function to take a list of Tuple[str, int]
There are plenty of pitfalls in leaky abstractions. Establishing that fast numeric calculations only work with specific numeric types seems to help.
One thing you seem to be encountering, that I've seen a few times, is that people don't realize NumPy and core Python are almost orthogonal. The best practices for each are nearly opposite. I try to make it clear when I'm switching from one to the other by explaining the performance optimization (broadly) in comments.
Regardless, any function that receives a ``list`` of ints will need to convert to an ndarray if it wants NumPy speed. If the function interface is modified later, I think it's fair to expect the editor to understand why.
> There are plenty of pitfalls in leaky abstractions
Sure, but this is a _massive_ pitfall. It's an optimization that can trivially make your code slower than the naive Python implementation, all due to a leaky abstraction.
> Regardless, any function that receives a ``list`` of ints will need to convert to an ndarray if it wants NumPy speed. If the function interface is modified later, I think it's fair to expect the editor to understand why.
Yeah, that was a toy example. In practice, the scenario was similar except the function called into a third-party library that used Numpy under the hood. We introduced a ton of complexity to use this third-party library instead of the naive list implementation, on the grounds that "it will make things fast", and the very next sprint we needed to update it such that it became 10X slower than the naive Python implementation.
That's the starkest example, but there have been others and there would have been many more if we didn't have the stark example to point to.
The current slogan is "just use Python; you can always make things fast with Numpy/native code!", but it should be "use Python if you have a deep understanding of how Numpy (or whatever native library you're using) makes things fast such that you can count on that invariant to hold even as your requirements change" or some such.
I have mixed feelings about your conclusion. On one hand I don't want to discourage newbies from using Python. On the other, I enjoy that my expertise is valuable.
It seems reasonable that different parts of the code are appropriate for modification by engineers of differing skills.
Even if your app is IO bound, Python's concurrency is painful. Because it's not statically typed, it's too easy to forget an `await` (causing your program to get a Promise[Foo] when you meant to get a Foo) or to overburden your event loop, and such things are difficult to debug (we've had several production outages because of this class of bugs). Never mind the papercuts that come about from dealing with the sync/async dichotomy.
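For anyone who hasn't hit it, a toy sketch of the forgotten-await failure (fetch_user is hypothetical; nothing fails at the call site, the wrong value just flows onward until something chokes on it):

    import asyncio

    async def fetch_user(user_id):
        await asyncio.sleep(0.1)   # stand-in for a DB or HTTP call
        return {"id": user_id}

    async def main():
        user = fetch_user(42)      # BUG: missing await, so user is a coroutine object
        print(type(user))          # <class 'coroutine'>, not the dict we wanted
        # user["id"] would raise TypeError far from the actual mistake

    asyncio.run(main())
    # CPython also emits: RuntimeWarning: coroutine 'fetch_user' was never awaited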
Both problems have built-in debug solutions in recent versions of python. The event loop will literally print out all the un-awaited coroutines when it exits, and you can enable debug on the event loop and have it print out every time a coroutine takes longer than a configurable amount of time.
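Roughly, assuming a reasonably recent CPython, along these lines:

    import asyncio
    import time

    async def slow_handler():
        time.sleep(0.5)   # blocking call that starves the event loop

    async def main():
        # anything that runs longer than this (seconds) gets reported in debug mode
        asyncio.get_running_loop().slow_callback_duration = 0.1
        await slow_handler()

    # debug=True turns on the slow-callback reports and the richer
    # "never awaited" tracebacks that point at where a coroutine was created
    asyncio.run(main(), debug=True)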
> The event loop will literally print out all the un-awaited coroutines when it exits
IIRC, I've only ever seen "unawaited coroutine found" (or similar) errors; I've never seen anything that points to a specific unawaited coroutine. In either case, a bug in prod is still many times worse than a compile-time type error.
> you can enable debug on the event loop and have it print out every time a coroutine takes longer than a configurable amount of time
I don't run my production servers in debug mode, and even when I do manage to find the problem, I have limited options for solving it. Usually it amounts to refactoring out the offending code into a separate process or service.
An extreme counterpoint is a language like Go which
1) Is roughly 100X faster in single-threaded, CPU-bound execution anyway
2) Allows for additional optimizations that simply aren't possible in Python (mostly involving reduced allocations and improved cache coherence)
3) Has a runtime that balances CPU load across all available cores
This isn't a "shit on Python" post; only that concurrency really isn't Python's strong suit (yet).
It's not at all obvious that "they" refers to "netflix and other post-production oriented users", and your argument is a tautology "Python is good at the things that Python is good at". Obviously. The rest of us are debating what those things are or are not.
The subject is well-trodden, there's not much to debate. Python is not good at threading, but works well in multiprocessing situations. Netflix is using it in the latter situation, and not the former. Async is unlikely to be a use case either.
The missing await is a very common fault indeed; they should have used another keyword like 'not_await' for that scenario to make the decision explicit. Pycharm at least will warn you if you call an awaitable without 'await' and without assigning it to a variable. If you assign it to a variable and pass it into another function that doesn't expect an awaitable, it's up to you to have added sufficient type annotations and run your code through some static checker like mypy. Running python at scale without mypy is kind of doomed to fail to begin with.
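For illustration, a small sketch of what the annotation-plus-mypy combination catches (hypothetical names; the error text is approximate, and running main() would hit a TypeError at the first call, which is exactly the bug being discussed):

    # checked with: mypy --strict example.py
    import asyncio

    async def fetch_name(user_id: int) -> str:
        await asyncio.sleep(0)
        return f"user-{user_id}"

    def greet(name: str) -> str:
        return "hello " + name

    async def main() -> None:
        greet(fetch_name(1))        # mypy: incompatible type "Coroutine[Any, Any, str]";
                                    # expected "str"
        greet(await fetch_name(1))  # ok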
Just because you don't see or understand it doesn't mean their usage is null and void. Does distributed imply a need to "squeeze out every cycle" at the cost of productivity? If they find that they are CPU bound, or need more efficiency, then it's likely they'll move to another platform. If not, why make the move, or lose productivity?
Personally, I've worked with some HPC apps in pharma. We found quite a mix of CPU- and IO-bound challenges when we actually profiled and looked closely at what was slowing up the apps. Contrary to the original belief, rewriting for speed or upgrading the CPUs wouldn't have helped much.
Some parts of the process are obviously not ones where it makes sense to squeeze out every CPU cycle. Sometimes programmer time (both in development and maintenance) is your scarce resource (either because of total numbers, or people with specific knowledge), and so it is far better to optimize for that.
In most such cases (Python, Ruby, etc.) it is possible to find your hot loops and replace them with C/C++/Rust/etc. code, so you can really focus on those small areas where it would make an actual difference.
Additionally, in some places where you need concurrency you can split up the task into parts that have little to do with each other, and there Python is a great glue language to manage calling other executables that do the actual work.
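Something like this, as a sketch of the glue-language pattern (the "transcode-chunk" tool is hypothetical):

    import subprocess

    chunks = ["part-0.bin", "part-1.bin", "part-2.bin", "part-3.bin"]

    # launch one worker process per chunk; the GIL is irrelevant here
    procs = [subprocess.Popen(["transcode-chunk", path]) for path in chunks]

    for path, proc in zip(chunks, procs):
        rc = proc.wait()
        print(path, "ok" if rc == 0 else "failed ({})".format(rc))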
To make a distributed application, you need some sort of orchestration. For example, a Redis job queue.
And at that point, you can just code your application as a single thread, and run one per core on the machine.
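A rough sketch of that shape, assuming redis-py and a queue key named "jobs" (both purely illustrative):

    import json
    import redis

    r = redis.Redis(host="localhost", port=6379)

    def handle(job):
        ...  # whatever one unit of work looks like

    # single-threaded worker loop; start one of these processes per core
    while True:
        _, raw = r.blpop("jobs")   # blocks until a job is pushed onto the queue
        handle(json.loads(raw))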
And the Python "lack of types" issue has always been a bit overblown in my opinion. If it's not immediately obvious what something is, then you need to either comment more, or come up with better variable names. And if you really, really feel the need for a type system, Python natively supports that now.
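For reference, the opt-in typing support looks roughly like this (annotations are ignored at runtime and checked with an external tool such as mypy):

    from typing import Dict, Optional

    def find_title(titles: Dict[int, str], title_id: int) -> Optional[str]:
        return titles.get(title_id)

    # a checker such as mypy flags this call:
    #   Argument 2 to "find_title" has incompatible type "str"; expected "int"
    find_title({1: "Stranger Things"}, "1")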
> I fail to understand the use of python in a distributed environment while the language has such poor concurrency support (on top of the lack of a type system).
Well, without knowing the exact problems people are trying to solve, of course you won't understand the motivation behind it.
Reading their post, it seems that they are not using Python for serving, but mainly for long-running daemon processes. Concurrency is probably not something people care about in such a situation, nor is squeezing out CPU perf. In fact, such a workload probably wants to maintain a low resource footprint.
Re: Type system -- at Gigantum we very aggressively enforce that all classes and methods in the core libraries are fully typed using mypy. The depth and expressiveness of mypy rivals that of any other strongly typed language.
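For anyone wondering what "aggressively enforce" can look like in practice, a hypothetical mypy config along these lines (illustrative settings, not Gigantum's actual file):

    # mypy.ini (illustrative)
    [mypy]
    disallow_untyped_defs = True
    disallow_incomplete_defs = True
    no_implicit_optional = True
    warn_return_any = True
    warn_unused_ignores = True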