I'm excited to see more profiling tools for Python!
This sounds like it does peak memory, which is critical for batch jobs, since that's the bottleneck. Memory is fundamentally different from performance in that it's a limited resource rather than a cumulative cost: making any part of the program faster almost always helps (at least a little, or at least reduces CPU load), but optimizing non-peak memory has no impact. You have to be able to identify the peak in order to reduce memory usage.
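If you just want the number rather than the attribution, the stdlib can already give you a crude process-wide high-water mark. A minimal sketch (it won't tell you which code caused the peak):

    import resource

    # Peak resident set size of this process so far
    # (kilobytes on Linux, bytes on macOS; the resource module is POSIX-only).
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print("peak RSS so far:", peak)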
If you want peak memory profiling for Python that also runs on macOS, check out https://pythonspeed.com/fil/ (ARM support has some issues, but once I unpack my new Mac Mini I plan to fix it.)
Ways memray is better than Fil:
- Native callstacks.
- More kinds of reports, and ability to do custom post-processing of data.
- Much lower overhead (but not always, see reply).
And if you're running Python batch jobs, and want both peak memory and performance profiling in production, check out Sciagraph: https://pythonspeed.com/sciagraph/
(You can probably cobble together something like Sciagraph with py-spy + memray, but you won't e.g. get timeline reports designed with batch jobs in mind.)
One thing to note is that we support tracking forked/child processes as well!
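Roughly, that looks something like this (see memray run --help for the exact flag):

$ memray run --follow-fork my_script.py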
>> - Much lower overhead, sounds like.
I actually think Fil has the potential to be faster in some situations because it seems that it aggregates the flamegraph in memory. Memray needs to do a ton of I/O to disk while holding a lock, so if it is under very heavy pressure it will be a bit slow.
Here are some not-very-scientific tests running the "test_list" file from the CPython test suite:
* With pymalloc active (not a lot of heavy pressure):
$ fil-profile run -m test test_list
...
Total duration: 278 ms
$ memray3.10 run -m test test_list
...
Total duration: 128 ms
* With pymalloc not active (heavy pressure):
$ PYTHONMALLOC=malloc fil-profile run -m test test_list
...
Total duration: 278 ms
$ PYTHONMALLOC=malloc memray3.10 run -m test test_list
Total duration: 344 ms
So as you can see, Fil is about 20% faster than memray in this scenario. This means that Fil is doing a fantastic job! We spent a lot of time optimizing memray, and the fact that Fil can beat it is a testament to Fil's quality :)
It does much more than that! It tracks every single allocation and dumps it to a file that can later be analysed in many ways. Currently our reporters report peak memory (and leaked memory at the end of the execution), but technically any other reporter can be used. For example, we plan to allow generating flame graphs at arbitrary points in the execution, and much more!
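The basic workflow (my_script.py standing in for whatever you actually run) is something like:

$ memray run -o output.bin my_script.py
$ memray flamegraph output.bin

and the other reporters (table, stats, ...) read the same capture file.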
BTW, I am starting a Slack for devs working on profilers; it would be great to have you all join. I'd love to hear more about the ELF patching technique (Fil uses LD_PRELOAD and the macOS equivalent).
Nice! Fil looks like an awesome profiler, and it's fantastic that it works on other platforms. I am super excited to see more cool features, and https://pythonspeed.com/sciagraph/ looks fantastic :)
>> but optimizing non-peak memory has no impact. You have to be able to identify the peak in order to reduce memory usage.
In some sense this is only true if you're the end user of a platform. If you're trying to pack jobs onto machines, then you actually do care about utilization at any given time, since you can oversubscribe relative to everyone's max usage.
E.g., you can give everyone a limit based on their peak memory, but then bin pack based on their actual usage (and evict when you're wrong)
Want to chime in to say it's awesome that Bloomberg is releasing this product to the world. I don't think many financial organizations would even consider releasing anything as open source, even if it is as general purpose as this tool. My current one (HFT,MM) wouldn't go for it.
It also shows good engineering practices. I'd venture a guess that they are probably one of the better ones in the space.
Well, today is one day... :) You didn't hint at the reason for saying that, but please don't hesitate to apply if you're at all interested.[1]
It would make more sense to say "financial data organization", which then makes it a bit more obvious why we'd have a whole lot of employees (6,500+) in Engineering compared to a typical "financial organization" making money primarily by participating in the public/private markets.
> I don't think many financial organizations would even consider releasing anything as open source, even if it is as general purpose as this tool.
Not to discount how great it is that Bloomberg is contributing this to the community, but a lot of other interesting tools have been open sourced by finance firms.
Pandas, for one, was originally developed internally at the hedge fund AQR.
Tangential question: Does anybody have a good recommendation for a profiler that works well with massively async codebases?
My experience has been that the concurrent nature of coroutines can make it hard to reason about what's going on at a particular point in time. If you don't know how many things you're awaiting on at a specific moment (and what potential external stuff they may be interacting with), it's not exactly easy to identify memory usage of codepaths.
I like the open source Perfetto UI (formerly Google Chrome's about://tracing)
Wrote my own timing-trace JSON export for Python and C++, across threads and processes. The docs have some pointers on creating your own traces with the Tracing SDK.
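A stripped-down sketch of the idea in Python, assuming nothing beyond the documented trace-event JSON format (single process, one wrapped call):

    import json, os, threading, time

    events = []

    def traced(name, fn, *args, **kwargs):
        # Record one "complete" event (ph="X") in Chrome/Perfetto trace-event format;
        # timestamps ("ts") and durations ("dur") are in microseconds.
        start_us = time.perf_counter_ns() // 1000
        try:
            return fn(*args, **kwargs)
        finally:
            events.append({
                "name": name, "ph": "X",
                "ts": start_us, "dur": time.perf_counter_ns() // 1000 - start_us,
                "pid": os.getpid(), "tid": threading.get_ident(),
            })

    traced("work", sum, range(1_000_000))
    with open("trace.json", "w") as f:
        json.dump({"traceEvents": events}, f)  # open in ui.perfetto.dev or chrome://tracing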
I built https://github.com/kunalb/panopticon to export perfetto/chrome compatible traces and also draw arrows between async functions. (I think the arrows are only supported in about://tracing though).
I've just put in production an app that intermittently can't allocate enough memory. Is this the best tool to debug it? I've never had to debug memory problems in Python.
Datadog's profiler [0] might help. The Python profiler has a heap profile type, which shows the data using memory, broken down by the call site that allocated it [1]. Feedback welcome!
> Heap Live Size
> Shows the amount of heap memory allocated by each function that has not been garbage collected (yet). This is useful for investigating the overall memory usage of your service and identifying potential memory leaks.
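Turning it on is roughly (see the docs for the exact, current flags and the heap profile specifics):

$ pip install ddtrace
$ DD_PROFILING_ENABLED=true ddtrace-run python app.py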
If this isn't due to latent memory pressure, it's usually because _something_ in your code is trying to allocate a gigantic amount of memory due to a bug. Think a 1 billion x 1 billion numpy array, or a billion element list.
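The back-of-the-envelope numbers make that kind of bug easy to recognize:

    print(10**9 * 10**9 * 8 / 10**18, "EB")  # 1e9 x 1e9 float64 array: 8.0 EB
    print(10**9 * 8 / 10**9, "GB")           # pointer array of a billion-element list alone: 8.0 GB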
Yes, I just gave this tool a try and reduced our app's initial memory allocation from 120MB to about 80MB in a few minutes by moving imports around, e.g. rarely called functions importing a big third-party library.
You can use this to figure out what needs to be optimized, but the optimization itself might not be easy: in the above example you would have to re-implement the third-party library in a memory-efficient way or find another one.
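For anyone who hasn't done the import-moving trick before, it's just deferring the import into the function that needs it; a minimal sketch, using pandas as a stand-in for whatever heavy dependency shows up in the report:

    # Before: pandas (and numpy behind it) is loaded by every process that imports this module.
    import pandas as pd

    def rarely_called_report(path):
        return pd.read_csv(path).describe()

    # After: the import cost, and its memory, is only paid if the function actually runs.
    def rarely_called_report(path):
        import pandas as pd
        return pd.read_csv(path).describe()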
Fil can dump memory profiling reports on failed allocations; as the other commenter said, Linux by default happily just gives you memory even if it doesn't have any, so a failed allocation implies a giant allocation.
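(You can check the kernel's overcommit policy with

$ cat /proc/sys/vm/overcommit_memory

where 0 is the default heuristic overcommit and 2 means strict accounting, i.e. allocations can actually fail up front.)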
Is this due to memory constraints on your system? If so, then this would definitely help you profile _what_ in your code is trying to claim so much memory.
First and foremost for debugging purposes, e.g. to find memory leaks and such.
Then, it's not a bad thing to optimize memory usage in any language. Memray could be a great tool for the authors of data science libraries, which Python has plenty of. Their average user probably doesn't have a beefy developer's laptop with 64GB of memory at hand.
Debugging high memory usage in Python can be annoying. Python is used a lot for data analysis, e.g. pandas, numpy. You might have scenarios where some dataframe, either generated or read, is a lot bigger than others for whatever reason. This might be useful in debugging why that's the case.