I'm excited to see more profiling tools for Python!
This sounds like it does peak memory, which is critical for batch jobs, since that's the bottleneck. Memory is fundamentally different from performance in that it's a limited resource rather than a cumulative cost: making any part of the program faster almost always helps (at least a little, or at least reduces CPU load), but optimizing non-peak memory has no impact. You have to be able to identify the peak in order to reduce memory usage.
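If you just want the number rather than the attribution, the stdlib can already give you a crude process-wide high-water mark. A minimal sketch (it won't tell you which code caused the peak):

    import resource

    # Peak resident set size of this process so far
    # (kilobytes on Linux, bytes on macOS; the resource module is POSIX-only).
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print("peak RSS so far:", peak)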
If you want peak memory profiling for Python that also runs on macOS, check out https://pythonspeed.com/fil/ (ARM support has some issues, but once I unpack my new Mac Mini I plan to fix it.)
Ways memray is better than Fil:
- Native callstacks.
- More kinds of reports, and ability to do custom post-processing of data.
- Much lower overhead (but not always, see reply).
And if you're running Python batch jobs, and want both peak memory and performance profiling in production, check out Sciagraph: https://pythonspeed.com/sciagraph/
(You can probably cobble together something like Sciagraph with py-spy + memray, but you won't e.g. get timeline reports designed with batch jobs in mind.)
One thing to note is that we support tracking forked/child processes as well!
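Roughly, that looks something like this (see memray run --help for the exact flag):

$ memray run --follow-fork my_script.py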
>> - Much lower overhead, sounds like.
I actually think Fil has the potential to be faster in some situations because it seems that it aggregates the flamegraph in memory. Memray needs to do a ton of I/O to disk while holding a lock, so if it is under very heavy pressure it will be a bit slow.
Here are some not-very-scientific tests running the "test_list" file from the CPython test suite:
* With pymalloc active (not a lot of heavy pressure):
$ fil-profile run -m test test_list
...
Total duration: 278 ms
$ memray3.10 run -m test test_list
...
Total duration: 128 ms
* With pymalloc not active (heavy pressure):
$ PYTHONMALLOC=malloc fil-profile run -m test test_list
...
Total duration: 278 ms
$ PYTHONMALLOC=malloc memray3.10 run -m test test_list
Total duration: 344 ms
So as you can see, Fil is about 20% faster than memray in this scenario. This means that Fil is doing a fantastic job! We spent a lot of time optimizing memray, and the fact that Fil can beat it is a testament to Fil's quality :)
It does much more than that! It tracks every single allocation and dumps it to a file that can later be analysed in many ways. Currently our reporters report peak memory (and leaked memory at the end of the execution), but technically any other reporter can be used. For example, we plan to allow generating flame graphs at arbitrary points in the execution, and much more!
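The basic workflow (my_script.py standing in for whatever you actually run) is something like:

$ memray run -o output.bin my_script.py
$ memray flamegraph output.bin

and the other reporters (table, stats, ...) read the same capture file.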
BTW, I am starting a Slack for devs working on profilers; it would be great to have you all join. I'd love to hear more about the ELF patching technique (Fil uses LD_PRELOAD and the macOS equivalent).
Nice! Fil looks like an awesome profiler, and it's fantastic that it works on other platforms. I am super excited to see more cool features, and https://pythonspeed.com/sciagraph/ looks fantastic :)
>> but optimizing non-peak memory has no impact. You have to be able to identify the peak in order to reduce memory usage.
In some sense this is only true if you're the end user of a platform. If you're trying to pack jobs onto machines, then you actually do care about utilization at any given time, since you can oversubscribe relative to everyone's max usage.
E.g., you can give everyone a limit based on their peak memory, but then bin pack based on their actual usage (and evict when you're wrong)
Want to chime in to say it's awesome that Bloomberg is releasing this product to the world. I don't think many financial organizations would even consider releasing anything as open source, even if it is as general purpose as this tool. My current one (HFT,MM) wouldn't go for it.
It also shows good engineering practices. I'd venture a guess that they are probably one of the better ones in the space.
Well, today is one day... :) You didn't hint at the reason for saying that, but please don't hesitate to apply if you're at all interested.[1]
It would make more sense to say "financial data organization", which then makes it a bit more obvious why we'd have a whole lot of employees (6,500+) in Engineering compared to a typical "financial organization" making money primarily by participating in the public/private markets.
> I don't think many financial organizations would even consider releasing anything as open source, even if it is as general purpose as this tool.
Not to discount how great it is that Bloomberg is contributing this to the community, but a lot of other interesting tools have been open sourced by finance firms.
Pandas, for one, was originally developed internally at the hedge fund AQR.
Tangential question: Does anybody have a good recommendation for a profiler that works well with massively async codebases?
My experience has been that the concurrent nature of coroutines can make it hard to reason about what's going on at a particular point in time. If you don't know how many things you're awaiting on at a specific moment (and what potential external stuff they may be interacting with), it's not exactly easy to identify memory usage of codepaths.
I like the open source Perfetto UI (formerly Google Chrome's about://tracing)
Wrote my own timing-trace JSON export for Python and C++, across threads and processes. The docs have some pointers on creating your own traces with the Tracing SDK.
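A stripped-down sketch of the idea in Python, assuming nothing beyond the documented trace-event JSON format (single process, one wrapped call):

    import json, os, threading, time

    events = []

    def traced(name, fn, *args, **kwargs):
        # Record one "complete" event (ph="X") in Chrome/Perfetto trace-event format;
        # timestamps ("ts") and durations ("dur") are in microseconds.
        start_us = time.perf_counter_ns() // 1000
        try:
            return fn(*args, **kwargs)
        finally:
            events.append({
                "name": name, "ph": "X",
                "ts": start_us, "dur": time.perf_counter_ns() // 1000 - start_us,
                "pid": os.getpid(), "tid": threading.get_ident(),
            })

    traced("work", sum, range(1_000_000))
    with open("trace.json", "w") as f:
        json.dump({"traceEvents": events}, f)  # open in ui.perfetto.dev or chrome://tracing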
I built https://github.com/kunalb/panopticon to export perfetto/chrome compatible traces and also draw arrows between async functions. (I think the arrows are only supported in about://tracing though).
I've just put in production an app that intermittently can't allocate enough memory. Is this the best tool to debug it? I've never had to debug memory problems in Python.
Datadog's profiler [0] might help. The Python profiler has a heap profile type, which shows the data using memory, broken down by the call site that allocated it [1]. Feedback welcome!
> Heap Live Size
> Shows the amount of heap memory allocated by each function that has not been garbage collected (yet). This is useful for investigating the overall memory usage of your service and identifying potential memory leaks.
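Turning it on is roughly (see the docs for the exact, current flags and the heap profile specifics):

$ pip install ddtrace
$ DD_PROFILING_ENABLED=true ddtrace-run python app.py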
If this isn't due to latent memory pressure, it's usually because _something_ in your code is trying to allocate a gigantic amount of memory due to a bug. Think a 1 billion x 1 billion numpy array, or a billion element list.
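The back-of-the-envelope numbers make that kind of bug easy to recognize:

    print(10**9 * 10**9 * 8 / 10**18, "EB")  # 1e9 x 1e9 float64 array: 8.0 EB
    print(10**9 * 8 / 10**9, "GB")           # pointer array of a billion-element list alone: 8.0 GB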
Yes, I just gave this tool a try and reduced our app's initial memory allocation from 120MB to about 80MB in a few minutes by moving imports around, e.g. rarely called functions importing a big third-party library.
You can use this to figure out what needs to be optimized, but the optimization itself might not be easy: in the above example you would have to re-implement the third-party library in a memory-efficient way or find another one.
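For anyone who hasn't done the import-moving trick before, it's just deferring the import into the function that needs it; a minimal sketch, using pandas as a stand-in for whatever heavy dependency shows up in the report:

    # Before: pandas (and numpy behind it) is loaded by every process that imports this module.
    import pandas as pd

    def rarely_called_report(path):
        return pd.read_csv(path).describe()

    # After: the import cost, and its memory, is only paid if the function actually runs.
    def rarely_called_report(path):
        import pandas as pd
        return pd.read_csv(path).describe()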
Fil can dump memory profiling reports on failed allocations; as the other commenter said, Linux by default happily just gives you memory even if it doesn't have any, so a failed allocation implies a giant allocation.
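(You can check the kernel's overcommit policy with

$ cat /proc/sys/vm/overcommit_memory

where 0 is the default heuristic overcommit and 2 means strict accounting, i.e. allocations can actually fail up front.)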
Is this due to memory constraints on your system? If so, then this would definitely help you profile _what_ in your code is trying to claim so much memory.
First and foremost for debugging purposes, e.g. to find memory leaks and such.
Then, it's not a bad thing to optimize memory usage in any language. Memray could be a great tool for the authors of data science libraries, which Python has plenty of. Their average user probably doesn't have a beefy developer's laptop with 64GB of memory at hand.
Debugging high memory usage in Python can be annoying. Python is used a lot for data analysis, e.g. pandas, numpy. You might have scenarios where some dataframe, either generated or read, is a lot bigger than others for whatever reason. This might be useful in debugging why that's the case.