Why hasn't PyPy been merged into the Python trunk?

axiak · on Oct 13, 2012

To me, what makes CPython special is that it's a language interpreter that's incredibly easy to get into and hack on due to its simplicity. PyPy, while probably a good standard for server use, will never have that quality. IMHO it would be a shame if Python lost this aspect of its ecosystem.

omaranto · on Oct 13, 2012

I haven't looked at the source for either but shouldn't PyPy's interpreter be easier to mess around with because it's written in Python (with some hints for the JIT)? Probably the thing that turns the interpreter into a JIT compiler is more sophisticated, but if you want to modify Python you wouldn't be looking at that part.

dbaupp · on Oct 13, 2012

Being a little pedantic: PyPy is written[1] in RPython[2], which is a strict subset of Python and is quite restricted in many ways. For example, when I was experimenting with it a while ago, all file operations, including reading and writing, were done with file descriptors (NB: that was a while ago, so this might have changed).

So it's not quite as easy to fiddle with as you might expect, but I'd guess it's still significantly easier than CPython.

[1]: http://doc.pypy.org/en/latest/architecture.html#pypy-the-pyt...
[2]: http://doc.pypy.org/en/latest/faq.html#what-is-this-rpython-...
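To make "file operations done with file descriptors" concrete, here is a minimal sketch in plain Python of that low-level style: `os.open`/`os.write`/`os.read`/`os.close` on integer descriptors instead of `open()` file objects. It only illustrates the style the comment describes; it has not been checked against the RPython translator, and the helper name `write_then_read` is hypothetical.

```python
import os
import tempfile

# Plain Python sketch of fd-style I/O; NOT verified as valid RPython.
# O_BINARY only exists on Windows; elsewhere fall back to a no-op 0 flag.
_BINARY = getattr(os, "O_BINARY", 0)


def write_then_read(path: str, payload: bytes) -> bytes:
    """Write payload through a raw file descriptor, then read it back the same way."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | _BINARY, 0o644)
    try:
        written = 0
        # os.write may write fewer bytes than requested, so loop until done.
        while written < len(payload):
            written += os.write(fd, payload[written:])
    finally:
        os.close(fd)

    chunks = []
    fd = os.open(path, os.O_RDONLY | _BINARY)
    try:
        while True:
            chunk = os.read(fd, 4096)
            if not chunk:  # an empty read means end of file
                break
            chunks.append(chunk)
    finally:
        os.close(fd)
    return b"".join(chunks)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.bin")
        print(write_then_read(path, b"hello from a file descriptor"))
```

Compared with `with open(path, "rb") as f: f.read()`, you manage partial writes, end-of-file detection and closing yourself, which is part of why restricted subsets feel less pleasant to hack on.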

axiak · on Oct 13, 2012

Watch this for a great presentation discussing this: http://www.youtube.com/watch?v=l_HBRhcgeuQ

Not that they haven't gotten better. My point is that CPython has a different set of goals which makes it great for a reference implementation.

winter_blue · on Oct 13, 2012

I might be sounding a bit futuristic, but I think reference implementations should be auto-generated from a "spec" lang.

I don't know if a language for doing something like that even exists. Does anyone know of something like this?

EDIT: To clarify: a spec should be _high-level_, i.e. it should abstract away all the unimportant details. Perhaps unit tests would be better in this case; Perl 6 follows this model.

But it would be even better if one could somehow "fill in the details" separately from the spec, rather than mash the whole thing into one giant C puddle.
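As a rough illustration of "a spec as unit tests", here is a minimal sketch of implementation-independent conformance checks that any Python interpreter (CPython, PyPy, ...) could run unchanged. The specific test cases are hypothetical examples chosen for illustration, not taken from any official Python test suite.

```python
import io
import platform
import unittest


class LanguageSpecTests(unittest.TestCase):
    """Hypothetical implementation-independent checks of core language semantics."""

    def test_floor_division_rounds_toward_negative_infinity(self):
        self.assertEqual(-7 // 2, -4)
        self.assertEqual(divmod(-7, 2), (-4, 1))

    def test_integers_have_arbitrary_precision(self):
        self.assertEqual(2 ** 100, 1267650600228229401496703205376)

    def test_chained_comparisons(self):
        # a > b > c means (a > b) and (b > c); the last link 1 > 1 is False.
        self.assertTrue(1 < 2 < 3)
        self.assertFalse(3 > 2 > 1 > 1)

    def test_string_slicing(self):
        self.assertEqual("interpreter"[::-1], "reterpretni")


def run_spec() -> bool:
    """Run the spec tests quietly and report whether they all passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(LanguageSpecTests)
    result = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
    return result.wasSuccessful()


if __name__ == "__main__":
    print(platform.python_implementation(), "passes spec:", run_spec())
```

The tests pin down observable behaviour only, so how an implementation achieves it (C, RPython, a JIT) is left to that implementation, which is roughly the separation the comment asks for.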

fox91 · on Oct 13, 2012

Joking? Have you ever tried the PyPy interpreter? It's almost the same as the CPython one! And it gives you a nice quotation at every startup :P

jeremyjh · on Oct 13, 2012

I understood his comment to refer to hacking on the interpreter itself (e.g. the source of CPython). Not talking about using the REPL.

axiak · on Oct 13, 2012

zzzeek · on Oct 13, 2012

PyPy uses a crapton more memory.

fijal · on Oct 13, 2012

Hi Mike, the answer to this really depends; you should qualify such blanket statements a bit more. To be precise:

* startup memory is much higher (30M vs 5M roughly). If you have lots of very small processes, PyPy is not a good fit.

* objects are smaller (by a somewhat unspecified amount, up to 50%)

* there is a GC overhead, which means peak memory can be ~30% above your total heap

* JIT occupies some memory. This is a function of the size of your code.
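To check the startup-memory point on your own machine rather than argue from impressions, a minimal sketch is to report the process's peak resident set size and run the same script under each interpreter. It assumes a Unix system (the standard-library `resource` module is not available on Windows); the function name is hypothetical.

```python
import resource
import sys


def peak_rss_mib() -> float:
    """Peak resident set size of the current process so far, in MiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS but in kilobytes on Linux.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


if __name__ == "__main__":
    print(f"{sys.implementation.name}: startup peak RSS {peak_rss_mib():.1f} MiB")
```

Running it as `python script.py` and `pypy script.py` compares baseline footprints; calling `peak_rss_mib()` again after building your real data structures shows how the per-object and GC effects play out for your workload.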

zzzeek · on Oct 15, 2012

You're right. I reran the tests I've been watching, and it's probably CPU, not memory, where I'm having problems with PyPy. I have a test suite that completes in about 17 minutes on CPython, and with PyPy I can't even get it to complete within two hours.

This is running the SQLAlchemy unit tests against SQLite on an Amazon EC2 small instance, via our Jenkins suite at http://jenkins.sqlalchemy.org. So yes, we are dealing with more limited resources than usual. Usually when something slows to a crawl on EC2 it's because it started swapping, so I had assumed that was the issue here, but apparently it's not. SQLAlchemy is a large library with a lot of tests: 155 test_* modules. So I'd imagine PyPy has lots of work to do running the JIT on all those source files, and since running tests means a continuous stream of new code paths, I guess that means new JIT activity for each one.

In this particular case, the two test runs were on the same server, so resource contention seems likely. Swap space remained 100% free; the two jobs shared the CPU 50/50, and once the CPython job was done, PyPy's went right up to 99% and stayed there. As for startup time, the CPython suite started running tests within 3 seconds, while PyPy didn't get to the test suite for about 1 minute 40 seconds. PyPy didn't actually start running real tests, apart from a series of "skipped" tests at the beginning, until the CPython job had finished at 17 minutes. PyPy then took 99% of the CPU for the rest of its duration, and about an hour into it, it's just about halfway through the suite.
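To separate interpreter startup from import and test-collection time (the 1 minute 40 seconds above), a minimal sketch is to time `interpreter -c code` in a subprocess, first with `pass` and then with an import. The executable name `"pypy"` is an assumption; substitute whatever is on your PATH.

```python
import subprocess
import sys
import time


def startup_seconds(interpreter: str, code: str = "pass", runs: int = 5) -> float:
    """Best-of-`runs` wall-clock time for `interpreter -c code` to start and exit."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([interpreter, "-c", code], check=True)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # "pypy" is an assumed executable name; adjust for your installation.
    for exe in (sys.executable, "pypy"):
        try:
            print(f"{exe}: {startup_seconds(exe):.3f}s bare startup")
        except FileNotFoundError:
            print(f"{exe}: not found")
```

Passing `code="import sqlalchemy"` (assuming it is installed for both interpreters) then shows how much of the delay is import cost rather than the bare interpreter starting up.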

I'd welcome any help in debugging why the test suite appears to be excessively slow here (is it the slow sqlite module?). Otherwise, if this is just how things are with the JIT plus a large number of code paths, that would be a significant caveat to PyPy's speed advantage. But you're right, it wasn't memory.

Update: the build on PyPy took a total of 2 hours 17 minutes.
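One way to test the "is it the slow sqlite module?" hypothesis without SQLAlchemy in the picture is a standalone `sqlite3` microbenchmark run under both interpreters. This is a minimal sketch; the table, row count and function name are hypothetical stand-ins, and one `execute()` per row is meant to roughly mimic many small ORM statements rather than a bulk load.

```python
import sqlite3
import time
from typing import Tuple


def sqlite_roundtrip(rows: int = 10_000) -> Tuple[int, float]:
    """Insert, then re-read, `rows` rows one statement at a time in an in-memory DB.

    Returns (rows read back, elapsed seconds).
    """
    start = time.perf_counter()
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
        for i in range(rows):
            conn.execute("INSERT INTO item (name) VALUES (?)", (f"item-{i}",))
        conn.commit()
        fetched = 0
        # A fresh table assigns INTEGER PRIMARY KEY ids starting at 1.
        for i in range(1, rows + 1):
            row = conn.execute("SELECT name FROM item WHERE id = ?", (i,)).fetchone()
            if row is not None:
                fetched += 1
    finally:
        conn.close()
    return fetched, time.perf_counter() - start


if __name__ == "__main__":
    count, seconds = sqlite_roundtrip()
    print(f"{count} rows in {seconds:.3f}s")
```

If the PyPy/CPython ratio here is close to the ratio seen for the full suite, the `sqlite3` module is a likely suspect; if not, the time is going elsewhere.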

fijal · on Oct 16, 2012

There are quite a few problems with PyPy and test suites. This sounds like an extreme case, but sqlite is definitely very slow. How about you post this on the bug tracker, so we have a point of reference to start with?

For the record, I don't believe "this is how things are with the JIT" to begin with. 17 minutes is more than enough time to spin up the JIT. It might be sqlite, it might be some code in SQLAlchemy, it might be something unbelievably silly; please start with a bug report and we can take it from there. SQLAlchemy is an important package, and it would be cool to have it run fast on PyPy.
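Before filing such a report, a profile narrows down whether time goes to sqlite, to library code, or to "something unbelievably silly". Here is a minimal sketch using the standard `cProfile` and `pstats` modules; the helper name and stand-in workload are hypothetical. Note that profiling hooks may perturb what PyPy's JIT does, so treat the numbers as relative hints rather than JIT-compiled timings.

```python
import cProfile
import io
import pstats


def profile_top(func, *args, limit: int = 10) -> str:
    """Run func(*args) under cProfile; return the `limit` costliest entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func(*args)
    finally:
        profiler.disable()
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(limit)
    return out.getvalue()


if __name__ == "__main__":
    # Stand-in workload; replace with a call into the slow test or query path.
    print(profile_top(sorted, list(range(100_000, 0, -1))))
```

Sorting by cumulative time surfaces the top-level call paths first, which is usually what a bug report needs.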

malkia · on Oct 13, 2012

Last time I looked at it, it took quite a long time to compile compared to CPython (whereas, say, LuaJIT doesn't take much longer to compile than reference Lua).

octopus · on Oct 13, 2012

The stable PyPy version is provided as a binary. In principle you only need to compile it from source if you have special needs.

Have a look here (binaries for Mac, Linux and Windows):

http://pypy.org/download.html#default-with-a-jit-compiler

DasIch · on Oct 13, 2012

You don't have to compile PyPy to run the tests. In general you wouldn't compile PyPy or run the entire test suite; you'd just run the tests relevant to the part you worked on and let the buildbot take care of everything else.

stefantalpalaru · on Oct 13, 2012

PyPy doesn't support Python's C API. It's more of an exercise in tracing JIT compilation (and crowdfunding various experiments) rather than a CPython replacement.

fox91 · on Oct 13, 2012

Not true. It has supported CPython C modules since 2010: http://morepypy.blogspot.it/2010/04/using-cpython-extension-... The problem is that CPyExt is not very fast, and it only works if PyPy implements the functions the C module uses.

DasIch · on Oct 13, 2012

A lot of C modules rely on CPython specifics that PyPy cannot support. Performance of the PyPy C API is horrible, which is a huge problem given that C extensions are mainly used to improve performance. Embedding PyPy isn't possible either.

CPyExt is more of a hack you use until you have ctypes/cffi bindings.
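For a sense of what "ctypes/cffi bindings" means in practice, here is a minimal sketch that calls a C function from a shared library without touching the CPython C API, so the same code can run on CPython and PyPy. It uses the standard-library `ctypes` (cffi, the other option mentioned, is a third-party package) and assumes a POSIX system with a C library; the helper names are hypothetical.

```python
import ctypes
import ctypes.util


def load_libc() -> ctypes.CDLL:
    """Load the C standard library (assumes a POSIX system)."""
    path = ctypes.util.find_library("c")
    # find_library can return None (e.g. in minimal containers); CDLL(None)
    # then exposes symbols already loaded into this process, libc included.
    return ctypes.CDLL(path) if path else ctypes.CDLL(None)


libc = load_libc()
# Declaring argument and return types lets ctypes convert values correctly.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data: bytes) -> int:
    """Length of a NUL-free byte string, computed by libc's strlen()."""
    return libc.strlen(data)


if __name__ == "__main__":
    print(c_strlen(b"no C API required"))
```

The binding only describes the foreign function's signature; there is no compiled extension module, which is why this route avoids the cpyext compatibility layer entirely.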

stefantalpalaru · on Oct 13, 2012

It's incomplete and there's no interest in finishing it.

octopus · on Oct 13, 2012

You are right about Python's C API. However, I've found that I can use any C (or C++03) library through the cppyy branch.