To me, what makes CPython special is that it's a language interpreter that's incredibly easy to get into and hack on because of its simplicity. PyPy, while probably a good standard for server use, will never have that quality. IMHO it would be a shame if Python lost this aspect of its ecosystem.
I haven't looked at the source for either but shouldn't PyPy's interpreter be easier to mess around with because it's written in Python (with some hints for the JIT)? Probably the thing that turns the interpreter into a JIT compiler is more sophisticated, but if you want to modify Python you wouldn't be looking at that part.
Being a little pedantic: PyPy is written[1] in RPython[2], which is a strict subset of Python and is quite restricted in many ways. E.g. when I was experimenting with it, all file operations, including reading and writing, were done with file descriptors (NB: that was a while ago, so this might have changed).
So it's not quite as easy to fiddle with as you might expect, but still significantly easier than CPython, I guess.
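For anyone who hasn't done I/O purely through file descriptors, here is a rough sketch of that style in ordinary Python. This is not RPython and says nothing about what RPython currently allows; `write_then_read` and `demo` are made-up names for illustration, and only standard `os` calls are used.

```python
import os
import tempfile


def write_then_read(path, payload):
    """Write bytes to path and read them back using only raw file descriptors."""
    # os.open returns an integer descriptor, not a Python file object.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        offset = 0
        # os.write may write fewer bytes than requested, so loop until done.
        while offset < len(payload):
            offset += os.write(fd, payload[offset:])
    finally:
        os.close(fd)

    fd = os.open(path, os.O_RDONLY)
    try:
        chunks = []
        while True:
            chunk = os.read(fd, 4096)
            if not chunk:  # an empty read means end of file
                break
            chunks.append(chunk)
    finally:
        os.close(fd)
    return b"".join(chunks)


def demo():
    # Hypothetical helper: round-trips a short message through a temp file.
    with tempfile.TemporaryDirectory() as tmp:
        return write_then_read(os.path.join(tmp, "demo.txt"), b"hello from an fd\n")
```

No `open()`, no file objects, no context managers on the file itself: you manage the descriptor and the read/write loops by hand.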
I might be sounding a bit futuristic, but I think reference implementations should be auto-generated from a "spec" lang.
I don't know if a language for doing something like that even exists. Does anyone know of something like this?
EDIT: I should clarify: a spec should be _high-level_, i.e. abstract away all the unimportant details. Perhaps unit tests would be better in this case. Perl 6 follows this model.
But it would be even better if one could somehow "fill in the details" separately from the spec, rather than mesh the whole thing into one giant C puddle.
You're right, I reran the tests that I've been seeing, and CPU is probably where I'm having problems with PyPy. I have a test suite that completes in about 17 minutes on CPython, and with PyPy I can't even get it to complete within two hours.
This is running the SQLAlchemy unit tests against SQLite, on an Amazon EC2 small instance via our Jenkins suite at http://jenkins.sqlalchemy.org. So yes, we are dealing with more limited resources than usual. Usually when something slows to a crawl on EC2 it's because it started swapping, so I had assumed that was the issue here, but apparently it's not. SQLAlchemy is a large library with a lot of tests: 155 test_* modules. So I'd imagine PyPy has lots of work to do running the JIT on all those source files, and I guess because running tests means a continuous stream of new codepaths, that means all new JIT activity for each one.
In this particular case, the two tests ran on the same server, and resource contention seems likely. Swap space remained 100% free; the two jobs shared the CPU 50/50, and once the CPython job was done, PyPy's went right up to 99% and stayed there. For startup time, the CPython suite started running tests within 3 seconds, and PyPy didn't get to the test suite for about one minute 40 seconds. PyPy didn't actually start running real tests, save for a series of "skipped" tests in the beginning, until the CPython job was totally finished at 17 minutes. PyPy then took 99% of the CPU for the rest of its duration, and about an hour into it, it's just about halfway through the suite.
I'd welcome any help in debugging why the test suite here appears to be excessively slow (is it the slow sqlite module?). Otherwise, if this is just how things are with the JIT + a large number of codepaths, that would be a significant caveat to PyPy's speed advantage. But you're right, it wasn't memory.
Update: the build on PyPy took a total of 2 hours 17 minutes.
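One cheap way to check the "is it the sqlite module?" question would be to time a pure sqlite3 workload with no SQLAlchemy involved, and run the same script under both interpreters. A minimal sketch, assuming an in-memory database and a made-up table; `time_sqlite_inserts` is a hypothetical helper, not anything from the SQLAlchemy test suite, and the numbers it produces say nothing about where the real suite spends its time.

```python
import sqlite3
import time


def time_sqlite_inserts(rows=10000):
    """Insert `rows` rows into an in-memory table; return (row_count, seconds)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")
        start = time.perf_counter()
        conn.executemany(
            "INSERT INTO t (name) VALUES (?)",
            (("row %d" % i,) for i in range(rows)),
        )
        conn.commit()
        # Read the count back so the query path is exercised as well.
        count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
        elapsed = time.perf_counter() - start
    finally:
        conn.close()
    return count, elapsed


if __name__ == "__main__":
    count, elapsed = time_sqlite_inserts()
    print("%d rows in %.3f s" % (count, elapsed))
```

If this alone is dramatically slower on one interpreter, that points at the sqlite3 module; if not, the slowdown is more likely elsewhere.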
There are quite a few problems with PyPy and test suites. This sounds like an extreme case, but sqlite is definitely very slow. How about you post this on a bugtracker, so we have a point of reference to start with?
For the record, I don't believe "this is how things with the JIT are" to start with. 17 minutes is by far enough to spin up the JIT. It might be sqlite, it might be some code in SQLAlchemy, it might be something unbelievably silly. Please start with a bug report and we can take it from there. SQLAlchemy is an important package and it would be cool to have it run fast on PyPy.
Last time I looked at it, it took quite a long time to compile compared to CPython (while, say, LuaJIT doesn't take that much longer to compile than reference Lua).
You don't have to compile PyPy to run the tests, and in general you wouldn't compile PyPy or run the entire test suite; you just run the tests relevant to the part you worked on and let the buildbot take care of everything else.
PyPy doesn't support Python's C API. It's more of an exercise in tracing JIT compilation (and crowdfunding various experiments) rather than a CPython replacement.
A lot of C modules rely on CPython specifics that PyPy cannot support. Performance of the PyPy C API is horrible, which is a huge problem given that C extensions are mainly used to improve performance, and embedding PyPy isn't possible either.
CPyExt is more of a hack you use until you have ctypes/cffi bindings.
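To make "ctypes bindings" concrete: instead of compiling a C extension against the CPython API, you load a shared library at runtime and declare the C signatures from Python. A minimal sketch, assuming a POSIX system where the C library can be located; `load_libc` and `c_strlen` are made-up names, and `strlen` is just a stand-in for whatever C function you actually want to call.

```python
import ctypes
import ctypes.util


def load_libc():
    # find_library may return None; on POSIX, CDLL(None) then exposes the
    # symbols already loaded into the process, which include the C library.
    return ctypes.CDLL(ctypes.util.find_library("c"))


def c_strlen(data):
    """Call the C library's strlen on a bytes object."""
    libc = load_libc()
    # Declare the C signature so ctypes converts arguments and results correctly.
    libc.strlen.argtypes = [ctypes.c_char_p]
    libc.strlen.restype = ctypes.c_size_t
    return libc.strlen(data)
```

Nothing here touches the interpreter's C API, which is why this style of binding can work across implementations that provide ctypes.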