slava @ rethink here. I forgot to add `r.args` (http://rethinkdb.com/api/javascript/#args) to the blog post! People have been requesting that feature for a few releases now, and it should make lots of code much less painful.
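For those who haven't seen it, `r.args` lets you splat a dynamically built array into a call like `getAll`, which otherwise expects its keys as separate literal arguments (e.g. `r.table('users').getAll(r.args(ids))` in the JavaScript driver; the table name here is made up). The pure-Python sketch below shows the same splatting idea using `*` unpacking:

```python
# Hypothetical ReQL (JavaScript driver) -- before r.args you couldn't
# pass a runtime-computed list of keys to getAll:
#   r.table('users').getAll(r.args(ids))
# r.args splats an array into positional arguments, analogous to
# Python's * unpacking:
def get_all(*keys):
    """Stand-in for getAll: accepts each key as a separate argument."""
    return set(keys)

ids = ['a', 'b', 'c']      # computed at runtime
result = get_all(*ids)     # the r.args equivalent: splat the list
assert result == {'a', 'b', 'c'}
```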
FYI, the upcoming 1.14 release is scheduled to include a distributed file system and geospatial indexing (probably the most requested RethinkDB features of all time).
If you have any questions about Rethink, I'm here all day to answer questions. (We can also grab lunch or coffee if you're in the Bay Area).
Mike @ RethinkDB here. We're co-hosting a RethinkDB + Firebase meetup on July 1st in San Francisco on building realtime apps. We'll be doing more in-depth demos of the new 1.13 features, and showing how you can use them to build realtime apps faster.
We'd love to see you come hang out with the RethinkDB and Firebase teams. Even better, give a lightning talk on how you're using RethinkDB! Christina is finalizing the schedule, if you'd like to do a lightning talk shoot her an email (christina@rethinkdb.com).
I must admit this makes me confused -- do RethinkDB and Firebase integrate in some non-obvious way? Both appear to be JSON databases with real-time change feeds so it would seem they'd be competitors if anything... Any chance you can clarify the relationship? Thanks.
The relationship between RethinkDB and Firebase is based on conceptual/philosophical agreement about where the world is going (realtime json sync), and that developer tools should be delightful. That's how we know each other, and decided to co-host the event. There is no software integration or anything.
I don't think the companies are really competitors -- Firebase is a service, Rethink is a product, Firebase has much more hands off ops and smaller query language, etc. So the use cases where you'd pick one or the other tend to be quite different.
We just want to give demos about building realtime apps, and get to know people who care about the subject!
Thanks, that makes sense. The point about service vs product is well taken, though I find myself wishing somebody supported both modalities: a service to get a prototype up and running quickly, with the option of moving to a self-hosted setup later if needed for scaling or security reasons.
We'd like that too, but it's easier said than done. A good, robust service is like a whole other company! We're looking at options, but there is still a dramatic philosophical difference between a product first company and a service first company.
I really hope I get to use Rethink in anger before I die. Last time I had to move on because we're a multi-tenant SaaS and it seemed the compound indexes just weren't up to that task -- does this release fix any of those issues? (I seem to recall having to use a between query and not being able to sort, but it was a while ago and I quickly went back to old reliable, PostgreSQL.)
I'll bump it up in the roadmap -- sorry it's taking so long to get this fixed. It turns out that everyone only needs 5% of RethinkDB features, but it's always a different 5%. That makes product roadmaps really hard. I think in this sense databases are a bit like word processors :)
That seems right -- I saw the issue bump on GitHub. You guys seem like the nicest and brightest engineers around, and I really hope you keep that vibe.
Absolutely amazing release! A quarter ago we had a small chat with @coffeemug to see if RethinkDB would be a good fit for our product (geospatial indexing, ad-hoc queries, realtime updates, ...). He said then that it wasn't yet, but promised that it would be within a year. I see now what he meant -- Rethink is already looking like a very strong contender!
Damn, rethink won't stop getting better and better! I'm looking forward to getting to use the changefeeds, as I've currently got a redis list that I use to get notified of changes to rethinkdb, which is fine but just another cog in the system to debug.
Seriously guys, thanks for the great product and awesome database experience (PyRethinkORM author here).
Thanks for PyRethinkORM! I haven't used it yet, but really looking forward to taking it for a spin with Django in a few weeks. We'd love to feature it in our docs when we get through a few more immediate issues.
Awesome! Let me know how it goes, and feel free to file as many bugs and issues as you can. I've been saying I'm going to release v1 soon, but school and summer internships have been getting in the way. In the meantime you might want to take a look at the `dev` branch -- it's much better than the current v0.3 that's on PyPI, imo.
Built-in notifications on writes are really cool! This is something Meteor had to implement at the application layer because MongoDB never supported it natively (and the oplog consumed by Meteor didn't really have enough context). RethinkDB gives you both `old_val` and `new_val`, which is super cool!
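For reference, a changefeed event is a document with `old_val` and `new_val` fields: `old_val` is null for inserts, and `new_val` is null for deletes. A small stand-alone sketch over hand-written sample events (not a live feed; the record fields are made up):

```python
# The shape of a RethinkDB changefeed event: a dict with old_val and
# new_val. old_val is None for inserts, new_val is None for deletes.
events = [
    {'old_val': None, 'new_val': {'id': 1, 'score': 10}},   # insert
    {'old_val': {'id': 1, 'score': 10},
     'new_val': {'id': 1, 'score': 15}},                    # update
    {'old_val': {'id': 1, 'score': 15}, 'new_val': None},   # delete
]

def classify(ev):
    """Tell inserts, updates, and deletes apart from the event shape."""
    if ev['old_val'] is None:
        return 'insert'
    if ev['new_val'] is None:
        return 'delete'
    return 'update'

kinds = [classify(e) for e in events]
assert kinds == ['insert', 'update', 'delete']
```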
Change feeds look great. r.http seems like a frippery, though. Not a great signal to see things like that getting added when there are much bigger fish yet to fry.
The r.http command was controversial, even internally. I had to call in a lot of favors to convince people to get it in, but here is why I was convinced it's a good idea:
- It fits! It just seems to work magically well with the rest of ReQL on so many levels! JSON fits, streams fit, lazy evaluation fits, even batch prefetching fits! Everything works so wonderfully and provides such a great interactive experience, it would almost be silly not to add it!
- It makes a use case that's really important to me personally (ad-hoc analytics) an order of magnitude easier. Over the past few months I ran into a few other people who also do a lot of interactive ad-hoc data analysis with Rethink, which redoubled my resolve to add r.http.
- It makes example datasets *much* more elegant. Reading RethinkDB docs? Just call this command to pull in the dataset so you can play around with a command. (We haven't updated the examples yet, but we will.) That makes the first experience with the product better, and makes the learning experience much more pleasant.
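To make the ad-hoc-analytics point concrete, here's the kind of fetch-then-filter pipeline `r.http` enables, sketched in plain Python with an inline payload standing in for the HTTP response (the URL and fields are made up; the ReQL in the comment is only a rough equivalent):

```python
import json

# r.http('https://example.com/data.json') would pull this straight into
# a ReQL pipeline; here an inline payload stands in for the response.
payload = json.loads("""
[{"name": "a", "stars": 120}, {"name": "b", "stars": 30}]
""")

# Rough ReQL equivalent of the filter below:
#   r.http('https://example.com/data.json')
#    .filter(r.row('stars').gt(100))
popular = [repo for repo in payload if repo['stars'] > 100]
assert popular == [{'name': 'a', 'stars': 120}]
```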
So I hear what you're saying, but I respectfully disagree. If you think of it as a signal, think of it as a signal of deep passion for the product and our users! Ornamentation for the sake of ornamentation isn't what we do.
(On an unrelated note, I really love your use of 'frippery' -- I'm a bit of a word nerd, and this is a really neat word!)
Thanks for the considered response. I think I have a much better handle now on why it seemed like a good idea to you and a bad idea to me.
The disconnect is that we see Rethink in different ways. You're looking at it, in this context, as Rethink + ReQL combining into a "tool for data analysis".
When I look at Rethink I see "Place to put my data".
My viewpoint pretty much ignores ReQL as anything more than "Means to get my data out of the thing". Which, thinking harder about it, is wrong. You guys are looking to make ReQL more than that. My viewpoint is too heavily influenced by the way I think about things like CouchDB.
I normally don't go in for value judgements about people's viewpoints, but I think in this case yours has to be declared objectively correct -- given that it's your product and I'm just a guy who's been an interested observer for a while but has never actually used it in anger.
So, thanks for helping me get it. Word nerds unite!
I think the biggest difference is the built-in scalability. It has beautiful web and CLI admin tools that ease the scalability work, including sharding and adding nodes to the cluster.
PostgreSQL also has built-in streaming replication, which is useful and works nicely. But the scenario it covers is scaling reads, not writes. To scale writes, you have to shard the data manually.
There are efforts to bring write scalability to PostgreSQL. One I've been keeping my eye on recently is Postgres-XL, which uses statement-based replication that focuses on write scalability and availability. It's an immature product that started recently, though. And some easy-to-administer features like automatic failover and configuration propagation (adding/removing nodes) are outside the project's scope -- you'd have to adopt another solution or build them yourself.
However, in exchange for that scalability you lose ACID in RethinkDB, whose impact can be huge for some use cases. Personally, though, I think it's worth losing to gain scalability easily. And two-phase commit is at least doable if you need to imitate transactions, though it isn't elegant at all.
If your application is read-intensive, I think starting with PostgreSQL and adopting Postgres-XL or pgpool later is a good strategy. But if you need to shard the data to scale writes, or you want an all-in-one solution that takes care of most of the administrative work, I'd highly recommend RethinkDB.
Postgres is an amazing product, and incredibly stable by virtue of being very mature. You can't go wrong with it and will almost certainly be just fine.
However, you can't quite compare Postgres's document support with RethinkDB's. You'd really need to play with the product to understand the difference, but think of it this way: you could sort of treat Java as a dynamic language by casting everything to Object, and it would work just fine, but it's night and day compared to how it would work in Python. The difference between the experience of using Postgres and RethinkDB for JSON data is just as vast. You'd really have to try both before you can viscerally understand how amazing a dedicated JSON environment can be.
RethinkDB is behind Postgres on raw performance, and we're still shaking out scalability quirks (see http://rethinkdb.com/stability/), but if you take the long view (a year or so), these will be worked out in time. I'd encourage you to play with the environment -- it's easy and fun -- and see if you like it. Feel free to shoot me an email (slava@rethinkdb.com) and I'll be happy to help you out if you have questions.
Great stuff as always. I have had a RethinkDB instance in production for 13 months now (since 1.4 - developed it while on 1.2) and looking forward to upgrading and playing with the new features in other projects.
Congratulations on the release guys. The update to the Javascript driver is huge and will make development a lot less painful. I get to remove a lot of redundant code thanks to that.
We debated this for a while and decided not to do channels for now.
Feeds are great because you can use them to integrate with other pieces of the infrastructure like RabbitMQ or ElasticSearch, or write reactive apps where clients instantaneously react to changes in other clients. Incidentally, you can use them to easily get pubsub, but it wasn't the original intention.
There are much better pub/sub services out there, so we decided to stick to having a really good feed API and avoid implementing channels for the time being.
Unless I'm mistaken RethinkDB changefeeds are more powerful than the couch API. You can perform filters on changefeeds, do joins, transformations, etc., while the Couch API gives you a feed that has to be fully processed on the client.
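Concretely, in ReQL you can chain onto a feed (e.g. `r.table('scores').changes().filter(...)`, with made-up table and field names) so only matching events cross the wire, whereas with Couch's `_changes` the equivalent filtering happens on the client. This stand-alone sketch shows what that client-side processing looks like over hand-written events:

```python
# Hand-written changefeed-style events (not a live feed). In ReQL the
# filter below could run server-side, roughly:
#   r.table('scores').changes().filter(r.row('new_val')('score').gt(90))
events = [
    {'new_val': {'player': 'a', 'score': 95}},
    {'new_val': {'player': 'b', 'score': 40}},
]

# Client-side equivalent: keep only high-score events.
high_scores = [ev['new_val']['player']
               for ev in events
               if ev['new_val']['score'] > 90]
assert high_scores == ['a']
```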
Am I correct that the biggest hurdle to a Python 3 driver has been the lack of Python 3 support in Google's protocol buffer library? If so, it seems the 1.13 release, which removes the protobuf dependency, should substantially accelerate development of the Python 3 driver.
That's right! In fact, there is a pull request to make the driver Python 3 compatible that's being reviewed right now (plus some additional testing changes, etc.) We should be able to get Python 3 support in pretty soon.
You can grab JSON data out of APIs in seconds, filter and manipulate it, and enrich it with more APIs. I've been using `r.http` for some time (since an internal beta) and I now find it invaluable for ad-hoc analysis.
I think it's best to play with it to get a feel for how it works. For example, what would the experience be doing the analysis described in http://rethinkdb.com/docs/external-api-access in SQL?
Not from within the database yet (you'd have to add a cron job for now to periodically connect and call `r.http`). It's a great idea though, I'll see if we can add a timer primitive!
No offense but that's starting to sound like scope creep. RethinkDB is a great project, and I've used it before, but these kinds of candy-coating features really aren't necessary.
Someone who wants to periodically pull in feeds in production code can easily write a short script in their language of choice to grab it and add it to the DB.
It's not the role of the database to be an HTTP client, cron job scheduler, etc. Can it set the Accept header? The User-Agent header? Can it support different HTTP methods? How does it handle retries and streaming downloads? Does the timer have to be initiated by a script each time, or will it remain persistent in the database server? How can debugging info, like the last time it ran and the next time it'll run, be displayed?
It's not worth building support for all of these myriad possible use cases when someone can accomplish this easily in a general-purpose programming language.
I understand adding r.http as a basic prototyping utility, but I think it should be left as it is. A timer primitive has no use for prototyping or experimenting. Developer time should be spent on improving core functionality, driver support and performance, etc.
Thanks for saying this, and don't worry, RethinkDB will never degenerate into a hodgepodge of feature creep! We always follow a simple guideline: before we decide to add a feature, we ask ourselves, is this going to make the product feel tighter? If the feature makes the product feel bigger, we probably won't add it. Things get added only if they fit really, really well.
We added `r.http` because it just fits magically well with the rest of ReQL on many, many levels. JSON fits, streams fit, lazy evaluation fits, even batch prefetching fits! Everything works so wonderfully and provides such a great interactive experience, it would almost be odd not to add it.
Now, we won't add timers just for the sake of adding timers. If we add them, we'll make sure they fit and make the product tighter. There are good reasons to consider a timer implementation in the server:
- Many users have asked for "capped" tables for logging, which can be implemented as a combination of timers and secondary indexes (i.e. delete everything older than X on a timer).
- There is already internal timer functionality that we use for various tasks. It makes sense to evaluate whether it's worth exposing to the user.
- Before we add timer support, we'd have to add more robust job control (which many people have already asked for). If we do add job control, timers feel like they might fit very elegantly.
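For the capped-table bullet, the pruning step itself is simple: delete everything older than a cutoff. A client-side sketch over in-memory rows (in ReQL it would be roughly `r.table('logs').filter(r.row['timestamp'] < cutoff).delete()` run on a timer; the table and field names are made up):

```python
# "Delete everything older than X" sketched over in-memory rows.
now = 1_000_000  # pretend epoch seconds
logs = [
    {'msg': 'old', 'timestamp': now - 90_000},
    {'msg': 'recent', 'timestamp': now - 10},
]

retention = 86_400            # keep one day of logs
cutoff = now - retention
# Keep only rows newer than the cutoff (the "capped table" prune step).
logs = [row for row in logs if row['timestamp'] >= cutoff]
assert logs == [{'msg': 'recent', 'timestamp': 999_990}]
```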
These are difficult questions, and we take these really seriously. If a feature doesn't fit magically well and feels elegant, almost as if it makes the product smaller, we don't add it. So don't worry about feature creep -- if RethinkDB does stray from a sound path, it almost certainly won't be because of scope creep!
I can understand and agree with the use cases of timers in those scenarios. I just think designing it for people who want to pull data over HTTP on an interval would be a bad idea.
Michael can be an acquired taste, like fine wine or good scotch. If you don't push past that first sip, you'll never know the deep, tantalizing world known to us connoisseurs as @mlucy. (But if you do, please let it be a matter of public record that we discovered him first)
Thanks, apologies for my comment being offensive... but I should be able to state objectively what I thought detracted from the message of an otherwise fantastic announcement. I in no way mean to pass any judgement on Michael as a person...
Still, his voice seems better than many other software developers'. But I know what you mean: I've had problems listening to some UK developers' videos. (I'm US.)
On closer inspection of the video: I'm jealous of any company that lets their employees walk around the office with their shoes off. Not only are they super-smart, they're cool too.
(Sorry for this shallow, superficial observation. I'm still learning about databases and programming.)
`r.args` isn't very flashy but is a huge step forward for ReQL.
Removing protobuf makes deploying to certain platforms (e.g. heroku) so much easier.
Promises support for the Javascript driver brings it to the modern age.
And of course changefeeds and `r.http`!
If you've been hearing about RethinkDB and thinking about trying it out, now's a great time to take it for a spin.