slava @ rethink here. I forgot to add `r.args` (http://rethinkdb.com/api/javascript/#args) to the blog post! People have been requesting that feature for a few releases now, and it should make lots of code much less painful.
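For those who haven't seen it, `r.args` lets you splat a dynamically built array into a call like `getAll`, which otherwise expects its keys as separate literal arguments (e.g. `r.table('users').getAll(r.args(ids))` in the JavaScript driver; the table name here is made up). The pure-Python sketch below shows the same splatting idea using `*` unpacking:

```python
# Hypothetical ReQL (JavaScript driver) -- before r.args you couldn't
# pass a runtime-computed list of keys to getAll:
#   r.table('users').getAll(r.args(ids))
# r.args splats an array into positional arguments, analogous to
# Python's * unpacking:
def get_all(*keys):
    """Stand-in for getAll: accepts each key as a separate argument."""
    return set(keys)

ids = ['a', 'b', 'c']      # computed at runtime
result = get_all(*ids)     # the r.args equivalent: splat the list
assert result == {'a', 'b', 'c'}
```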
FYI, the upcoming 1.14 release is scheduled to include a distributed file system and geospatial indexing (probably the most requested RethinkDB features of all time).
If you have any questions about Rethink, I'm here all day to answer questions. (We can also grab lunch or coffee if you're in the Bay Area).
Mike @ RethinkDB here. We're co-hosting a RethinkDB + Firebase meetup on July 1st in San Francisco on building realtime apps. We'll be doing more in-depth demos of the new 1.13 features, and showing how you can use them to build realtime apps faster.
We'd love to see you come hang out with the RethinkDB and Firebase teams. Even better, give a lightning talk on how you're using RethinkDB! Christina is finalizing the schedule, if you'd like to do a lightning talk shoot her an email (christina@rethinkdb.com).
I must admit this makes me confused -- do RethinkDB and Firebase integrate in some non-obvious way? Both appear to be JSON databases with real-time change feeds so it would seem they'd be competitors if anything... Any chance you can clarify the relationship? Thanks.
The relationship between RethinkDB and Firebase is based on conceptual/philosophical agreement about where the world is going (realtime json sync), and that developer tools should be delightful. That's how we know each other, and decided to co-host the event. There is no software integration or anything.
I don't think the companies are really competitors -- Firebase is a service, Rethink is a product, Firebase has much more hands off ops and smaller query language, etc. So the use cases where you'd pick one or the other tend to be quite different.
We just want to give demos about building realtime apps, and get to know people who care about the subject!
Thanks, that makes sense. The point about service vs product is well taken, though I find myself wishing somebody supported both modalities: a service to get a prototype up and running quickly, with the option of moving to a self-hosted setup later if needed for scaling or security reasons.
We'd like that too, but it's easier said than done. A good, robust service is like a whole other company! We're looking at options, but there is still a dramatic philosophical difference between a product first company and a service first company.
I really hope I get to use Rethink in anger before I die. Last time I had to move on because we're a multi-tenant SaaS and it seemed the compound indexes just weren't up to that task -- does this release fix any of those issues? (I seem to recall having to use a between query and not being able to sort, but it was a while ago and I quickly went back to old reliable, PostgreSQL.)
I'll bump it up in the roadmap -- sorry it's taking so long to get this fixed. It turns out that everyone only needs 5% of RethinkDB features, but it's always a different 5%. That makes product roadmaps really hard. I think in this sense databases are a bit like word processors :)
That seems right -- I saw the issue bump on GitHub. You guys seem like the nicest and brightest engineers around, and I really hope you keep that vibe.
Absolutely amazing release! A quarter ago we had a small chat with @coffeemug to see if RethinkDB would be a good fit for our product (geospatial indexing, ad-hoc queries, realtime updates, ...). He said then that it wasn't yet, but promised that it would be within a year. I see now what he meant -- Rethink is already looking like a very strong contender!
Damn, rethink won't stop getting better and better! I'm looking forward to getting to use the changefeeds, as I've currently got a redis list that I use to get notified of changes to rethinkdb, which is fine but just another cog in the system to debug.
Seriously guys, thanks for the great product and awesome database experience (PyRethinkORM author here).
Thanks for PyRethinkORM! I haven't used it yet, but really looking forward to taking it for a spin with Django in a few weeks. We'd love to feature it in our docs when we get through a few more immediate issues.
Awesome! Let me know how it goes, and feel free to file as many bugs and issues as you can. I've been saying I'm going to release v1 soon, but school and summer internships have been getting in the way. In the meantime you might want to take a look at the `dev` branch -- it's much better than the current v0.3 that's on PyPI, imo.
Built-in notifications on writes are really cool! This is something Meteor had to implement at the application layer because MongoDB never supported it natively (and the oplog consumed by Meteor didn't really have enough context). RethinkDB gives you both `old_val` and `new_val`, which is super cool!
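For reference, a changefeed event is a document with `old_val` and `new_val` fields: `old_val` is null for inserts, and `new_val` is null for deletes. A small stand-alone sketch over hand-written sample events (not a live feed; the record fields are made up):

```python
# The shape of a RethinkDB changefeed event: a dict with old_val and
# new_val. old_val is None for inserts, new_val is None for deletes.
events = [
    {'old_val': None, 'new_val': {'id': 1, 'score': 10}},   # insert
    {'old_val': {'id': 1, 'score': 10},
     'new_val': {'id': 1, 'score': 15}},                    # update
    {'old_val': {'id': 1, 'score': 15}, 'new_val': None},   # delete
]

def classify(ev):
    """Tell inserts, updates, and deletes apart from the event shape."""
    if ev['old_val'] is None:
        return 'insert'
    if ev['new_val'] is None:
        return 'delete'
    return 'update'

kinds = [classify(e) for e in events]
assert kinds == ['insert', 'update', 'delete']
```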
Change feeds look great. r.http seems like a frippery, though. Not a great signal to see things like that getting added when there are much bigger fish yet to fry.
The r.http command was controversial, even internally. I had to call in a lot of favors to convince people to get it in, but here is why I was convinced it's a good idea:
- It fits! It just seems to work magically well with the rest of ReQL on so many levels! JSON fits, streams fit, lazy evaluation fits, even batch prefetching fits! Everything works so wonderfully and provides such a great interactive experience, it would almost be silly not to add it!
- It makes a use case that's really important to me personally (ad-hoc analytics) an order of magnitude easier. Over the past few months I ran into a few other people who also do a lot of interactive ad-hoc data analysis with Rethink, which redoubled my resolve to add r.http.
- It makes example datasets *much* more elegant. Reading RethinkDB docs? Just call this command to pull in the dataset so you can play around with a command. (We haven't updated the examples yet, but we will.) That makes the first experience with the product better, and makes the learning experience much more pleasant.
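To make the ad-hoc-analytics point concrete, here's the kind of fetch-then-filter pipeline `r.http` enables, sketched in plain Python with an inline payload standing in for the HTTP response (the URL and fields are made up; the ReQL in the comment is only a rough equivalent):

```python
import json

# r.http('https://example.com/data.json') would pull this straight into
# a ReQL pipeline; here an inline payload stands in for the response.
payload = json.loads("""
[{"name": "a", "stars": 120}, {"name": "b", "stars": 30}]
""")

# Rough ReQL equivalent of the filter below:
#   r.http('https://example.com/data.json')
#    .filter(r.row('stars').gt(100))
popular = [repo for repo in payload if repo['stars'] > 100]
assert popular == [{'name': 'a', 'stars': 120}]
```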
So I hear what you're saying, but I respectfully disagree. If you think of it as a signal, think of it as a signal of deep passion for the product and our users! Ornamentation for the sake of ornamentation isn't what we do.
(On an unrelated note, I really love your use of 'frippery' -- I'm a bit of a word nerd, and this is a really neat word!)
Thanks for the considered response. I think I have a much better handle now on why it seemed like a good idea to you and a bad idea to me.
The disconnect is that we see Rethink in different ways. You're looking at it, in this context, as Rethink + ReQL combining into a "tool for data analysis".
When I look at Rethink I see "Place to put my data".
My viewpoint pretty much ignores ReQL as anything more than "Means to get my data out of the thing". Which, thinking harder about it, is wrong. You guys are looking to make ReQL more than that. My viewpoint is too heavily influenced by the way I think about things like CouchDB.
I normally don't go in for value judgements about people's viewpoints, but I think in this case yours has to be declared objectively correct -- given that it's your product and I'm just a guy who's been an interested observer for a while but has never actually used it in anger.
So, thanks for helping me get it. Word nerds unite!
I think the biggest difference is the built-in scalability. It has beautiful web and CLI admin tools that ease the scalability work, including sharding and adding nodes to the cluster.
PostgreSQL also has built-in streaming replication, which is useful and works nicely. But the scenario it covers is scaling reads, not writes. To scale writes, you have to shard the data manually.
There are efforts to bring write scalability to PostgreSQL. One I've been keeping my eye on recently is Postgres-XL, which uses statement-based replication that focuses on write scalability and availability. It's an immature product that started recently, though. And some easy-to-administer features like automatic failover and configuration propagation (adding/removing nodes) are outside the project's scope -- you'd have to adopt another solution or build them yourself.
However, in exchange for that scalability you lose ACID in RethinkDB, whose impact can be huge for some use cases. Personally, though, I think it's worth losing to gain scalability easily. And two-phase commit is at least doable if you need to imitate transactions, though it isn't elegant at all.
If your application is read-intensive, I think starting with PostgreSQL and adopting Postgres-XL or pgpool later is a good strategy. But if you need to shard the data to scale writes, or you want an all-in-one solution that takes care of most of the administrative work, I'd highly recommend RethinkDB.
Postgres is an amazing product, and incredibly stable by virtue of being very mature. You can't go wrong with it and will almost certainly be just fine.
However, you can't quite compare Postgres's document support with RethinkDB's. You'd really need to play with the product to understand the difference, but think of it this way: you could sort of treat Java as a dynamic language by casting everything to Object, and it would work just fine, but it's night and day compared to how it would work in Python. The difference between the experience of using Postgres and RethinkDB for JSON data is just as vast. You'd really have to try both before you can viscerally understand how amazing a dedicated JSON environment can be.
RethinkDB is behind Postgres on raw performance, and we're still shaking out scalability quirks (see http://rethinkdb.com/stability/), but if you take the long view (a year or so), these will be worked out in time. I'd encourage you to play with the environment -- it's easy and fun -- and see if you like it. Feel free to shoot me an email (slava@rethinkdb.com) and I'll be happy to help you out if you have questions.
Great stuff as always. I have had a RethinkDB instance in production for 13 months now (since 1.4 - developed it while on 1.2) and looking forward to upgrading and playing with the new features in other projects.
Congratulations on the release guys. The update to the Javascript driver is huge and will make development a lot less painful. I get to remove a lot of redundant code thanks to that.
We debated this for a while and decided not to do channels for now.
Feeds are great because you can use them to integrate with other pieces of the infrastructure like RabbitMQ or ElasticSearch, or write reactive apps where clients instantaneously react to changes in other clients. Incidentally, you can use them to easily get pubsub, but it wasn't the original intention.
There are much better pub/sub services out there, so we decided to stick to having a really good feed API and avoid implementing channels for the time being.
Unless I'm mistaken RethinkDB changefeeds are more powerful than the couch API. You can perform filters on changefeeds, do joins, transformations, etc., while the Couch API gives you a feed that has to be fully processed on the client.
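Concretely, in ReQL you can chain onto a feed (e.g. `r.table('scores').changes().filter(...)`, with made-up table and field names) so only matching events cross the wire, whereas with Couch's `_changes` the equivalent filtering happens on the client. This stand-alone sketch shows what that client-side processing looks like over hand-written events:

```python
# Hand-written changefeed-style events (not a live feed). In ReQL the
# filter below could run server-side, roughly:
#   r.table('scores').changes().filter(r.row('new_val')('score').gt(90))
events = [
    {'new_val': {'player': 'a', 'score': 95}},
    {'new_val': {'player': 'b', 'score': 40}},
]

# Client-side equivalent: keep only high-score events.
high_scores = [ev['new_val']['player']
               for ev in events
               if ev['new_val']['score'] > 90]
assert high_scores == ['a']
```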
Am I correct that the biggest hurdle to a Python 3 driver has been the lack of Python 3 support in Google's protocol buffer library? If so, it seems the 1.13 release, which removes the protobuf dependency, should substantially accelerate development of the Python 3 driver.
That's right! In fact, there is a pull request to make the driver Python 3 compatible that's being reviewed right now (plus some additional testing changes, etc.) We should be able to get Python 3 support in pretty soon.
You can grab JSON data out of APIs in seconds, filter and manipulate it, and enrich it with more APIs. I've been using `r.http` for some time (since an internal beta) and I now find it invaluable for ad-hoc analysis.
I think it's best to play with it to get a feel for how it works. For example, what would the experience be doing the analysis described in http://rethinkdb.com/docs/external-api-access in SQL?
Not from within the database yet (you'd have to add a cron job for now to periodically connect and call `r.http`). It's a great idea though, I'll see if we can add a timer primitive!
No offense but that's starting to sound like scope creep. RethinkDB is a great project, and I've used it before, but these kinds of candy-coating features really aren't necessary.
Someone who wants to periodically pull in feeds in production code can easily write a short script in their language of choice to grab it and add it to the DB.
It's not the role of the database to be an HTTP client, cron job scheduler, etc. Can it set the Accept header? The User-Agent header? Can it support different HTTP methods? How does it handle retries and streaming downloads? Does the timer have to be initiated by a script each time, or will it remain persistent in the database server? How can debugging info, like the last time it ran and the next time it'll run, be displayed?
It's not worth building support for all of these myriad possible use cases when someone can accomplish this easily in a general-purpose programming language.
I understand adding r.http as a basic prototyping utility, but I think it should be left as it is. A timer primitive has no use for prototyping or experimenting. Developer time should be spent on improving core functionality, driver support and performance, etc.
Thanks for saying this, and don't worry, RethinkDB will never degenerate into a hodgepodge of feature creep! We always follow a simple guideline: before we decide to add a feature, we ask ourselves, is this going to make the product feel tighter? If the feature makes the product feel bigger, we probably won't add it. Things get added only if they fit really, really well.
We added `r.http` because it just fits magically well with the rest of ReQL on many, many levels. JSON fits, streams fit, lazy evaluation fits, even batch prefetching fits! Everything works so wonderfully and provides such a great interactive experience, it would almost be odd not to add it.
Now, we won't add timers just for the sake of adding timers. If we add them, we'll make sure they fit and make the product tighter. There are good reasons to consider a timer implementation in the server:
- Many users have asked for "capped" tables for logging, which can be implemented as a combination of timers and secondary indexes (i.e. delete everything older than X on a timer).
- There is already internal timer functionality that we use for various tasks. It makes sense to evaluate whether it's worth exposing to the user.
- Before we add timer support, we'd have to add more robust job control (which many people have already asked for). If we do add job control, timers feel like they might fit very elegantly.
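For the capped-table bullet, the pruning step itself is simple: delete everything older than a cutoff. A client-side sketch over in-memory rows (in ReQL it would be roughly `r.table('logs').filter(r.row['timestamp'] < cutoff).delete()` run on a timer; the table and field names are made up):

```python
# "Delete everything older than X" sketched over in-memory rows.
now = 1_000_000  # pretend epoch seconds
logs = [
    {'msg': 'old', 'timestamp': now - 90_000},
    {'msg': 'recent', 'timestamp': now - 10},
]

retention = 86_400            # keep one day of logs
cutoff = now - retention
# Keep only rows newer than the cutoff (the "capped table" prune step).
logs = [row for row in logs if row['timestamp'] >= cutoff]
assert logs == [{'msg': 'recent', 'timestamp': 999_990}]
```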
These are difficult questions, and we take these really seriously. If a feature doesn't fit magically well and feels elegant, almost as if it makes the product smaller, we don't add it. So don't worry about feature creep -- if RethinkDB does stray from a sound path, it almost certainly won't be because of scope creep!
I can understand and agree with the use cases of timers in those scenarios. I just think designing it for people who want to pull data over HTTP on an interval would be a bad idea.
Michael can be an acquired taste, like fine wine or good scotch. If you don't push past that first sip, you'll never know the deep, tantalizing world known to us connoisseurs as @mlucy. (But if you do, please let it be a matter of public record that we discovered him first)
Thanks, apologies for my comment being offensive... but I should be able to state objectively what I thought detracted from the message of an otherwise fantastic announcement. I in no way mean to pass any judgement on Michael as a person...
Still, his voice seems better than many other software developers'. But I know what you mean: I've had problems listening to some UK developers' videos. (I'm US.)
On closer inspection of the video: I'm jealous of any company that lets their employees walk around the office with their shoes off. Not only are they super-smart, they're cool too.
(Sorry for this shallow, superficial observation. I'm still learning about databases and programming.)
`r.args` isn't very flashy but is a huge step forward for ReQL.
Removing protobuf makes deploying to certain platforms (e.g. heroku) so much easier.
Promises support for the Javascript driver brings it to the modern age.
And of course changefeeds and `r.http`!
If you've been hearing about RethinkDB and thinking about trying it out, now's a great time to take it for a spin.