
It's extremely convenient for interactive use. See this tutorial -- http://rethinkdb.com/docs/external-api-access/.

You can grab JSON data out of APIs in seconds, filter and manipulate it, and enrich it with more APIs. I've been using `r.http` for some time (since an internal beta) and I now find it invaluable for ad-hoc analysis.
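
For instance, here's roughly the kind of thing I mean, using the Python driver. The endpoint and field names below are made up for illustration:

  # A rough sketch with the Python driver. The endpoint and field names
  # are hypothetical -- swap in a real API.
  import rethinkdb as r

  conn = r.connect('localhost', 28015)

  # Pull a JSON array of users, keep the active ones, and enrich each
  # one with data from a second (also hypothetical) endpoint.
  result = r.http('http://api.example.com/users') \
      .filter({'active': True}) \
      .merge(lambda user: {
          'details': r.http(r.expr('http://api.example.com/details/') + user['id'])
      }) \
      .run(conn)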



> You can grab JSON data out of APIs in seconds, filter and manipulate it, and enrich it with more APIs.

So it's almost as nice as SQL, except bespoke and not specifically designed for interactive data exploration?


I think it's best to play with it to get a feel for how it works. For example, what would it be like to do the analysis described in http://rethinkdb.com/docs/external-api-access in SQL?


Can we schedule r.http commands? In other words, can this feature be used as an ETL-lite for regularly pulling in data from external sources?


Not from within the database yet (you'd have to add a cron job for now to periodically connect and call `r.http`). It's a great idea, though; I'll see if we can add a timer primitive!
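
In the meantime the cron approach is only a few lines, e.g. with the Python driver (the feed URL and table name here are made up):

  # pull_feed.py -- run it from cron, e.g.  */15 * * * *  python pull_feed.py
  # The feed URL and table name are made up.
  import rethinkdb as r

  conn = r.connect('localhost', 28015, db='test')
  r.table('feed_snapshots').insert(
      r.http('http://api.example.com/feed')   # the fetch happens on the server
  ).run(conn)
  conn.close()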


No offense, but that's starting to sound like scope creep. RethinkDB is a great project, and I've used it before, but these kinds of candy-coating features really aren't necessary.

Someone who wants to periodically pull in feeds in production code can easily write a short script in their language of choice to grab it and add it to the DB.
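
For example, a rough version in Python, without touching `r.http` at all (the URL and table name here are invented):

  # Fetch the feed in the application and write it straight to the table,
  # no r.http involved. URL and table name are invented.
  import requests
  import rethinkdb as r

  data = requests.get('http://api.example.com/feed.json').json()

  conn = r.connect('localhost', 28015, db='test')
  r.table('feed').insert(data).run(conn)
  conn.close()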

It's not the role of the database to be an HTTP client, cron job scheduler, etc. Can it set the Accept header? The User-Agent header? Can it support different HTTP methods? How does it handle retries and streaming downloads? Does the timer have to be initiated by a script each time, or will it remain persistent in the database server? How can debugging info like the last time it ran and the next time it'll run be displayed?

It's not worth building support for all of these myriad possible use cases when someone can accomplish this easily in their general-purpose programming language.

I understand adding r.http as a basic prototyping utility, but I think it should be left as it is. A timer primitive has no use for prototyping or experimenting. Developer time should be spent on improving core functionality, driver support and performance, etc.


Thanks for saying this, and don't worry, RethinkDB will never degenerate into a hodgepodge of feature creep! We always follow a simple guideline: before we decide to add a feature, we ask ourselves, is this going to make the product feel tighter? If the feature makes the product feel bigger, we probably won't add it. Things get added only if they fit really, really well.

We added `r.http` because it just fits magically well with the rest of ReQL on many, many levels. JSON fits, streams fit, lazy evaluation fits, even batch prefetching fits! Everything works so wonderfully and provides such a great interactive experience that it would almost be odd not to add it.

Now, we won't add timers just for the sake of adding timers. If we add them, we'll make sure they fit and make the product tighter. There are good reasons to consider a timer implementation in the server:

  - Many users have asked for "capped" tables for logging, which
    can be implemented as a combination of timers and secondary
    indexes, i.e. delete everything older than X on a timer
    (see the sketch after this list).
  - There is already internal timer functionality that we use for
    various tasks. It makes sense to evaluate whether it's worth
    exposing to the user.
  - Before we add timer support, we'd have to add more robust job
    control (which many people have already asked for). If we do
    add job control, timers feel like they might fit very
    elegantly.
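
To make the first point concrete, the expiration step itself is already easy to express in ReQL; a timer would just run something like this sketch on a schedule (table, index, and field names are hypothetical):

  # Expire log documents older than a cutoff via a secondary index on a
  # timestamp field. Table, index, and field names are hypothetical.
  import rethinkdb as r

  conn = r.connect('localhost', 28015, db='test')

  # One-time setup:
  # r.table('logs').index_create('timestamp').run(conn)

  cutoff = r.now() - 7 * 24 * 60 * 60   # keep the last week
  r.table('logs') \
      .between(r.epoch_time(0), cutoff, index='timestamp') \
      .delete() \
      .run(conn)

A timer primitive would just run that delete inside the server on a schedule instead of relying on an external script.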

These are difficult questions, and we take them really seriously. If a feature doesn't fit magically well and feel elegant, almost as if it makes the product smaller, we don't add it. So don't worry about feature creep -- if RethinkDB does stray from a sound path, it almost certainly won't be because of scope creep!


I can understand and agree with the use cases of timers in those scenarios. I just think designing it for people who want to pull data over HTTP on an interval would be a bad idea.



