Mentat: A persistent, relational store inspired by Datomic and DataScript

pbowyer · on Feb 4, 2017

Key quotes:

"DataScript asks the question: "What if creating a database would be as cheap as creating a Hashmap?"

Mentat is not interested in that. Instead, it's strongly interested in persistence and performance, with very little interest in immutable databases/databases as values or throwaway use."

and:

"Datomic has a beautiful conceptual model. [...] Many of these design decisions are inapplicable to deployed desktop software; indeed, the use of multiple JVM processes makes Datomic's use in a small desktop app, or a mobile device, prohibitive.

Mentat is designed for embedding, initially in an Electron app (Tofino). It is less concerned with exposing consistent database states outside transaction boundaries, because that's less important here, and dropping some of these requirements allows us to leverage SQLite itself."

ah- · on Feb 4, 2017

Can you elaborate a bit on what this actually means?

How would I use it? What does a query look like?

holygoat · on Feb 5, 2017

I'm the author of most of those docs, if you'd like answers to any specific questions from the horse's mouth, so to speak. I'll cycle back around and reply to this comment again in a bit.

holygoat · on Feb 5, 2017

You might be interested in reading this much longer piece, which includes a brief code example, and spends more time than the README explaining the motivation for the project.

https://medium.com/project-tofino/introducing-datomish-a-fle...

pbowyer · on Feb 4, 2017

> Can you elaborate a bit on what this actually means?

Sure - I'm highlighting the differences from DataScript/Datomic. My take is the 'inspiration' from each (especially DataScript) is quite loose.

holygoat · on Feb 5, 2017

Yes. Project Mentat fits into a conceptual lineage that includes Freebase's graphd and 2005-onward Semantic Web stores. We have aimed for compatibility with Datomic and DataScript for least surprise, but if you squint there's a little AllegroGraph in the direction.

Datomic's model (both architectural and conceptual) draws from Clojure's concepts of persistence. That model isn't free, so Mentat deviates from it where it makes sense to do so: at present we don't implement querying of history or past states, for example, and when we do it won't be free.

We'll get closer to full Datomic-style datom store capabilities over time, but we'll make different performance tradeoffs.

pbowyer · on Feb 5, 2017

Thanks for the explanation. For me the key takeaway is:

> at present we don't implement querying of history or past states, for example, and when we do it won't be free.

That's the bit of Datomic that intrigues me and gives the best use-case (I don't have to add data versioning and history in my app-layer) and what I'm looking for in other systems.

holygoat · on Feb 5, 2017

Note that we do store the full transaction log, just like Datomic, and Mentat will allow querying of it (and replication, and replay, and…). We haven't implemented history querying yet because we haven't needed it for application code.

The trick with Datomic is that every time you grab a `db` instance, it's a snapshot, and the system's index chunking and storage replication are necessarily built around the ability to continue using those older index chunks, potentially for a very long time.

Most consumers, most of the time, just want to query the store as it stands at that moment, but Datomic peers pay the space and time penalty of keeping and retrieving historical index chunks in order to answer those historical queries.

My current thoughts are:

1. To allow for short-term snapshot querying through something like `db.keep()`, implemented via a SQLite read transaction. That's not free: the database WAL will continue to grow until the read transaction is ended, so it isn't ideal for all workloads, but it'll do.

For some queries it's enough to simply track a last-seen tx value and filter everywhere, but that becomes difficult when cardinality-one and unique-identity properties are considered.

2. The obvious equivalent to Datomic's 'with' is an uncommitted write transaction. Naturally this blocks other writers while it exists, and so alternative implementations (e.g., writing to a complete disk copy of the database, or writing a 'delta' table) might make sense.

At some very hazy point in the future we might try to get SQLite support for this: after all, if we can guarantee that a write transaction won't be committed, we could use a separate WAL file for the `with` and avoid blocking other writers.

3. A longer-term approach to snapshots/DB-as-value is to materialize the datoms at the specified instant in time, either in a temporary table or in a real persisted table. That is: `db.keep_forever()` will give you a new structure to query, and calling code will be responsible for cleaning up that space.
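To make point 1 concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It is not Mentat code: the `datoms` table, its columns, and `snapshot_demo` are made up for illustration. It only shows the underlying SQLite behaviour, where a reader holding a read transaction open in WAL mode keeps seeing the old state while a writer commits.

```python
import os
import sqlite3
import tempfile


def snapshot_demo():
    """Show that an open read transaction pins a snapshot in WAL mode."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "store.db")

        # isolation_level=None: no implicit transactions, we issue BEGIN/COMMIT ourselves.
        writer = sqlite3.connect(path, isolation_level=None)
        writer.execute("PRAGMA journal_mode=WAL")
        # Hypothetical schema, not Mentat's real one.
        writer.execute("CREATE TABLE datoms (e INTEGER, a TEXT, v TEXT, tx INTEGER)")
        writer.execute("INSERT INTO datoms VALUES (1, 'page/title', 'Old title', 100)")

        reader = sqlite3.connect(path, isolation_level=None)
        reader.execute("BEGIN")  # deferred: the snapshot is fixed by the first read
        before = reader.execute("SELECT v FROM datoms WHERE e = 1").fetchall()

        # The writer commits while the reader's transaction is still open.
        writer.execute("UPDATE datoms SET v = 'New title', tx = 101 WHERE e = 1")

        during = reader.execute("SELECT v FROM datoms WHERE e = 1").fetchall()
        reader.execute("COMMIT")  # release the snapshot
        after = reader.execute("SELECT v FROM datoms WHERE e = 1").fetchall()

        reader.close()
        writer.close()
        return before, during, after


before, during, after = snapshot_demo()
print(before, during, after)
```

While the reader's transaction stays open, SQLite cannot checkpoint the WAL past that reader's snapshot, which is the growth cost mentioned above.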

The reason I say "won't be free" is that each of these operations imposes a cost when the feature is used: either SQLite or Mentat will have to do some work to allow an extended period of isolation, to reconstruct some state, or to persist some state.
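The uncommitted write transaction from point 2 can be sketched the same way. This is an illustration under assumptions, not Mentat's API: `speculative_with` and the `datoms` table are hypothetical names, and the example uses Python's built-in `sqlite3` module.

```python
import sqlite3


def speculative_with(conn, statements, query):
    """Apply statements in a write transaction, run query, then always roll back."""
    conn.execute("BEGIN IMMEDIATE")  # takes the write lock, so other writers wait
    try:
        for sql, params in statements:
            conn.execute(sql, params)
        return conn.execute(query).fetchall()
    finally:
        conn.execute("ROLLBACK")  # the speculative writes are never committed


# isolation_level=None: we control transaction boundaries explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
# Hypothetical schema, not Mentat's real one.
conn.execute("CREATE TABLE datoms (e INTEGER, a TEXT, v TEXT, tx INTEGER)")
conn.execute("INSERT INTO datoms VALUES (1, 'page/title', 'Home', 100)")

what_if = speculative_with(
    conn,
    [("INSERT INTO datoms VALUES (?, ?, ?, ?)", (2, "page/title", "About", 101))],
    "SELECT e, v FROM datoms ORDER BY e",
)
actual = conn.execute("SELECT e, v FROM datoms ORDER BY e").fetchall()
print(what_if, actual)
```

The cost shows up only when the feature is used: the write lock is held for as long as the speculative transaction is open.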

That's in contrast to Datomic, which imposes some overhead every time index chunks are built or retrieved. It's also an interesting parallel to Clojure vs Rust: Clojure's data structures are persistent by default, giving you snapshots and safety at a cost everyone pays; Rust believes that you shouldn't pay for abstractions you don't use.
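For point 3, a rough sketch of materializing the datoms as of a given transaction from a log of assertions and retractions. Everything here is an assumption for illustration: the `transactions` table, its `added` flag, and `materialize_as_of` are hypothetical names, not Mentat's schema or API, and the query assumes a well-formed log in which a datom is retracted only after it has been asserted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical log: added = 1 is an assertion, added = 0 is a retraction.
conn.execute(
    "CREATE TABLE transactions (e INTEGER, a TEXT, v TEXT, tx INTEGER, added INTEGER)"
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?, ?)",
    [
        (1, "page/title", "Draft", 100, 1),
        (1, "page/title", "Draft", 101, 0),  # retract the old title...
        (1, "page/title", "Final", 101, 1),  # ...and assert the new one
        (2, "page/title", "About", 102, 1),
    ],
)


def materialize_as_of(conn, tx, table):
    """Copy the datoms that were live at transaction `tx` into a temp table.

    `table` is interpolated into the SQL, so it must be a trusted identifier.
    The caller is responsible for dropping the table afterwards.
    """
    conn.execute(f'CREATE TEMP TABLE "{table}" (e INTEGER, a TEXT, v TEXT)')
    conn.execute(
        f"""
        INSERT INTO "{table}"
        SELECT e, a, v FROM transactions
        WHERE tx <= ?
        GROUP BY e, a, v
        HAVING SUM(CASE WHEN added THEN 1 ELSE -1 END) > 0
        """,
        (tx,),
    )


materialize_as_of(conn, 100, "datoms_at_100")
materialize_as_of(conn, 101, "datoms_at_101")
at_100 = conn.execute("SELECT e, v FROM datoms_at_100 ORDER BY e, v").fetchall()
at_101 = conn.execute("SELECT e, v FROM datoms_at_101 ORDER BY e, v").fetchall()
print(at_100, at_101)

# Cleaning up the materialized state is the caller's job.
conn.execute("DROP TABLE datoms_at_100")
conn.execute("DROP TABLE datoms_at_101")
```

The work of reconstructing and storing the past state is paid only by the caller that asks for it.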

rads · on Feb 4, 2017

When I read the title I was wondering if this was a new project. It's actually a continuation of the Datomish project. I'm glad they changed the name, though.

rektide · on Feb 4, 2017

Can't find any information whatsoever on usage. Not even sure if this meant to be externally accessed or whether this is purely for embedding (in other Rust code?).

steveklabnik · on Feb 4, 2017

> To start the server use:

> cargo run serve

So, not embedding.

https://github.com/mozilla/mentat/blob/rust/tests/external_t... looks like some sample usage.

SomeCallMeTim · on Feb 4, 2017

"Mentat is designed for embedding, initially in an Electron app (Tofino)."

It looks like it works both ways.

steveklabnik · on Feb 4, 2017

Neat!

holygoat · on Feb 5, 2017

It's intended for embedding. When complete, Mentat will likely sport a Node module, a network protocol + CLI + explorer, and might well be accessible from Firefox extensions via a JavaScript API.

We are several months from that, though!

RegularOpossum · on Feb 4, 2017

Hehe, Dune reference. I approve.

lvh · on Feb 4, 2017

This looked really similar to another Mozilla project called Datomish. The project was recently renamed to avoid confusion with Datomic, but apparently that _also_ involved a rewrite into Rust. Details here: https://github.com/mozilla/mentat/issues/133

Unfortunately, this means that the path for using this in non-Node JavaScript environments (i.e. browsers) is unknown.

holygoat · on Feb 5, 2017

This is the renamed version of Datomish.

The original implementation was in ClojureScript. This reimplementation is in Rust, and is intended to work anywhere you can run Rust code: inside Node, inside Firefox, and in standalone applications.

We expect a WebExtensions API to wrap this inside Firefoxes at some point, but right now we're focused on the core (re)implementation.

samuell · on Feb 4, 2017

Awesome. Been looking hard for open source datalog-supporting data stores and data processing systems for a couple of years now. This is at least something in this direction, although I might ultimately wish for a full-fledged database or system that could run in a distributed fashion if needed.

espeed · on Feb 4, 2017

See also...

RDFox: "A highly scalable in-memory RDF triple store that supports shared memory parallel datalog reasoning. It is a cross-platform software written in C++ that comes with a Java wrapper allowing for an easy integration with any Java-based solution."

https://www.cs.ox.ac.uk/isg/tools/RDFox/

Dedalus: Datalog in Time and Space, by Peter Alvaro out of UC Berkeley (note the Strange Loop talk):

https://disorderlylabs.github.io/

Datalog -> Gremlin: It shouldn't be too hard to implement Datalog on top of the Gremlin Graph Virtual Machine so that Datalog compiles down to Gremlin bytecode -- SPARQL and SQL implementations already exist -- and running Datalog on the GVM would allow you to run Datalog on any datastore Apache Tinkerpop supports (all the graph DBs, HBase, Cassandra...):

Graph Computing with Apache TinkerPop, by Marko Rodriguez (the creator of Gremlin) https://www.youtube.com/watch?v=tLR-I53Gl9g

A Gremlin Implementation of the Gremlin Traversal Machine http://www.datastax.com/dev/blog/a-gremlin-implementation-of...

samuell · on Feb 11, 2017

That Peter Alvaro talk is in my top three favourite talks, if not top one :)

holygoat · on Feb 5, 2017

If you want a distributed Datalog store, it will be hard to beat Datomic. AIUI their current license terms are very affordable.

general_ai · on Feb 4, 2017

> designed for embedding

Yet implemented in Rust. Why? If you want adoption, the best way to design something "for embedding" is to write it in C.

holygoat · on Feb 5, 2017

It's designed to meet our embedding needs: in Firefox for desktops, Firefox for Android, Firefox for iOS, and Project Tofino (Node + Electron).

Widespread adoption a la SQLite is not one of our goals.

Rust meets our goals just fine… and it also produces demonstrably more correct software than C, which is important to us. (Not to mention leveraging the borrow checker and data race avoidance to provide safe near-automatic parallelization, which is a neat trick that's not in C's quiver.)

I would rather build Mentat in Swift or JavaScript than in C.

ianlevesque · on Feb 4, 2017

Rust embeds almost as well as C.

general_ai · on Feb 4, 2017

Be that as it may, it's a relatively obscure and quickly changing language, that _ends up calling into C_ anyway.

steveklabnik · on Feb 4, 2017

Rust doesn't change in a backwards incompatible way.

> that _ends up calling into C_ anyway.

What specifically do you mean here?

general_ai · on Feb 4, 2017

Says there it uses sqlite.

jdub · on Feb 5, 2017

Datapoint: GNOME's Federico Mena Quintero is working on a file-by-file port of librsvg to Rust. Compiled Rust objects are linked with compiled C objects. Both Rust and C call functions in librsvg's C dependencies, such as Cairo. None of this is weird; it's precisely what Rust was designed for.

holygoat · on Feb 5, 2017

Correct. That has no bearing at all on whether it's smart to write the rest of the library in Rust.

Most non-trivial applications will at some point call out into code written in C, or compiled from some other language. So what?

steveklabnik · on Feb 5, 2017

Ah I thought you meant Rust generally, not this project. Makes sense!

(I still don't think that's enough to justify using C over Rust here, just that I understand you now.)

biokoda · on Feb 5, 2017

Because a Rust library is safer, equally efficient, faster to develop, easier to build and can expose a C interface.