Call me maybe: MongoDB (2013)

garretraziel · on July 5, 2014

I'm building small application using Node.js and MongoDB and I'm planning to host it on Openshift or Heroku. All that hate that MongoDB takes here on HN makes me reconsider technologies I am using.

I will not have many relations in my database (model User, model Document, User owns Document... and that's all) so I thought that NoSQL databases will do. Plus, MongoDB lets me use GridFS - I'm planning to store pdf presentations in it.

If I should drop MongoDB, what other technology should I use? Or should I fall back to Postgres + ORM and manage my files in filesystem manually?

I don't want to start a flame, I am looking for an advice. I have considered MongoDB to be "good enough" as GridFS lets me store my files without a hassle, but after all that I read on the Internet, now I am not so sure.

Pacabel · on July 5, 2014

It's not about "hate".

There's no emotion involved, in fact.

It's about rationally looking at a given tool, considering its capabilities, considering how it can be beneficial and harmful in a given situation, and considering how it compares to similar other tools.

Sometimes this analysis is in favor of a given tool. PostgreSQL is a good example of this. It excels in many different ways, including some that overlap with MongoDB and other so-called NoSQL databases.

Sometimes this analysis is not in favor of a given tool. MongoDB is a good example of this. It has some pretty serious issues, and there are often better alternatives.

When people suggest to not use MongoDB, it's generally not some emotional response due to "hate". It's usually because they've considered what it can do and how it works, and it turns out that in pretty much all cases there is a better alternative available that should be used instead.

barkingcat · on July 5, 2014

I don't think it's hate - it's a call for users to understand and examine the implications of assumptions that the developers of the tools have made.

Don't take the advertising taglines and slogans as a panacea - there are still a ton of things to understand about how mongodb might or might not work for your circumstances. The advertising buzzwords are just that - advertising. You need to examine each and every technology that you chose as part of your project.

And I totally recommend the entire Jepsen series from that site http://aphyr.com/tags/Jepsen - it's an eyeopener!

In my opinion, if you are starting some experiments, etc, just use postgres! It removes a lot of doubt and new learning/confusion from your development.

Rapzid · on July 5, 2014

PostgreSQL actually works quite well as a noSQl/document store :) I believe the best answer though is "it depends". I'm not sure this is the best place but if you gathered your requirements and listed them I'm sure a dstabse that will keep your data safe can be recommended.

jamesaguilar · on July 5, 2014

If it's a small app, just use Maria or Postgres. IMO, the only time to even think about using one of these non-traditional stores is when you've scaled past what SQL systems can offer. And even then, I'd use the non-traditional store like I use a C extension in Python -- only for the parts that can't scale without it.

threeseed · on July 5, 2014

Most of the MongoDB hate comes from the PostgreSQL crowd.

It's actually a fine database for many domain models i.e. lots of nested data and GridFS does work pretty well. That said don't use GridFS. Use something like S3 and reference the files.

And if you are planning to use MongoDB then look at MongoLab/MongoHQ. Anyone who says you should run your own database should NOT be listened to. Use a hosted solution if you are starting off small. You don't want to be spending your valuable time testing your backups (just one of the many operational activities most people don't do).

jamesaguilar · on July 5, 2014

> Most of the MongoDB hate comes from the PostgreSQL crowd.

Or anyone who doesn't like losing their customers' data.

reqres · on July 5, 2014

I was in a similar situation to you when MongoDB was a lot less mature. Postgres was my "go to" store and I was a very skeptical of the Mongo hype.

My view would be to use Mongo if it's just a small project. It's a good time to learn exactly how Mongo is different and get a first hand experience of what the trade-offs are.

stavros · on July 5, 2014

I've tried to go non-relational for things that required a few relations and keeping things consistent was a pain. Owners would drop and documents referring to them would still be in the database, causing problems. I'd go with Postgres until you know it's not enough (and I do, nowadays).

sandstrom · on July 5, 2014

I've used MongoDB quite a bit, and find it convenient. That said, it also has flaws (many has been pointed out on HN and elsewhere).

It's schemaless and malleable, easy to restructure the data. Fairly easy to operate, replication is easy to setup.

Don't use GridFS though, I'd go for S3 or similar instead.

stavros · on July 5, 2014

Is there any datastore in this series that behaved correctly in a partition? I've seen ElasticSearch, Redis, Riak, Mongo, and all of them crapped their pants.

jamesaguilar · on July 5, 2014

So far as I can tell, no traditional SQL systems have been tested. I'd lay a small bet that Postgres wouldn't lose data, although I don't even know if it has any kind of masterpool capability.

stavros · on July 5, 2014

Nah, he already tested Postgres: http://aphyr.com/posts/282-call-me-maybe-postgres

twic · on July 5, 2014

It's not a very thorough test compared to many of the others. I would guess because PostgreSQL doesn't support multi-master clustering.

However, in the test that was done, PostgreSQL did not lose data. The failure mode was that transactions were committed without the client being notified. Kind of the opposite of data loss - there is data in the database that the client doesn't know about.

Rapzid · on July 6, 2014

I believe you can use prepared transactios and some client smarts to mitigate this depending on your use case . There is some back and forth for sure, and the risk of locking the database is higher.

nemothekid · on July 5, 2014

Cassandra, Zookeeper and Kafka didn't shit the bed.

dbenhur · on July 5, 2014

On Cassandra, some CQL collection operations behaved well (adding elements to a set). Everything else Kyle tested there was demonstrated to lose data.

Kafka lost data all over the place under Jepson; though Kyle offered great respect to the team and expects them to deliver optionally configured safer semantics (at a performance cost) in future releases.

Riak was totally solid when using CRDTs and turning off the insane LWW default.

tveita · on July 5, 2014

Kafka did lose a lot of data in the Jepsen test, as it will by design to preserve availability under some specific circumstances.

The behavior in question will be configurable from Kafka 0.8.2, by setting "unclean.leader.election.enable" to false.

See https://issues.apache.org/jira/browse/KAFKA-1028

zardosht · on July 5, 2014

Another thing to keep in mind is that not all systems are meant to be CP systems.

preetamjinka · on July 5, 2014

Zookeeper? Not in the same category as MongoDB, of course.

dbenhur · on July 5, 2014

Never observed shitting the bed -- Zookeeper: http://aphyr.com/posts/291-call-me-maybe-zookeeper

"As with any experiment, we can only disconfirm hypotheses. This test demonstrates that in the presence of a partition and leader election, Zookeeper is able to maintain the linearizability invariant.

Recommendations

Use Zookeeper. It’s mature, well-designed, and battle-tested."

Doesn't Shit the bed when used correctly -- Riak: http://aphyr.com/posts/285-call-me-maybe-riak

"Is there no hope? Is there anything we can do to preserve my writes in Riak?

Yes. We can use CRDTs.

If we enable allow-mult in Riak, the vector clock algorithms will present both versions to the client. We can combine those objects together using a merge function.

... CRDTs preserve 100% of our writes. ... Moreover, CRDTs are an AP design: we can write safely and consistently even when the cluster is totally partitioned–for example, when no majority exists. ...

Strategies for working with Riak

Enable allow-mult. Use CRDTs."

There exist methods where by it wont shit the bed -- Cassandra: http://aphyr.com/posts/294-call-me-maybe-cassandra/

"CQL and CRDTs

Without vector clocks, Cassandra can’t safely change a cell–but writing immutable data is safe. Consequently, Cassandra has evolved around those constraints, allowing you to efficiently journal thousands of cells to a single row, and to retrieve them in sorted order. Instead of modifying a cell, you write each distinct change to its own UUID-keyed cell.

Cassandra’s query language, CQL, provides some collection-oriented data structures around this model: sets, lists, maps, and so forth. They’re CRDTs, though the semantics don’t align with what you’ll find in the INRIA paper–no G-sets, 2P-sets, OR-sets, etc. However, some operations are safe–for instance, adding elements to a CQL set: All 2000 writes succeeded. :-D

That’s terrific! This is the same behavior we saw with G-sets in Riak. However, not all CQL collection operations are intuitively correct. In particular, I’d be wary of the index-based operations for lists, updating elements in a map, and any type of deletions. Deletes are implemented by writing special tombstone cells, which declare a range of other cells to be ignored. Because Cassandra doesn’t use techniques like OR-sets, you can potentially delete records that haven’t been seen yet–even delete writes from the future. Cassandra users jokingly refer to this behavior as “doomstones”.

The important thing to remember is that because there are no ordering constraints on writes, one’s merge function must still be associative and commutative. Just as we saw with Riak, AP systems require you to reason about order-free data structures. In fact, Cassandra and Riak are (almost) formally equivalent in their consistency semantics–the primary differences are in the granularity of updates, in garbage collection/history compaction, and in performance.

Bottom line: CQL collections are a great idea, and you should use them! Read the specs carefully to figure out whether CQL operations meet your needs, and if they don’t, you can always write your own CRDTs on top of wide rows yourself."

brickcap · on July 5, 2014

I was hoping that aphyr would cover couchdb at some point. But since he didn't I assume he tried it and was satisfied with it.

But it might be that it is an ap system and there was nothing much to say there.

olegp · on July 5, 2014

For all the hate MongoDB gets, I have to say building queries in server side JS using MongoDB JSON syntax rather than SQL style queries is the way to go.

Check out these two examples:

https://github.com/olegp/stick-blog-pg/blob/master/lib/serve... - uses Postgres https://github.com/olegp/stick-blog/blob/master/lib/server.j... - uses MongoDB - much easier to construct dynamic queries using the JSON syntax

There are of course plenty of other things a relational DB like Postgres has going for it, so I've been experimenting with having a MongoDB interface to Postgres with data stored using the JSON datatype. It's now feature complete and passes all unit tests with read performance exceeding that of Mongo in some benchmarks:

https://github.com/olegp/pg-mongo

Pacabel · on July 5, 2014

That's not a very convincing example.

Let's ignore the fact that it's a trivially small example to begin with. I don't think it's very representative of larger apps, especially ones that are far more complex than a very simplistic blog system.

And let's ignore the fact that the SQL-based one is around 10% shorter than the MongoDB one, too, in terms of their line counts.

The SQL-based one makes the queries very clearly visible. It's easier to find the queries, and for anyone who knows SQL it's very clear at a quick glance what they're doing. It's much harder to isolate the queries in the MongoDB one. The visibility of the SQL queries makes maintenance easier, and performance tuning easier.

And I don't buy the dynamic query argument. Those are often the most fragile ones used, and should generally be avoided, if possible. It's better to have two or more similar queries than it is one that's built dynamically. Even then, it's often possible to write a single query with more complex filtering that can handle numerous very different cases.

At best, the MongoDB example is roughly equivalent to the SQL one. In practice, you're trading off a huge number of benefits of relational databases (SQL usually being one of them) for little to no gain when opting for MongoDB instead.

olegp · on July 6, 2014

Thanks for checking out the example and I agree that it may be overly trivial. We do plan to investigate this further and will blog about it at https://starthq.com/blog/

I really think that depending on one's requirements it should be possible to query relational database using MongoDB syntax and use (basic) relational queries on Mongo data.

There have been tools like Metrica (https://starthq.com/apps/metrica) that do this, but we're working on implementing that as a Node package.

buster · on July 5, 2014

It's probably nice if you only write JS (say, you write a node.js app). To me SQL is still the best query language we have. It's usable amongst a wide range of database servers, from tiny embedded sqlite to oracle. It's _easy_.

One side that NoSQL fails on is to not have a common query language, imo. You chose MongoDB in your project and want to migrate to another database? Not that easy. Switch from MySQL to Postgres? Probably not a drop-in replacement, but much easier to do.

twic · on July 6, 2014

> To me SQL is still the best query language we have.

OQL! Okay, so nobody ever implemented OQL. But there are OQL-inspired query languages in production which i prefer to SQL for routine use, such as JPQL.

One reason for that preference is the ability to join through foreign keys with a syntax which resembles property access on objects:

  select e
  from Employee e
  where e.department.head.manager.level = 'VP'

This beats the equivalent SQL:

  select e.*
  from Employee e
  join Department d using (department_id)
  join Employee h on d.head_id = h.employee_id
  join Employee m on h.manager_id = m.employee_id
  where m.level = 'VP'

Admittedly, whilst JQPL is nice for this sort of routine fetch-and-filter stuff, it lacks the more powerful features of SQL like window functions, recursive common table expressions, etc. I don't often need those, but when i do, it would be rather painful to do without them.

brickcap · on July 5, 2014

>To me SQL is still the best query language we have

I agree 100% there. I think orient db proves that it is possible to use sql in a no sql database. Though I think not all databases are designed to support entire sql querying mechanism.

goldenkey · on July 5, 2014

Your comparison is not apt; an ORM can be used for any SQL flavor.

olegp · on July 5, 2014

Having written an ORM myself back in 2000 and used Hibernate and JPA for many years, I'm now of the opinion that ORMs are an unnecessary, leaky abstraction in the modern day of dynamic languages.

One is better off talking directly to the data store and effectively streaming data out via a REST API.

rspeer · on July 5, 2014

Why would someone downvote this?

ORMs certainly are a leaky abstraction, and they certainly do come with a penalty in performance and expressiveness. Is it really downvote-worthy to add the opinion that they are unnecessary?

lucian1900 · on July 5, 2014

There are SQL libraries out there that make writing raw SQL unnecessary and also provide you with all the semantics of actual SQL, with no performance or expressiveness overhead. SQLAlchemy for Python is an example, there are others.

olegp · on July 6, 2014

Yes, at one startup I worked on we had written one of those as well (https://github.com/akshell/docs/blob/master/guide/db.rst).

The problem is they are still an abstraction. You write these queries using chaining or whatever but when something goes wrong you still need to look at the generated SQL statements.

With MongoDB, the query you write in your app is exactly the same as the query you run from the MongoDB shell and I like that. I can prototype the query in the shell and then copy paste it into the application code. When the application code is producing unexpected results, I can easily debug it in the shell.

noelwelsh · on July 5, 2014

You're comparing concatenating strings vs building data structures. Naturally the latter is preferable. It's fairly trivial to represent SQL as data structures in your host language (an embedded DSL if you like) and then you have the benefits of compositionality you currently have with JSON.

lucian1900 · on July 5, 2014

That's why there are nice libraries for building SQL ASTs and generating code for various databases, like SQLAlchemy. They also benefit from not being ambiguous and arbitrarily constrained (like Mongo's joke query language).

_JamesA_ · on July 5, 2014

Does anyone have experience/comparisons with OrientDB? http://www.orientechnologies.com/orientdb/

It seems to be much lesser known but it ticks all the right boxes compared to other NoSQL datastores.

Gonzih · on July 5, 2014

With rise of RethinkDB it would be lovely to see similar post on it.

hopeless · on July 5, 2014

I've looked into RethinkDB and I really like it but… you have to migrate your data between each 1.x -> 1.y release which might be trivial early on but impossible at a larger scale :-/

http://rethinkdb.com/stability/

neumino · on July 5, 2014

This will not be needed for the next releases (if everything goes as planned) https://github.com/rethinkdb/rethinkdb/issues/1010#issuecomm...

hopeless · on July 5, 2014

Wow! That's a huge leap forward. I can look seriously at RethinkDB again for a new project. Is there a public roadmap where I could have seen that coming?

fwr · on July 5, 2014

You missed an opportunity to say that you need to rethink using it.

ulisesrmzroche · on July 4, 2014

This is ancient stuff though, is this still relevant today?

leif · on July 5, 2014

Yes, there are still problems with the election protocol, e.g. [1]. The right kind of network partitions can cause multiple primaries to stay up indefinitely, accepting writes on both sides of the partition, which will eventually be rolled back. There is another problem with the election protocol that allows writes acknowledged by a majority of machines to be rolled back after an election.

Both of these problems can be fixed by using something like Raft[2] or Paxos for elections, rather than the ad hoc mechanisms used today.

In TokuMX[3], we're currently working on replacing the election algorithm with something similar to Raft, that will eliminate these sources of data loss. We've heard that MongoDB is also working on fixing replication, but we don't know what their exact plans are (they have a bigger challenge since they need to stay compatible with their existing replication algorithms, which use timestamps as transaction identifiers) or whether these fixes will end up in 2.8 or in a later version.

[1]: https://jira.mongodb.org/browse/SERVER-9848

[2]: https://ramcloud.stanford.edu/wiki/download/attachments/1137...

[3]: http://docs.tokutek.com/tokumx

ericingram · on July 5, 2014

Once again, a TokuMX engineer steps up to explain issue and offer a potential solution. I can't help but wonder why MongoDB engineers aren't doing this. But no matter, just glad we're using TokuMX.

nevi-me · on July 5, 2014

I think if I was working from someone's codebase, with about 80-90% of what I need already built-in, I would have the time and resources to make improvements on my fork, and shine glorious over the software making the base of my product.

I wanted to try TokuMX months ago, but when I learnt that the version at the time was based on Mongo 2.2 I shied away from it, because I need GeoJSON capabilities. I remember that with 2.6 one of TokuTek's engineers said that they needed to look at Mongo's code and start playing catch-up, I don't know if they've done that so far.

What will Mongo 2.8 mean for TokuMX? We're seeing document-level locking, possible B-Tree improvements (I presume Toku's R-Tree/Fractals [can't remember which they use] will still be superior), possible transactions (although what's on JIRA hasn't convinced me so far) and a few other improvements and Performance Boosting Things. So to what scale with Toku remain relevant if they don't keep up to date with Mongo, because in my case, using their versions based on 2.2, their ideology of being 'a drop-in replacement for MongoDB' doesn't work.

I'll go to their Github page and try see whether they've merged the 2.6 codebase to their latest versions though :) EDIT: from looking at their release changelogs, as of October last year, they were in parity with Mongo 2.4, with the exception of geo-indices and full-text search, and 2.6 is still an open milestone.

It kind of feels like the Joyent vs Strongloop thing on Node.js, but I wonder if TokuTek employees push bug-fixes upstream to Mongo, or whether they just fix them on TokuMX and use that as a selling-point; again with this I'll have to do some digging to inform my opinion, but I'd appreciate if someone who knows could clarify it.

leif · on July 5, 2014

My other reply didn't address some of your specific questions about 2.8:

> We're seeing document-level locking

We've had it from the beginning. Their implementation so far doesn't handle index updates or replication. I assume they'll handle these issues before a GA release, but the interesting question is which workloads will still demonstrate good concurrency after they solve these problems.

> possible B-Tree improvements (I presume Toku's R-Tree/Fractals [can't remember which they use] will still be superior)

I haven't seen any actual improvements they've got planned. Besides, B-trees will never compete with our fractal trees on insertions or compression, or for that matter, with LSM trees either.

> possible transactions (although what's on JIRA hasn't convinced me so far)

They aren't going to do transactions in 2.8. They may provide something like some transactional semantics we provide in TokuMX after 2.8 (I've heard mentions of single-shard atomicity), but by this point we have even bigger and better things than just single-shard transactions planned.

> and a few other improvements and Performance Boosting Things.

Not sure what you mean. The coolest things I've heard are not storage related, e.g. filtered replication. They're definitely exciting, but unrelated enough that we should be able to just merge them wholesale.

> So to what scale with Toku remain relevant if they don't keep up to date with Mongo

We'll keep up, don't worry. Here's hoping we maintain---and gain---relevance. ;-)

zardosht · on July 5, 2014

Another engineer at Tokutek here. As you see, we are up to 2.4, and have been investigating 2.6 and Geo. With all possible features, whether they be from MongoDB 2.6 or things we innovate on our own like partitioned collections, we prioritize and address them based on customer and user feedback.

Also, 2.6 is not an all or nothing proposition that needs to be done in one release. Features with the most demand (whether it be the new write commands or aggregation framework improvements) will be done before others. We've done this before. When we released 1.0 that was based on 2.2, we also released hash based sharding with it which was a 2.4 feature. We did so because users demanded it.

As for pushing bug fixes upstream, we file bugs when we see them. Our VP of engineering was a winner in the MongoDB 2.6 bug hunt with SERVER-12878. SERVER-9848 and SERVER-14382 are among the bugs I've filed.

nevi-me · on July 5, 2014

Thanks for the response, I read a post on the mongo-user group , and that's what I noticed, that a number of features are ported as and when necessary. Don't read what I say in a very negative sense, because I'm mostly curious, and it's my opinion that sometimes the little that we (I) get exposed to regarding TokuMX specifically is that it's superior to Mongo, that it's a "choose us or lose out" thing, but that happens when one doesn't follow a certain topic, but only sees it being mentioned here and there (understandable since Mongo has been the subject of "my start-up failed, and I blame it on Mongo; so burn Mongo" kind of discussions).

One more question if you don't mind: since MongoDB will support various storage engines from 2.8, including Tokutek's storage engine (can't remember its name); notwithstanding other innovations on TokuMX, would switching from mmap to Tokutek's storage engine mean that one ends up with Mongo having geo-indices and other bells, while having TokuMX's main feature?

zardosht · on July 5, 2014

Your last question is a bit loaded with a bunch of "ifs", so let's unwind it. I don't know what MongoDB will "support" as far as other engines go. But assuming we, Tokutek, release something that we support that is our engine plugged into 2.8 using MongoDB's storage engine plugin, then according to the design we heard about at MongoDBWorld, that product will be what you think it is: Mongo with geo and "other bells", and TokuMX's compression + write performance.

But 2.8 is a bit away and the storage engine API is a very fresh development. I don't think anyone is in a position to be able to really guarantee what it would look like and how TokuFT (https://github.com/Tokutek/ft-index/) will plug into it. I definitely cannot make any promises.

If you are interested in TokuMX + some missing features from MongoDB (sounds like geo), and don't mind discussing your needs and use cases with our sales guys, please give us feedback at http://www.tokutek.com/contact/. As I mentioned previously, user feedback drives what we do, so at the very least, you can provide some additional data points.

ericingram · on July 5, 2014

We didn't need GEO indexing but what Toku does offer is pretty exciting. Primary wins for us include multi-query transactions, compression, fractal tree indexes (thus overall insert and query performance), and clustering indexes.

cheald · on July 5, 2014

> What will Mongo 2.8 mean for TokuMX? We're seeing document-level locking, possible B-Tree improvements (I presume Toku's R-Tree/Fractals [can't remember which they use] will still be superior), possible transactions (although what's on JIRA hasn't convinced me so far) and a few other improvements and Performance Boosting Things.

TokuMX is quite a bit ahead of MongoDB in those respects - it already has document-level locking, transactions, MVCC guarantees, and partitioned collections (a recent addition, and awesome for time series data). It also offers (configurable) data compression and a non-fragmenting storage mechanism, which means tremendous disk usage savings in update-heavy workloads. It's quite a lot faster both reading and writing in many cases (particularly when you can take advantage of clustering keys). It's not without its faults, but the things it has over vanilla MongoDB far outweigh its faults, IMO.

It would be nice to have geo-indices and full-text search (we use ElasticSearch for the latter in concert with TokuMX, and it supports geoindices too), but "server doesn't fall over under high load" is a lot more important to us (as users, I don't work for Tokutek). 2.8 promises to bring MongoDB up to speed in a lot of respects, but it's definitely not accurate to think of TokuMX as "MongoDB 2.4 except with faster writes".

leif · on July 5, 2014

I can't add much to what Chris and Zardosht already said, but let me reiterate a few things regarding our fork:

1. You're a bit out of date. We merged changes to catch up to 2.4 in about a month (once we decided 2.4.x was stable). The current plan is the same for 2.6. We're currently working on it. If you need the latest and greatest Mongo features, stick with basic MongoDB. If you're willing to suffer a bit of lag (on the order of months) to receive our benefits, we're here if we can help.

2. Geo is a known issue. At the moment it doesn't seem like it's that widely used, so it's not a very high priority. However, we know some people want it and we will eventually get to it. Hopefully with a better implementation.

3. MongoDB's full-text search capabilities are, as far as I can tell, far behind what's provided by the state of the art text search systems, and serious users currently use MongoDB/TokuMX in concert with more focused solutions like Solr/Lucene/Elastic Search. I haven't spoken to anyone invested in text search that actually used MongoDB's text indexes, even if they use MongoDB elsewhere in their application. If you do, I'd love to buy you lunch and talk about it, please email me (my username here at tokutek.com).

4. Here's the big takeaway I got from last week's conference: MongoDB has been convinced that many of the problems we solve with TokuMX (performance, compression, concurrency, transactions) are important to their biggest users. Their most hyped announcements and plans for 2.8---document-level locking and the storage engine API---are aimed straight at us. We see this as a resounding validation of our technology, and a wonderful challenge to continue improving TokuMX. While it's tantalizing to implement a fractal tree storage engine according to their API (and there's no doubt that we will implement one), our innovations in TokuMX proper run deeper, into extra collection types, replication and sharding internals, and we have further plans for TokuMX that are beyond the scope of a storage engine API. The availability of the API is an opportunity for us to create a product with some of our improvements (mainly insertion performance and compression) with better compatibility (esp. w.r.t. replication and geo/full-text) and a simpler upgrade path. However, TokuMX as it exists as its own product (with better replication, sharding, and advanced features like clustering indexes and partitioned collections) is not going away, and will continue to see aggressive innovation as it will always lead a product built from MongoDB's storage engine API in terms of advanced features like clustering indexes and shard-aware transactions.

onedev · on July 4, 2014

People still use MongoDB in 2014?

kiyoto · on July 5, 2014

Enough people use it to have the company valued above a billion dollars, have their first MongoDB World (http://world.mongodb.com) and have a HN darling like Stripe as a power user.

Sure, MongoDB has its shortcomings, but so does <Someone's favorite database>. It's much more productive to understand the pros and cons of using it and use it (or not user it) appropriately.

xtrumanx · on July 5, 2014

> Sure, MongoDB has its shortcomings, but so does <Someone's favorite database>.

I've been learning Postgres lately and have been liking it so far. What are its shortcomings? Haven't run into something bad about it yet.

twic · on July 5, 2014

No multi-master replication. You can have many read-only slaves replicating a single master, but all your writes have to go to the master, and if the master goes down, you can't write at all until one of the slaves is promoted to become a new master.

There are third-party tools which add multi-master replication, but all seem to suffer from some combination of complexity, flakiness, project deadness, or project immaturity. Postgres-XC seemed the best last time i looked, but it does seem quite complex.

Work is currently underway to bring multi-master replication into core PostgreSQL, under the name "bi-directional replication":

https://wiki.postgresql.org/wiki/BDR_Project

I have no idea if or how this interacts with Postgres-XC. I would imagine the end-game will be that by ~2016, there will be both tightly-coupled (synchronous, immediately consistent, like Postgres-XC) and loosely-coupled (asynchronous, eventually consistent, like BDR) multi-master replication in core PostgreSQL.

That's a long time to wait given that there are noSQL and newSQL databases which have that right now, but when it arrives, it will bring PostgreSQL's huge baggage train of other good stuff with it.

sanswork · on July 5, 2014

Complexity.

Though it has definitely been improving over the years.

aioprisan · on July 4, 2014

really, what's the point of that comment?

adamors · on July 5, 2014

For what it's worth, I think it an interesting question. Considering just how hyped Mongo was a few years ago, and how many other NoSQL databases (CouchDB, Riak, Cassandra perhaps even Postgres with its JSON support) gained mind share since then.

onedev · on July 5, 2014

Yeah Mongo was "The Flavor Du Jour" just a couple years ago, with MongoDB articles hitting frontage like every week. Now we don't ever hear about it and people don't really use it much either.

lwhalen · on July 5, 2014

People don't use it because it is terrible. I had to manage a 5-node Mongo cluster as recently as 2 years ago, and I still drink heavily because of it. It was a data roach-motel - your documents check in, but they never check out!!!

Everything ran fine, until it was time to fail over - and the company had a 'full failover every quarter' commitment. Every single time we went from the 2-node primary site to the 3-node secondary, all he would break loose. Bad writes, lost partitions, it was a new 'thing' every quarter. Mongo, in my limited experience, was a terrible platform and I recommend everyone who is thinking of implementing it to run screaming. It accepts data SUPER fast, but so does /dev/null.

Rapzid · on July 5, 2014

I haven't personally used it but I believe there is (was) enough factual technical information to reach this conclusion, including this linked blog post of Kyle's. I was shocked when I read that. Data loss is generally not okay and expected. I'd say most were not aware that the potential for data loss was actually in the design of these products before the Jepsen posts. I believe these flaws impacted most of what people were using it for, so its continued use, particularly on new projects, in the presence of other databases such as Cassandra and Riak is surprising.

And this is not even factoring reports of slow progress, in your face bugs, and PITA manageability.

While it us true substantial progress may have occurred in the less than two years since that test, I would need to see some solid evidence of substantial improvement to consider mongoDB.

That's my outlook based on what I have read.

gaius · on July 5, 2014

This is where the MongoDB guys are very sly. MongoDB works brilliantly if you are one developer on one desktop working on one app - you don't need a DBA or any other ops guys, you can just develop away to your little heart's content!

So they write their app and chuck it over the wall to ops, and by then it's a fait accompli, the ops guys know they will look very bad to management if they say "WTF is this?". So they try to make it more-or-less work, then come on HN to share war stories.

nevi-me · on July 5, 2014

Not advocating MongoDB, but I think if we base our opinions solely on how things were 2 years ago, we would end up still basing our opinions on how things were 10 years ago. Software breaks, 'poor' design decisions are made, etc, but at the same time, 2 years is a lot of time for improvements to be made.

My HTML5 app failed me 3 years ago, I've now been recommending that everyone stay away from HTML5. /s

jamesaguilar · on July 5, 2014

Maybe it's just me, but I don't know if I could ever trust an engineering culture that thought succeed-without-acknowledgement was a good default. At least not as a primary source of truth system. There are plenty of databases that take data seriously and have from the beginning. I'd probably just use one of those.

jasonmccay · on July 6, 2014

You realize that was resolved 1.5 years ago, right? Also, it was a driver-level implementation. Comments like this make you sound more troll and less informative.

jasonmccay · on July 6, 2014

"Every single time we went from the 2-node primary site to the 3-node secondary" ... Can you explain what this means in terms of MongoDB?

wheaties · on July 5, 2014

People still use it. Hell, some people love it. I was using it about a year ago as a primary caching layer at the company I worked for then. Given a choice I would have gladly picked something else. Damn near anything else actually. Hope its improved since then.

gaius · on July 5, 2014

The one use case MongoDB might be useful for... memcached already did better a decade ago.

ulisesrmzroche · on July 5, 2014

Yup. Super big names too like Verizon, Forbes, and the British Government too. So...

gaius · on July 5, 2014

I'll wager all of those have more COBOL than they do any other technology. Probably 1000x more than they're running on MongoDB.

poolpool · on July 5, 2014

Having one small team in a 6 figure sized organization using mongo doesn't mean a single thing, but makes your statement true, which is how most of these flagship customers are able to be advertised.