Redis, from the Ground Up

jemfinch · on Oct 18, 2010

"Lists, sets, etc. are more fundamental to computer scientists than relational database tables, columns, and rows."

That does not ring true to me. I'm curious what his data source for that claim is.

antirez · on Oct 18, 2010

If I'll ever meet Aliens I'm sure lists, sets, hash tables, and trees are in their CS books as well. But I bet their mainstream DB model may really be different than our relational one.

Edit: I reflected a bit more on the issue. It seems like that our mainstream DB model is clearly due to the kind of applications computers were mainly used for when the DB technology was developing: business application programs.

Imagine a DB technology emerging instead in completely different scenarios, like social applications where you need to update the status of users in a chronological way. Or a DB designed where most softwares had to deal with geo locations... as you can see the DB model is much more an ad-hoc affair.

A DB modeled after the fundamental data structures like Redis may not be the perfect fit for everything but should be able to model any kind of problem eventually without too much efforts, and with a very clear understanding of what will be needed in terms of storage and access time.

jemfinch · on Oct 18, 2010

> If I'll ever meet Aliens I'm sure lists, sets, hash tables, and trees are in their CS books as well. But I bet their mainstream DB model may really be different than our relational one.

I actually doubt that. The relational database model (relational algebra/calculus, tuples, etc.) is a mathematical model. I would expect aliens to have essentially the same model, actually, just like I'd expect them to have the same set theory we do. They're equally as basic.

antirez · on Oct 18, 2010

I don't agree: the relational model is a pretty specific and ad-hoc idea that can be mathematically modeled using trivial math structures. For sure they could understand it in a second and recognize this as part of their math, but will their mainstream database systems be modeled after the same model? I doubt it.

Instead I would expect to see lists, hashes, trees, and most of our sorting algorithms in their code as well.

bhiggins · on Oct 18, 2010

a table is just a set of tuples.

this is like saying 32 bits are more fundamental to computer science than the abstract idea of an integer.

silentbicycle · on Oct 18, 2010

Right. If you strip away terminology and other baggage (some of which is specific to MySQL, Oracle, etc. anyway) and look at just the relational model, it's actually pretty simple.

You have a relation (often called a "table"), which is a set of tuples (often called "rows"). Some of the fields ("columns") in a given tuple can be a key for that tuple, and some of them can be keys referencing other relations ("foreign keys"). The result of a query (such as getting all rows with a given value in a given column, joining one row to zero or more rows in another table via a foreign key, or the intersection of two relations) is another relation, and can be queried the same way.

It's tuples and set theory.

There's implementation details (and the implementation details for a high performance RDBMS are not trivial, don't get me wrong), indexing, transactions, constraints, etc. on top of that, but the core relational model itself is not very complicated.

mjrusso · on Oct 18, 2010

In your first CS courses, you learn how to build linked lists, sets, etc., and implement your very first algorithms using these data structures.

Relational databases come later, and many of the important concepts (set theory, tuples, etc.) build on knowledge learned by studying and implementing and using these initial data structures. It is in this way that I believe lists, sets, etc. are more fundamental than relational database tables, columns, and rows.

silentbicycle · on Oct 18, 2010

Some people learn set theory before data structures and algorithms. People focusing on math rather than computer science, for example. As long as we're speculating about hypothetical aliens, I wouldn't reject that possibility.

Arrays, linked lists, binary trees and mergesort could be more fundamental than the relational model, but the relational model in turn may be simpler than (say) red-black trees, BSPs, skiplists (depending on how the aliens view randomness), or OOP.

It's an interesting thought-experiment, though - if CS were being re-invented from the ground up, which things would be more fundamental and likely to be discovered first, especially with significantly different hardware?

antirez · on Oct 18, 2010

Well in a limited fashion this experiment could be actually be done. I'm not sure how this would be modeled, but I guess giving smart young guys that don't know nothing about programming a set of "tools" in a simulation that can be used to create the building blocks of many different data structures.

Then provide problems, and look at what they do and invent in order to solve such problems.

Will be very hard to do this in an unbiased fashion actually...

silentbicycle · on Oct 18, 2010

Absolutely. Hence, thought experiment and aliens.

Having programmers explore in a (to them) really weird language (like Prolog or Haskell) and seeing what they come up with might be a good approximation, though.

I may never have thought of difference lists, but then Prolog made them seem obvious. :)

joe_the_user · on Oct 18, 2010

Redis' internal design typically trades off memory for speed. For some workloads, there can be an order of magnitude difference between the raw number of bytes handed off to Redis to store, and the amount of memory that Redis uses.

What are the circumstances that make this kind of tradeoff worthwhile?

A generic Key-Value store, say Kyoto Cabinet, is pretty fast and you can configure its cache to be huge if you need it. Does reconstructing and using a list/set/hash take that much time?

Edit: Is the "order of magnitude" here greater or less than the extra space that keeping a b-tree index in memory would take? Is it doing something akin to that or a completely different thing?

antirez · on Oct 19, 2010

The tradeoff is especially worthwhile because we export complex data structures (I wrote a great deal of articles about this, please check the latest at antirez.com), but this time I'll try to provide a proof by paradox: what you are saying here is that memcached may be replaced with TC from the point of view of performances, if you add an LRU expiry, that I think it's not true.

joe_the_user · on Oct 19, 2010

I hope this didn't come off as a criticism. I was simply trying to understand what Redis does.

I'm using Kyoto cabinet to serialize various hashlist-based-classes and I'm wondering what bottlenecks etc. I might encounter are. I'll take a look at your site.