I am truly looking forward to CockroachDB becoming the next PostgreSQL for planet-spanning database workloads. In our ecosystem we are seeing more and more requests for CRDB integration.
Generally though, I would not say that full compatibility should even be desired. A k/v database simply works differently from a strictly relational one: there are concerns like shard IDs, avoiding hot spots, and deciding how to paginate data.
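To make the pagination point concrete: on a distributed store, OFFSET-based pagination forces the cluster to scan and discard rows on every page, while keyset pagination turns each page into a bounded range scan. A rough sketch (the table and column names here are illustrative, not from the thread):

```sql
-- Keyset pagination: each page is a bounded scan starting after the
-- last key seen, instead of an OFFSET that rescans and discards rows.
-- `events`, `id`, and `$last_seen_id` are made-up names for illustration.
SELECT id, payload
FROM events
WHERE id > $last_seen_id   -- cursor: the last id from the previous page
ORDER BY id
LIMIT 100;
```

Because the predicate is on the primary key, this compiles to a single bounded scan over one key span, which is exactly the kind of access pattern a k/v-backed database is good at.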
Leaning too far into being a "PostgreSQL replacement" will eventually hurt CRDB, because too much focus will go into making legacy enterprise SQL (along the lines of https://news.ycombinator.com/item?id=25454635) work on this system. It's the forklift approach to moving to the cloud, and it becomes clear when skimming through the docs.
I think many of the "light SQL" patterns are really great - things like SELECT or WHERE, which e.g. DynamoDB simply cannot offer without Elasticsearch. All in all, I am very excited for CRDB to gain more industry acceptance, and I think their cloud offering could also become very interesting - a competitor to Google Bigtable / Cloud Spanner, AWS DynamoDB, or Cloud SQL.
Sure, but this is an implementation detail. It is designed to be consistent with, and to support the affordances of, more traditional single-node RDBMSs.
What are the salient differences in your mind? Under the hood, CockroachDB executes writes to and reads from such a table in the same way that you would against a key-value store. You can explore this for yourself with the "kv trace" functionality of CockroachDB's SQL shell, which logs all of the KV API calls that a SQL query emits:
$ ./cockroach demo
# Welcome to the CockroachDB demo database!
#
# You are connected to a temporary, in-memory CockroachDB cluster of 1 node.
# ...
#
demo@127.0.0.1:26257/test> CREATE TABLE kv (k STRING PRIMARY KEY, v STRING);
CREATE TABLE
Time: 5ms total (execution 5ms / network 0ms)
demo@127.0.0.1:26257/test> \set auto_trace=on,kv
demo@127.0.0.1:26257/test> INSERT INTO kv VALUES('a', 'b');
INSERT 1
Time: 2ms total (execution 2ms / network 0ms)
timestamp | age | message | tag | location | operation | span
--------------------------------------+-----------------+--------------------------------------------------+------------------------------------------------------------+-----------------------------------------+------------------+-------
2020-12-17 23:17:46.626696+00:00:00 | 00:00:00.001123 | CPut /Table/53/1/"a"/0 -> /TUPLE/2:2:Bytes/b | [n1,client=127.0.0.1:49216,hostssl,user=demo] | sql/row/writer.go:207 | flow | 6
2020-12-17 23:17:46.626754+00:00:00 | 00:00:00.001181 | querying next range at /Table/53/1/"a"/0 | [n1,client=127.0.0.1:49216,hostssl,user=demo,txn=dcce3954] | kv/kvclient/kvcoord/range_iter.go:159 | dist sender send | 8
2020-12-17 23:17:46.626792+00:00:00 | 00:00:00.001219 | r36: sending batch 1 CPut, 1 EndTxn to (n1,s1):1 | [n1,client=127.0.0.1:49216,hostssl,user=demo,txn=dcce3954] | kv/kvclient/kvcoord/dist_sender.go:1851 | dist sender send | 8
2020-12-17 23:17:46.627281+00:00:00 | 00:00:00.001708 | fast path completed | [n1,client=127.0.0.1:49216,hostssl,user=demo] | sql/plan_node_to_row_source.go:145 | flow | 6
2020-12-17 23:17:46.627322+00:00:00 | 00:00:00.001749 | rows affected: 1 | [n1,client=127.0.0.1:49216,hostssl,user=demo] | sql/conn_executor_exec.go:622 | exec stmt | 4
(5 rows)
Time: 1ms total (execution 1ms / network 0ms)
demo@127.0.0.1:26257/test> SELECT * FROM kv WHERE k = 'a';
k | v
----+----
a | b
(1 row)
Time: 6ms total (execution 6ms / network 0ms)
timestamp | age | message | tag | location | operation | span
--------------------------------------+-----------------+----------------------------------------+------------------------------------------------------------+-----------------------------------------+------------------+-------
2020-12-17 23:17:54.402735+00:00:00 | 00:00:00.003116 | Scan /Table/53/1/"a"{-/#} | [n1,client=127.0.0.1:49216,hostssl,user=demo] | sql/row/kv_batch_fetcher.go:337 | materializer | 7
2020-12-17 23:17:54.402763+00:00:00 | 00:00:00.003144 | querying next range at /Table/53/1/"a" | [n1,client=127.0.0.1:49216,hostssl,user=demo,txn=d30bcbc9] | kv/kvclient/kvcoord/range_iter.go:159 | dist sender send | 9
2020-12-17 23:17:54.404565+00:00:00 | 00:00:00.004946 | r36: sending batch 1 Scan to (n1,s1):1 | [n1,client=127.0.0.1:49216,hostssl,user=demo,txn=d30bcbc9] | kv/kvclient/kvcoord/dist_sender.go:1851 | dist sender send | 9
2020-12-17 23:17:54.405091+00:00:00 | 00:00:00.005472 | fetched: /kv/primary/'a'/v -> /'b' | [n1,client=127.0.0.1:49216,hostssl,user=demo] | sql/colfetcher/cfetcher.go:888 | materializer | 7
2020-12-17 23:17:54.405895+00:00:00 | 00:00:00.006276 | rows affected: 1 | [n1,client=127.0.0.1:49216,hostssl,user=demo] | sql/conn_executor_exec.go:622 | exec stmt | 4
(5 rows)
Time: 1ms total (execution 1ms / network 0ms)
demo@127.0.0.1:26257/test>
I'll draw your attention to two lines in particular: the `CPut` in the INSERT's trace, and the `Scan` in the SELECT's.
These operations (`CPut` and `Scan`) are KV operations that you'd be able to run yourself against any key-value store. CockroachDB doesn't give you access to those operations directly, but crafting your queries in this way is really not significantly different.
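To make that mapping concrete, here's a toy sketch in plain Python (not CockroachDB code) of the two primitives the traces above show: a CPut, i.e. a conditional put that succeeds only if the existing value matches an expectation (absence, for a fresh INSERT), and a Scan over a key range. The key strings mimic the `/Table/53/1/"a"/0` layout from the trace.

```python
# Toy key-value store illustrating the two primitives from the trace:
# CPut (conditional put) and Scan. Illustrative only, not CockroachDB's
# actual implementation.

class ToyKV:
    def __init__(self):
        self.data = {}

    def cput(self, key, value, expected=None):
        """Write key -> value only if the current value equals `expected`
        (None meaning the key must not exist). This condition is how an
        INSERT detects a duplicate primary key in one round trip."""
        if self.data.get(key) != expected:
            raise KeyError(f"condition failed for {key!r}")
        self.data[key] = value

    def scan(self, start, end):
        """Return all (key, value) pairs with start <= key < end, in key
        order: what a point or range SELECT compiles down to."""
        return [(k, self.data[k]) for k in sorted(self.data) if start <= k < end]

kv = ToyKV()
# INSERT INTO kv VALUES ('a', 'b') becomes roughly:
kv.cput('/Table/53/1/"a"/0', 'b')   # fails if the row already exists
# SELECT * FROM kv WHERE k = 'a' becomes roughly a scan of the row's key span:
rows = kv.scan('/Table/53/1/"a"', '/Table/53/1/"a"/\xff')
print(rows)  # [('/Table/53/1/"a"/0', 'b')]
```

The point is that both SQL statements bottom out in exactly one KV operation on one key span, which is why the traces above show a single `CPut` and a single `Scan`.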
I think it’s the “under the hood” that’s the important part here. The salient difference is that calling it a key-value store eliminates the business reason for using it.
In that sense, all SQL databases are just KV stores: they compile SQL into actions on various internal KV stores. CRDB's approach just makes the separation more explicit than most, since it's designed to run as a cluster.