For someone who just wants to run some (intensive) OLAP graph queries on the “graph formulation” of a relational or hierarchical dataset every once in a while (maybe batch, maybe user-initiated, but either way <1QPS), but doesn’t yet have a graph DB and doesn’t really want to maintain their data in a canonical graph formulation, which type of graph DB would you recommend as the simplest-to-maintain, simplest-to-scale “adjunct” to their existing infra?
I.e. what’s the graph DB that best fits the use-case equivalent to “having your data in an RDBMS and then running an indexer agent to feed ElasticSearch for searching”?
My default nowadays is to minimize work via "no graph db": a csv/parquet extract -> a jupyter notebook of pandas/cugraph/graphistry (see the sketch after the list below), and if that isn't enough, a dockerized (= throwaway) neo4j, or, if the environment already has it, spark + graphistry. The answers to a few questions can easily switch the recommendation to "kafka -> tigergraph/janusgraph/neptune", or to push-button neo4j/cosmosdb offerings:
* Primary DB: type / scale, and how fresh do the extracts need to be (daily, last minute?)
* Are queries more search-centric ("entities 4 hops out") or analytics ("personalized pagerank")?
* Graph size: 10M relations, or 10B? Document heavy, or mostly ints & short strings?
* Is the client consuming the graph via a graph UI, or API-only?
* Licensing and $ cost restrictions?
* Push-button or inhouse-developer-managed?
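For the "no graph db" path above, here is a minimal sketch of what the notebook side can look like. It assumes a parquet edge extract with src/dst columns, a GPU environment with cudf/cugraph installed, and a Graphistry account; the file name, column names, seed vertex, and credentials are all placeholders, not a definitive recipe:

```python
# Sketch: parquet edge extract from the primary DB -> pandas -> cuGraph analytics
# -> Graphistry visual exploration. Names below are illustrative placeholders.
import pandas as pd
import cudf
import cugraph
import graphistry

# Load the batch extract produced from the source RDBMS / warehouse
edges = pd.read_parquet("edges_extract.parquet")  # expected columns: src, dst

# Analytics-style query: personalized PageRank seeded on a node of interest
g = cugraph.Graph()
g.from_cudf_edgelist(cudf.from_pandas(edges), source="src", destination="dst")
seeds = cudf.DataFrame({"vertex": [42], "values": [1.0]})  # placeholder seed node
ppr = cugraph.pagerank(g, personalization=seeds)

# Visual exploration of the same edge list in Graphistry
graphistry.register(api=3, username="...", password="...")  # placeholder credentials
graphistry.bind(source="src", destination="dst").edges(edges).plot()
```

If that stops being enough (multi-hop search queries, a team UI, fresher data), that's when a throwaway dockerized neo4j or one of the managed options starts to pay for itself.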
Because of (valid) engineering trade-offs made by graph db dev teams, adding a graph db as a second system can currently be tricky. The questions above probe the potential mismatches between the source db, the graph stack, the workload, and the team's maintenance burden. Feels like this needs a flow chart!
Happy to answer based on the above, and you can see why I'm curious which areas Nebula will help straddle :)