For someone who just wants to run some (intensive) OLAP graph queries on the “graph formulation” of a relational or hierarchical dataset every once in a while (maybe batch, maybe user-initiated, but either way <1QPS), but doesn’t yet have a graph DB and doesn’t really want to maintain their data in a canonical graph formulation, which type of graph DB would you recommend as the simplest-to-maintain, simplest-to-scale “adjunct” to their existing infra?
I.e. what’s the graph DB that best fits the use-case equivalent to “having your data in an RDBMS and then running an indexer agent to feed ElasticSearch for searching”?
My default nowadays is to minimize work via "no graph db": a csv/parquet extract -> a jupyter notebook of pandas/cugraph/graphistry (see the sketch after the list below), and if that isn't enough, a dockerized (= throwaway) neo4j, or, if the environment already has it, spark + graphistry. The answers to a few questions can easily switch the recommendation to "kafka -> tigergraph/janusgraph/neptune", or to push-button neo4j/cosmosdb offerings:
* Primary DB: type / scale, and how fresh do the extracts need to be (daily, last minute?)
* Are queries more search-centric ("entities 4 hops out") or analytics ("personalized pagerank")?
* Graph size: 10M relations, or 10B? Document heavy, or mostly ints & short strings?
* Is the client consuming the graph via a graph UI, or API-only?
* Licensing and $ cost restrictions?
* Push-button or inhouse-developer-managed?
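For the "no graph db" path above, here is a minimal sketch of what the notebook side can look like. It assumes a parquet edge extract with src/dst columns, a GPU environment with cudf/cugraph installed, and a Graphistry account; the file name, column names, seed vertex, and credentials are all placeholders, not a definitive recipe:

```python
# Sketch: parquet edge extract from the primary DB -> pandas -> cuGraph analytics
# -> Graphistry visual exploration. Names below are illustrative placeholders.
import pandas as pd
import cudf
import cugraph
import graphistry

# Load the batch extract produced from the source RDBMS / warehouse
edges = pd.read_parquet("edges_extract.parquet")  # expected columns: src, dst

# Analytics-style query: personalized PageRank seeded on a node of interest
g = cugraph.Graph()
g.from_cudf_edgelist(cudf.from_pandas(edges), source="src", destination="dst")
seeds = cudf.DataFrame({"vertex": [42], "values": [1.0]})  # placeholder seed node
ppr = cugraph.pagerank(g, personalization=seeds)

# Visual exploration of the same edge list in Graphistry
graphistry.register(api=3, username="...", password="...")  # placeholder credentials
graphistry.bind(source="src", destination="dst").edges(edges).plot()
```

If that stops being enough (multi-hop search queries, a team UI, fresher data), that's when a throwaway dockerized neo4j or one of the managed options starts to pay for itself.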
Because of (valid) engineering trade-offs made by graph db dev teams, adding a graph db as a second system can currently be tricky. The questions above probe the potential mismatches between the source db, the graph stack, the workload, and the team's maintenance burden. Feels like this needs a flow chart!
Happy to answer based on the above, and you can see why I'm curious which areas Nebula will help straddle :)