Okay- just spent the whole day tinkering wit this:
1) I create a baseline set of views I want my customers to have
2) For each new customer, I’ll run a script that create a replica of those views- filtered by their customer ID
3) I’ll allow my customers to write pure SQL- limiting them to only SELECT queries and a couple niche business rules, as well as masking any DB-level errors, because that just feels wrong
I think the main thing you're missing is creating an account in the DB that only has access to those views, so for each customer you'd do something like:
CREATE USER customer_xyz WITH PASSWORD 'foo';
CREATE VIEW customer_xyz_data AS SELECT * FROM data_stuff WHERE customer_id=x;
GRANT SELECT ON customer_xyz_data TO customer_xyz;
So then two things are happening, SELECT-only is being enforced by the view itself no matter what, and their account is categorically unable to touch anything outside of that view too, so as long as you run their queries through that account it will always be sandboxed.
You can enforce all of that yourself but ultimately if they're using an account that can read/write other tables you will always have to be careful to make sure you are sanitizing their input not just to selecting but like, limiting joins and nested queries too.
Gotcha. Yeah- I was thinking of working with my engineers to figure out a permissions layer, but I understand enforcing that at the DB-level would guarantee security.
Dumb question- is creating a set of Views for each customer even efficient for my MySQL database? I could realistically see us having ~12 customer-facing views- is having 12*N views a smart and scalable way to architect this?
A view is just a query that pretends to be a table, so it will come down to the complexity of that query. Each time you're querying the view it will be running the combination of the user's query against the view's query so the performance comes down to whether your DB is optimized around basically "SELECT field1, field2, field3 FROM (SELECT * FROM data_stuff WHERE customer_id=x)". Whether you execute that query as a view or as ad-hoc SQL doesn't make a difference itself.
"Your side" of this can be optimized easily enough, but the user-submitted queries are likely to be inefficient or miss indexes, which is why one database per customer can be better since they each have their own resources.
You can create the views and accounts as needed and destroy them when sessions end rather than keeping them permanently too, so when the user signs in you create the view and account, after the session or some period of inactivity you remove them.
Makes sense. The fact that my SQL Editor puts tables and views in the same section on its left sidebar was the main reason I did a double-take.
The idea of deleting and recreating views is an interesting one. I see that as a really cool approach- considering we can go without it as a v1 then include it as we scale.
Thank you for all your advice so far! This has been truly helpful.
1) I create a baseline set of views I want my customers to have 2) For each new customer, I’ll run a script that create a replica of those views- filtered by their customer ID 3) I’ll allow my customers to write pure SQL- limiting them to only SELECT queries and a couple niche business rules, as well as masking any DB-level errors, because that just feels wrong
How does that approach sound?