I don't see how this would work for command line tools (there are other applicat...

shalabhc · on Aug 24, 2017

If command lines were built in systems designed around this principle, they would work slightly differently from what we're used to. For instance, when invoking this 'program' on the command line, the system would discover it needs an image and use the command line itself to ask the user for an image.

Alternatively, there could be a standard way to pass an image (or any other input) to a program, similar to a command line arg in current systems.

irishsultan · on Aug 25, 2017

I don't see how this works with something like a recursive grep, or a find result sent to xargs for further processing.

shalabhc · on Aug 26, 2017

You would define rgrep as taking a stream of files. Generating the stream of files is outside the capability of any program (since we don't have `open` or even `listdir`). Instead you'd use a primitive of the storage system itself to define a stream of files that you pass to rgrep. Something like `rgrep(/my/path/, text)`. So it becomes impossible for any program to access a file without the user's explicit indication.

irishsultan · on Aug 26, 2017

That still doesn't help with a find + xargs combination, or with any kind of problem where you can currently store file names in a file and use that for later processing.

shalabhc · on Aug 26, 2017

You can't have a find program because you can't discover files; you must be provided them. But you can have a `filter` program that takes a stream of files and outputs another stream of files matching a filter. You can then pipe the output of filter into another program.
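
A rough Python sketch of that `filter` idea follows. The names `FileObject`, `filter_files`, and `count_bytes` are made up for illustration, and Python itself does not enforce the "no discovery" rule; the point is only that the program receives a stream of file objects and never a path to look up.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass(frozen=True)
class FileObject:
    """Hypothetical handle handed to a program by the system; not a path."""
    name: str
    data: bytes


def filter_files(files: Iterable[FileObject],
                 predicate: Callable[[FileObject], bool]) -> Iterator[FileObject]:
    """Yield only the provided files that match the predicate."""
    for f in files:
        if predicate(f):
            yield f


def count_bytes(files: Iterable[FileObject]) -> int:
    """A downstream program that consumes the filtered stream."""
    return sum(len(f.data) for f in files)


# The stream itself would come from a storage primitive, not from the program.
provided = [
    FileObject("a.txt", b"hello"),
    FileObject("b.log", b"xx"),
    FileObject("c.txt", b"!"),
]
text_files = filter_files(provided, lambda f: f.name.endswith(".txt"))
total = count_bytes(text_files)
```

Here `filter_files` plays the role of `filter` and `count_bytes` is whatever program sits at the end of the pipe.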

Yes, you cannot store filenames, but you could store some other serialized token generated from a file, and the token could be used to recreate the file object. Alternatively, if you have an image-based system, you don't have to convert the file object to a token explicitly - you just hold the references to the file objects and they're automatically persisted and loaded.
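
One way the token variant could look, as a minimal sketch: `TokenStore`, `issue`, and `redeem` are hypothetical names, and the in-memory dict stands in for whatever the system would actually persist. The token is opaque, so holding a path string gets you nothing.

```python
import secrets
from typing import Dict


class TokenStore:
    """Hypothetical system service that maps opaque tokens to file objects."""

    def __init__(self) -> None:
        self._by_token: Dict[str, object] = {}

    def issue(self, file_obj: object) -> str:
        """Return an unguessable token that stands in for the file object."""
        token = secrets.token_hex(16)
        self._by_token[token] = file_obj
        return token

    def redeem(self, token: str) -> object:
        """Recreate the file object; unknown tokens raise KeyError."""
        return self._by_token[token]


store = TokenStore()
report = {"name": "report.txt", "data": b"quarterly numbers"}  # stand-in file object
token = store.issue(report)
restored = store.redeem(token)
```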

irishsultan · on Aug 27, 2017

Wait, so even listing file names needs permissions (otherwise find would work), even doing ls on the command line won't work?

shalabhc · on Aug 28, 2017

Correct - ls couldn't be a separate program - it would be replaced with a primitive that lets you explore the filesystem.

The point of such a system would be that programs cannot explore or read the filesystem, as there is no filesystem API. But programs can operate on files explicitly given to them. So exploring the filesystem is restricted to some primitives that have to be used explicitly. The guarantee then is that if I invoke a program without giving it a file or folder, I know it absolutely cannot access any file.
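
To make the split concrete, here is a small Python sketch under stated assumptions: `DataStore`, `word_count`, and `shell_invoke` are invented names, and Python cannot actually stop a function from calling the real `open`, so the isolation here is by convention only. What it shows is the shape: only the shell primitive holds the store, and the program receives nothing but the contents the user named.

```python
from typing import Callable, Dict, Iterable, List


class DataStore:
    """Hypothetical store; only the interactive shell holds a reference to it."""

    def __init__(self, files: Dict[str, bytes]) -> None:
        self._files = dict(files)

    def read(self, path: str) -> bytes:
        return self._files[path]


def word_count(inputs: Iterable[bytes]) -> List[int]:
    """A 'program': it sees only the contents handed to it, never the store."""
    return [len(data.split()) for data in inputs]


def shell_invoke(store: DataStore,
                 program: Callable[[Iterable[bytes]], List[int]],
                 paths: List[str]) -> List[int]:
    """Shell primitive: resolve the paths the user typed, then call the program."""
    return program([store.read(p) for p in paths])


store = DataStore({"/notes/a": b"one two three", "/notes/secret": b"do not read"})
counts = shell_invoke(store, word_count, ["/notes/a"])
no_access = shell_invoke(store, word_count, [])  # given no files, sees no files
```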

irishsultan · on Aug 28, 2017

Define "explicitly", because if it means that I can't just type it in a shell then that disqualifies it from being a practical solution, and if I can't put it in a function/script that I can call from a shell that disqualifies it as well.

But if I can do those things (especially the second), then that seems to open at least some attack vectors (that would obviously depend on the actual rules).

shalabhc · on Aug 29, 2017

You should be able to type it in a 'shell', and you should be able to set it on a function/program you call from the shell. But you can't download and run a program that automatically references a file by path. This system is different enough from a unix-style system that I'll try to roughly describe some details (with some shortcuts) of how I imagine it. It is a messaging-based data flow system (could be further refined, of course):

- The programs behave somewhat like classes - they define input and output 'slots' (akin to instance attributes). But they don't have access to a filesystem API (or potentially even other services, such as network). Programs can have multiple input and output slots.

- You can instantiate multiple instances of the program (just like multiple running processes for the same executable). Unlike running unix processes, instantiated programs can be persisted (and are by default) - it basically persists a reference to the program and references to the values for the input slots.

- When data is provided to the input slot of an instantiated program (let's call this data binding), the program produces output data in the output slot.

- You can build a pipeline of programs by connecting the output slot of one program to the input slot of another. This is how you compose larger programs from smaller programs. This could even contain control and routing nodes so you can construct data flow graphs.

- Separately, there are some data stores. These could be filesystem style, relational, or key/value.
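
The slot and pipeline points above could be sketched roughly like this in Python. `Instance`, `connect`, and `bind` are made-up names, each instance has a single input and output slot for brevity, and persistence is left out entirely.

```python
from typing import Any, Callable, List, Optional


class Instance:
    """An instantiated program with one input slot and one output slot."""

    def __init__(self, name: str, fn: Callable[[Any], Any]) -> None:
        self.name = name
        self.fn = fn
        self.output: Optional[Any] = None
        self.downstream: List["Instance"] = []

    def connect(self, other: "Instance") -> "Instance":
        """Wire this output slot to another instance's input slot."""
        self.downstream.append(other)
        return other

    def bind(self, data: Any) -> None:
        """Data binding: fill the input slot, compute, and propagate downstream."""
        self.output = self.fn(data)
        for nxt in self.downstream:
            nxt.bind(self.output)


upper = Instance("upper", lambda text: text.upper())
exclaim = Instance("exclaim", lambda text: text + "!")
upper.connect(exclaim)   # upper's output slot feeds exclaim's input slot
upper.bind("hello")      # binding data at the head runs the whole pipeline
```

Since `downstream` is a list, one output slot can fan out to several instances, which is the simplest form of the routing mentioned above.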

The shell isn't a typical shell - it has the capability to compose programs and bind data. It also doesn't execute scripts at all - it can only be used interactively to compose and invoke the program graphs. A shell is bound to a data store - so it has access to the entire data store, but is only used interactively by an authenticated user.

So interactive invocation of a program may look something like this:

   >  /path/to/file1 | some_program | /path/to/file2
   # this invokes some_program, attaches file1 to the input slot, saves the output slot contents to file2.

You could save the instantiated program itself if you want.

   > some_program_for_file1 = [/path/to/file1 | some_program]

Then invoke it any number of times.

   > some_program_for_file1 | /path/to/file3  # runs some_program on existing contents
   (update file1 here...)
   > some_program_for_file1 | /path/to/file4  # runs some_program on new contents
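
A minimal Python sketch of why the two runs differ, assuming the saved instance keeps a reference to file1 and not a copy of its contents. `BoundProgram` and `run_into` are hypothetical names, and a plain dict stands in for the data store.

```python
from typing import Callable, Dict


class BoundProgram:
    """A program plus a reference to its input, not a copy of the contents."""

    def __init__(self, store: Dict[str, str], path: str,
                 fn: Callable[[str], str]) -> None:
        self._store = store
        self._path = path
        self._fn = fn

    def run_into(self, out_path: str) -> None:
        """Read the current contents of the bound input and save the output."""
        self._store[out_path] = self._fn(self._store[self._path])


store = {"/path/to/file1": "old"}
some_program_for_file1 = BoundProgram(store, "/path/to/file1", str.upper)

some_program_for_file1.run_into("/path/to/file3")  # runs on existing contents
store["/path/to/file1"] = "new"                    # update file1 here
some_program_for_file1.run_into("/path/to/file4")  # runs on new contents
```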

With advanced filtering programs, you could define more complex sets of input files.

   > /path/to/folder | filter_program(age>10d, size<1M) | some_program | /path/to/output_folder

You can even persist the instantiated query and reuse it.

   > interesting_files = [/path/to/folder | filter_program(age<1d)]
   > interesting_files | program_one
   > interesting_files | program_two

So that's the rough idea, using an ad-hoc, made-up syntax for programs with a single input and output slot.