I'm going to use this opportunity to ask a question I've been thinking about for a long time. Why do we have both environment variables and command line arguments? They are the same thing, except one is key-to-value and one is positional and often needs to be parsed by hand in an ad-hoc fashion. I don't think that people should use command line arguments when environment variables are an option, and I'm not aware of any use cases where they are not an option.
Quickly, I can think of one reason - command line arguments override environment variables whenever conflicting options are present (programs should be written so that this is true). This gives you the flexibility to run a one-off command with specific options without having to set and then reset environment variables (of course, you can also handle this by launching a new shell that has the one-off environment variable settings without tampering with the parent shell).
There's a longer debate about this topic on Stackoverflow titled "Argument passing strategy - environment variables vs. command line". [1]
— It is possible to “see” a command line (in "ps" output, etc.) in ways that you won’t see environment settings. This can be useful when passing information to a sub-process that you need to keep private.
— It may be that you are configuring your sub-process in a place that is “far away” from the point that actually executes the command. Rather than have to thread an extra command-line argument through your code to make sure it is part of the final command invocation, it can be quite convenient to just set a variable. I have often used this to enable debugging features or test experimental features, or even to disable entire features when unexpected problems arise.
— In a similar way, your program may use multiple languages or otherwise be difficult to manage in any common way without environment variables.
— In a cross-platform scenario, environment variable names might be far easier to keep constant across UNIX, Windows, etc. than command-line syntax.
The main argument, to me, for command-line arguments is that they're not automatically inherited by child processes like environment variables are, so you don't have to rely on every process tidying up its environment before executing anything else. To me, that just seems like a recipe for heisenbugs and spooky-action-at-a-distance.
The POSIX shell has nothing to do with the syntax of the command. You can write a shell script that parses
cp from=foo to=bar
if you wanted to. The shell expands metacharacters and variables, and sets up STDIN/STDOUT. But = is not a metacharacter, so it is passed to the command unchanged. Several UNIX commands use that sort of syntax - like dd(1).
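For instance, a minimal sketch of parsing that dd(1)-style key=value syntax yourself (the function name is invented for illustration):

```python
def parse_dd_style(argv):
    """Parse dd(1)-style key=value operands, e.g. 'cp from=foo to=bar'.
    The shell passes the '=' through untouched, so the program splits it itself."""
    opts = {}
    for arg in argv:
        key, sep, value = arg.partition("=")
        if not sep:
            raise SystemExit(f"expected key=value, got {arg!r}")
        opts[key] = value
    return opts

print(parse_dd_style(["from=foo", "to=bar"]))  # {'from': 'foo', 'to': 'bar'}
```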
With the "env" command it’s possible to set environment using key=value for anything (it just means you have to say "env a=x b=y cmd -arg1 -arg2" instead of expecting "cmd a=x b=y -arg1 -arg2" to be valid).
You may already be aware of this, but it's trivial to read a process's environment variables, for any process running as the same user, or when root. They're exposed as a null-delimited text file in /proc/$pid/environ. You can even get ps to print the environment variables for you, if you use the 'e' flag (no leading dash). Depending on your actual security constraints, this may be important to be aware of. Of course, there are a variety of options for reading a process's arbitrary memory locations, so for actual security you need to control access to the host; but if you're worried about 'ps' leaking command line arguments, you should be similarly aware that 'ps' can show environment variables.
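To illustrate (Linux-only; SECRET_TOKEN is a made-up name), a sketch that spawns a child with a "private" variable and then reads it back from the outside:

```python
import subprocess, time

def read_environ(pid):
    # /proc/<pid>/environ is a NUL-delimited dump of the environment
    # the process had at exec time (Linux-only).
    with open(f"/proc/{pid}/environ", "rb") as f:
        raw = f.read()
    pairs = (entry.partition(b"=") for entry in raw.split(b"\x00") if entry)
    return {k.decode(): v.decode() for k, _, v in pairs}

# Spawn a child with a "private" variable, then spy on it from outside.
child = subprocess.Popen(["/bin/sleep", "30"], env={"SECRET_TOKEN": "hunter2"})
try:
    time.sleep(0.2)  # give the child a moment to exec
    env = read_environ(child.pid)
    print(env["SECRET_TOKEN"])  # visible to any process of the same user
finally:
    child.kill()
```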
Yep, that's why I mentioned "as the same user". It's slightly less of a risk of data exposure, but it's worth being aware of when evaluating your threat model.
A more interesting question is, why do we pass in command arguments and environment variables to a subprocess, but only get an integer status code back? The original reason comes from the way fork/exec was implemented in PDP-11 UNIX. But that was a while ago. At some point, the subprocess concept should have been extended to handle return values, like all other forms of function call. "exit" should have an argc/argv, which get passed back to the caller.
It's always bugged me that the POSIX exit status can only really communicate seven bits of information reliably. (The other half of the byte is overloaded with signal-exit information from the shell.) Windows does it better: there, you at least get 32 bits, which is enough for an HRESULT.
Also, I've never understood where this exit(-1) idiom comes from. It's nonsense.
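Both quirks are easy to demonstrate; a quick Python sketch (POSIX behavior):

```python
import subprocess, sys

# exit(300) overflows the 8 bits the kernel keeps for the exit status:
# the parent sees 300 % 256 == 44, not 300.
proc = subprocess.run([sys.executable, "-c", "import sys; sys.exit(300)"])
print(proc.returncode)  # 44 on POSIX systems

# The exit(-1) idiom really reports 255 for the same reason.
proc_neg = subprocess.run([sys.executable, "-c", "import sys; sys.exit(-1)"])
print(proc_neg.returncode)  # 255
```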
The reason is of course that returning a variably-sized value to an already-existing address space is annoying. IOW that's what standard output (and pipes on /proc/self/fd/N) is for.
Interesting idea; it would make things more flexible, like Python and other languages that can return multiple values (from a called function to a calling function or the module scope, not to the OS, AFAIK).
Think of it like nested context in a "normal" programming language. The environment variables are state or data from the outside or encompassing context. Whereas command-line parameters are just that, parameters passed down to a function based on some logic held within the parent context.
Within that explanation, we all know that global/shared variables are a code-smell for the most part. Say you want to call the same command with different logic, or multiple times even.
greeting = "hello"; greetee = "world";
result1 = func(); //How do we even know that func uses the greeting and greetee variables?!
greetee = "another world";
result2 = func(); //Did func change my greeting variable? I don't know.
So let's assume that greeting and greetee are actual important variables. You are essentially then sharing your "state" with the func in order to alter its behavior. I think in some shells, the functions themselves can alter global environment variables, so it would be a giant mess making sure that functions are idempotent and don't have artifacts.
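The hazard sketched above, in runnable Python, with environment variables standing in for the shared globals (GREETING/GREETEE are made-up names):

```python
import os

os.environ.pop("GREETEE", None)   # start from a known state
os.environ["GREETING"] = "hi"

def greet():
    # greet() silently depends on two "globals"; nothing in its signature says so.
    return f"{os.environ['GREETING']}, {os.environ.get('GREETEE', 'world')}"

result1 = greet()
print(result1)                    # hi, world
os.environ["GREETEE"] = "another world"
result2 = greet()
print(result2)                    # hi, another world -- same call, different result

def greet_explicit(greeting, greetee):
    # The command-line-argument style: dependencies visible at every call site.
    return f"{greeting}, {greetee}"

print(greet_explicit("hi", "another world"))
```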
Environment variables affect all instances; command-line arguments affect only one program instance. Set your defaults with environment variables (or a config file), then override them as needed with command-line flags.
In the Unix world, environment variables are passed to ALL processes spawned by the parent, including sub-processes. For example, if you log onto a computer, and your HOME variable is set, then every single process you launch will know your home directory, including processes that launch other processes. It's automatic UNLESS a process explicitly changes this value. This does not use any sort of global registry. I used to be an admin of a VAX computer that had 50 simultaneous users logged onto the server, and each user had a different HOME directory.
Environment variables also made shell scripts reusable by other users. The file $HOME/special would refer to the "special" file in the user's home directory.
Command line arguments are only passed to the one single child process. And if that process wants to launch a new process, it must create its own command line arguments.
Environment variables are inherited to child processes by default, so you can think of them as arguments to a whole process group, not just to the program you invoke at the top level.
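A small Python sketch of that inheritance (GREETING is a made-up name): the variable set at the top reaches a grandchild process with no explicit plumbing, whereas an argument would have to be threaded through every intermediate command line by hand.

```python
import os, subprocess, sys

os.environ["GREETING"] = "hello"  # set once in the top-level process

# A grandchild that reads the variable, launched by an intermediate child
# that never mentions GREETING at all.
grandchild = 'import os; print(os.environ["GREETING"])'
child = f"import subprocess, sys; subprocess.run([sys.executable, '-c', {grandchild!r}])"
out = subprocess.run([sys.executable, "-c", child],
                     capture_output=True, text=True)
print(out.stdout.strip())  # hello
```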
Parameters are lexically scoped; environment variables are dynamically scoped. Today dynamic scoping is frowned upon as an instance of spooky action at a distance, but in the 70s I guess it wasn't that obvious (and environment variables probably predate unix).
Also dynamic scoping can be very powerful when stitching together pieces separately designed. To this day emacs lisp is still dynamically scoped by default and arguably it derives some of its power from it.
*.txt gets expanded by bash into a list of arguments. I don't know if this is the case in Windows.
I don't think it is the case in Windows, and this seems not to have changed since DOS days, when some programs could handle wildcards (internally) while others could not, because expansion was done by the individual programs, not the shell (COMMAND.COM, or nowadays CMD.EXE).
A quick test:
$ python -V
Python 3.5.2
$ type test_arg_list.py
import sys
print(sys.argv)
$ python test_arg_list.py a t* b
['test_arg_list.py', 'a', 't*', 'b']
So wildcards are not expanded. I'm sure there are Windows calls to expand them (there were from the DOS days, like FindFirst and FindNext, an awkward approach IMO), but your program has to actually use them for the expansion to work.
In fact, that is what I did, via the Python glob module, in this recent post:
Simple directory lister with multiple wildcard arguments:
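That post's code isn't reproduced here, but the approach can be sketched roughly like this (the helper name is hypothetical):

```python
import glob, os, tempfile

def expand_args(args):
    """Expand wildcard arguments roughly the way a Unix shell would,
    passing non-matching patterns through literally (as bash does by default)."""
    expanded = []
    for arg in args:
        matches = sorted(glob.glob(arg))
        expanded.extend(matches if matches else [arg])
    return expanded

# Demo in a throwaway directory:
cwd = os.getcwd()
with tempfile.TemporaryDirectory() as d:
    for name in ("a.txt", "b.txt", "c.log"):
        open(os.path.join(d, name), "w").close()
    os.chdir(d)
    result = expand_args(["t*", "*.txt"])
    os.chdir(cwd)
print(result)  # ['t*', 'a.txt', 'b.txt']
```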
Whereas, in Unix, the shell (at least sh / bash) does it automatically for all arguments for all command-line programs, before the program even sees the arguments. This is one of the (many) key benefits of the shell. In fact, all metacharacters are interpreted by the shell and/or the kernel, acting together. This includes redirections, piping, the many special symbols that start with $, backquotes, and many others.
I think I may have the answer (at least for Unix). (IIRC I had read about this somewhere a while ago, and had also thought about the issue myself earlier, so it's a combo of reading the reason somewhere and (maybe) figuring it out.) Anyway, here it is:
It is because it allows 3 different ways of setting options for commands: rc files, environment variables (env. vars from now on) and command line arguments, with each subsequent one able to override the previous one. The logic is that they go, in order, from more permanent to less permanent (as settings). rc files (rc stands for "run command", a term I think Unix inherited from an earlier OS) are config files for commands, like .exrc and .vimrc for vi/vim, .bashrc for bash, .netrc and many more. Any command can create (or require users to create) its own rc file, and can read settings from it if present. A setting in a file is less easy to change on the fly than an env. var (not really difficult, of course, just that you have to go edit the file in an editor, or use sed etc.), and an env. var in turn is a bit less easy to change than a command line option, when we are talking about multiple invocations of a command in which you want different values for the option.
Let's take the example of a setting for a port (for a network server or client):
First, put the most common and permanent setting for the option, say, PORT=8080, in the rc file, say .foorc (for command foo - whether foo is built-in or written by you).
Second, for times when you want to change it for say today's work, set (i.e. change) it via an env. var, like:
export PORT=8181
foo args ...
# this setting will remain in effect until you change the var or you logout/reboot, and as long as it is present, will override any PORT value in .foorc each time you run foo.
It can also be shortened to:
PORT=8181 foo args ...
# but this is now a one-time setting of the env. var, so will override any PORT value in .foorc for this run only.
# In both the above variants, the args will not include PORT, since the foo command will be written to check internally for an env. var called PORT (and similarly to check for a PORT setting in .foorc before checking the env. var, with the latter overriding the former if both are present).
And third, for the time(s) when you want to change the PORT setting on the fly, maybe just once for today, do:
foo --port 8282 args
which will override the settings for port (if any) in both the rc file and the env. var.
So the order is: command line option overrides env. var. and env. var overrides rc file setting.
This is what I read/figured out. It gives a lot of flexibility. Many Unix commands work that way. If you want your own to work that way, you have to write the code for it, like checking for presence of the rc file and for the setting in it, checking for the env. var with getenv() and finally checking for the command line option.
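As a sketch of that three-layer lookup (the file name, option name, and ports are invented for illustration, and the rc file here uses an INI-style section, which real rc files need not follow):

```python
import argparse, configparser, os, tempfile

def resolve_port(rcfile, argv):
    # Classic precedence: command-line flag > PORT env. var > rc file > built-in default.
    port = 8000  # built-in default

    cp = configparser.ConfigParser()
    if cp.read(rcfile):
        port = cp.getint("foo", "port", fallback=port)

    if "PORT" in os.environ:
        port = int(os.environ["PORT"])

    parser = argparse.ArgumentParser()
    parser.add_argument("--port", type=int)
    args = parser.parse_args(argv)
    if args.port is not None:
        port = args.port
    return port

os.environ.pop("PORT", None)  # start from a known state
with tempfile.NamedTemporaryFile("w", suffix=".foorc", delete=False) as rc:
    rc.write("[foo]\nport = 8080\n")

p_rc = resolve_port(rc.name, [])                    # 8080: rc file beats the default
os.environ["PORT"] = "8181"
p_env = resolve_port(rc.name, [])                   # 8181: env. var beats the rc file
p_flag = resolve_port(rc.name, ["--port", "8282"])  # 8282: the flag beats everything
print(p_rc, p_env, p_flag)
os.unlink(rc.name)
```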