> This feature is documented under the -f option of the export built-in command. The implementation detail of using an environment variable whose value starts with "() {" and which may contain further commands after the function definition is not documented, but could still be considered a feature.
This undocumented implementation detail is also a limitation on the use of regular environment variables, and should be documented. When reading documentation about a mechanism, I expect that special magical strings which change behaviour of the mechanism are clearly documented. If such documentation had existed, someone might have noticed it and guarded against it.
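To make that concrete, here is a minimal sketch of the kind of guard a caller could have written had the magic prefix been documented. The function name and the sample variables are hypothetical, and the check covers only the "() {" prefix quoted above; it says nothing about other shells or other magic strings.

```python
# Hypothetical guard; the names and the simple prefix check are assumptions.
FUNCTION_PREFIX = "() {"


def strip_function_like_values(env):
    """Return a copy of env without values that start with the magic prefix."""
    return {
        name: value
        for name, value in env.items()
        if not value.startswith(FUNCTION_PREFIX)
    }


# Made-up request data, as a CGI server might hand it to a child process.
untrusted = {
    "HTTP_USER_AGENT": "() { :; }; echo exploited",
    "HTTP_ACCEPT": "text/html",
}
clean = strip_function_like_values(untrusted)
print(sorted(clean))
```

A guard like this only helps once you know the prefix exists, which is exactly why the lack of documentation matters.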
> Presumably programs like apache filter out environment variables properly. But unfortunately, when validating input data, they fail to validate it correctly because they don't expect that data starting with "() {" will be interpreted by their bash child processes. If there's a bug, it's not in bash, but in apache and the other internet facing programs that call bash without properly validating and controlling the data they pass to bash.
It isn't easy to validate and control data against an unknown magical feature in one of many possible shells.
> But on the other hand, it is free software and not difficult to check the source to see as the nose in the middle of the face, what is done. When reusing a component with missing specifications and lacking documentation, checking the source of the implementation should be standard procedure, but it has clearly not been done by Apache or DHCP developers.
I think the shell is specified in POSIX/SUS. Checking the source of all possible open-source shells would be a huge job. I don't know how they should check source code of the closed-source shells. I don't blame them for using the environment variables according to available documentation.
I agree. This is an interesting idea, so I upvoted the paste. But I don't think this author knows how deeply the bug runs, either; the most recent way to exploit it is to export an environment variable of, say, ls to a bash function. [1]
Usually the number of toxic environment variables is considered to be finite: PATH, LD_PRELOAD, etc. If the name of any executable on the PATH is dangerous, then the number of toxic environment variables is infinite -- are we to scan the entire PATH for each environment variable to make sure it isn't dangerous? What if the CGI script updates PATH?
There is no way to solve this problem with sanity checks. I've yet to peek at the source, but I'm told this feature is vital to implementing things like backtick operators. I think it is too dangerous however, and I don't want shellshock to become a class of bug rather than an instance of toxic environment variables. We're going to have to rip this feature out and re-implement large portions of functionality.
The author is right that this is a product of bash being written in a more trusting time. This is not the first nor the last time the 1970s security models will come back to bite us.
Also, Apache does have a mechanism to filter out toxic environment variables; headers are added as HTTP_HEADER_NAME, because it's generally the names of environment variables that make them dangerous and not their content. Executing code as a result of parsing the value of an environment variable with no special meaning is a vulnerability.
> But I don't think this author knows how deeply the bug runs, either; the most recent way to exploit it is to export an environment variable of, say, ls to a bash function.
If you can set arbitrary environment variables, you're pwned and have always been pwned. You can set all manner of interesting things, including LD_PRELOAD, to control the execution environment and potentially execute arbitrary code.
EDIT: Putting random data in an environment variable where you pick the name should always be secure, though, which is an assumption that most of *nix makes.
But the problem with Shellshock isn't random environment variables. It's random environment variable VALUES in well-defined environment variable names. It's pretty well-known that there are certain dangerous environment variables (like PATH, LD_PRELOAD) that should not be blindly set. But CGI only sets CGI environment variables like PATH_INFO, as well as HTTP_*. That even these can be dangerous, because bash executes code on any environment variable, is completely unexpected.
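A rough sketch of that convention, assuming the usual CGI mapping (upper-case the header name, replace "-" with "_", prefix with HTTP_). The function name and the sample request are made up for illustration; this is not Apache's code.

```python
def cgi_environ(headers, path_info=""):
    """Build CGI-style variables: the server fixes the names, the client supplies the values."""
    env = {"PATH_INFO": path_info}
    for name, value in headers.items():
        # The name is forced into the HTTP_ namespace; the value is copied as-is.
        env["HTTP_" + name.upper().replace("-", "_")] = value
    return env


# Hypothetical request carrying a hostile header value.
env = cgi_environ({"User-Agent": "() { :; }; echo exploited"}, "/index")
for name in sorted(env):
    print(name, "=", env[name])
```

The client never gets to pick a name outside HTTP_*, so PATH and LD_PRELOAD are out of reach. The value still passes through untouched, which was considered safe until bash started interpreting it.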
Is this really a loose typing issue: we give Bash data that should be of type "display text" (a sub-type of string I suppose) and it treats that data as type "executable command" (also a sub-type of string).
Would it be possible to wrap|tag input to bash so that only when a program|script sets the env variable with string that's typed as "executable" does bash even think of exec-ing it. I guess that removes some of the hack-ability and would need major rewriting of bash.
I'm a layman trying to do CS ... what could possibly go wrong!
The issue is that the environment isn't a bash-specific thing. Anything can and regularly does set environment variables, and there's no space in there to set a flag for "this is executable" - if it's in the value, anything can set that flag, and the problem here is triggered by programs setting environment variables from external data.
Plenty of other shells support backticks without the "export -f" magic. They must, as backticks are mandated POSIX behavior; few support "export -f" at all. (And at least one that did, the old Bell Labs post-v7 "Research Unix" shell, used only environment variables with embedded characters which couldn't easily be created by normal means, to avoid the risk of "magic processing" on things like TERM and HTTP_FOO.)
You will get a parse error. There is little more than [a-zA-Z0-9_] you can use in identifiers (except bash adds a few more, grrrr). You can probably pull it off with /usr/bin/env though.
"the most recent way to exploit it is to export an environment variable of, say, ls to a bash function."
Even before the redhat patch you would need something to set echo=() { ... but how will an attacker do that when they can only set something like HTTP_USER_AGENT=() { ... ? See how overriding a builtin is not and never was a vulnerability?
There are two things to differentiate, in my opinion.
In most cases, the shell is just used to find programs in the PATH when a C programmer uses system(). And for that case, which is probably 99% of the time when /bin/sh is being invoked, it would make perfect sense to implement this with something that exhibits less attack surface.
Taking the "dhcp-exploit" as an example (set a DHCP option on your server to "(){...}; exploit;"), I think it's less clear: Implementing the functionality of updating configuration files according to the DHCP options sent is a perfectly reasonable place to use a script written in sh/ksh/bash! It's easy to implement by any sysadmin, works very reliably with a little care, and performance-wise it's not critical at all.
And regardless of the language you implement it in: There's some place where user input has to be sanitized, but up to now, it was considered common knowledge that arbitrary data in an environment variable is safe as long as the variable's name adheres to some convention (prefix them all with PROGNAME_...). And bash doesn't respect this convention, because it looks at variable CONTENT, even though I'm pretty sure that the convention was already established when the bash project started... (see, for example, handling of "special" variables like LD_xxx in suid programs or the dynamic linker)
I said it in another thread but this is almost always a mistake. The execve family is much less ambiguous about what gets passed to the program. Using it avoids this type of bug by not putting the shell where it doesn't need to be.
And it's not limited to C. E.g. I would be in favor of removing os.system from Python (in favor of subprocess.call). The `-syntax (backtick-syntax) in Ruby is particularly evil. It's so convenient because it is so concise, but I guarantee you that it is the source of a lot of vulnerabilities. It should be removed ASAP. I think that's kind of a theme in Ruby: is it convenient? Then put it in. But I would have expected more from Python.
Vulnerable to what? The environment variable problem? I was talking about program argument parsing. os.system("ls %s" % foo) != subprocess.call(["ls",foo])
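A small sketch of that argument-parsing difference, assuming a POSIX /bin/sh behind shell=True. The value of foo is a made-up hostile input, and echo plus a tiny Python child stand in for ls so the effect is visible without touching the filesystem.

```python
import subprocess
import sys

foo = "a; echo injected"  # hypothetical untrusted input

# String form: the shell parses the whole line, so ';' starts a second command.
shell_out = subprocess.run(
    "echo %s" % foo, shell=True, capture_output=True, text=True
).stdout

# List form: no shell is involved; foo reaches the child as exactly one argument.
show_args = [sys.executable, "-c", "import sys; print(sys.argv[1:])"]
list_out = subprocess.run(
    show_args + [foo], capture_output=True, text=True
).stdout

print(shell_out.splitlines())
print(list_out.strip())
```

Note this only addresses argument parsing. The child still inherits the environment either way, so it is not by itself a fix for the environment variable problem.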
> And for that case, which is probably 99% of the time when /bin/sh is being invoked, it would make perfect sense to implement this with something that exhibits less attack surface.
I did. I did not find it explicit enough. There was no specific recommendation, for example. Moreover seeing the phrase "when a C programmer uses system()" is pretty jarring. There aren't enough warnings you can add to that to convey how much this gets misused and what a bad idea it usually is.
To me, use of system() is very indicative that you need to find another C programmer. There are few other answers to complete the phrase "when a C programmer uses system()".
Well... that's a pretty drastic reasoning, leaving aside all weighing of facts. Does it also apply to a Haskell programmer running System.Process? ;-)
The fact is: system() and all its relatives (popen comes immediately to mind, there are doubtlessly 100 others) have been used, and will be used, by 'incompetent' programmers[+], and as long as no other method is as widely established (and even taught in introductory textbooks), we had better provide a workaround that closes most of the holes.
[+] or just programmers weighing the merits of having a parser supporting variable and home-directory expansion built right in, courtesy of /bin/sh -c, which is completely adequate for many tasks. And yes, I know the limitations of it, and would not use it myself most of the time.
How is it different to set some environment variables and then call out to a shell script, versus to set some environment variables and then call out to a perl script, or a binary compiled from C?
It's not. It's the "calling out" part that is wrong.
You should never call out to anything by passing untrusted user input directly. Any information that came from the outside must be explicitly passed as data through proper serialization mechanisms.
For instance, you don't piece your SQL queries together by concatenating strings. You use an abstraction layer, in which you code the query structure and pass user input as data. There is this extra step of saying "this is data, not code" that strips the external input of executability.
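A minimal sketch of that distinction using Python's sqlite3 module. The table, the row and the hostile input are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

untrusted = "x' OR '1'='1"  # hypothetical hostile input

# Concatenation: the input becomes part of the query structure.
concatenated = conn.execute(
    "SELECT username FROM users WHERE username = '" + untrusted + "'"
).fetchall()

# Placeholder: the query structure is fixed and the input is bound as data.
bound = conn.execute(
    "SELECT username FROM users WHERE username = ?", (untrusted,)
).fetchall()

print(concatenated)
print(bound)
```

The concatenated query matches every row because the quote in the input rewrites the WHERE clause; the bound query just looks for a user with that odd literal name and finds none.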
(for the same reasons, if your templating engine is just concatenating strings and not building the page out of trees, you're doing it wrong, but it's a topic for another day)
It's a problem you get when you believe in "the Unix way" a bit too much. Yes, everything is text, but no, not everything has the same semantics.
> You should never call out to anything by passing untrusted user input directly.
So if I call a CGI script with parameters foo=bar, what data should apache pass to the handler, if not something along the lines of the string "foo=bar"? When I pass the header "User-Agent: baz" and the handler asks for the user-agent, what should it be told if not "baz"?
Environment variables are data, not code. When apache executes a cgi script, whether it's C or perl or shell, it makes the user input available as data in defined locations.
There's a bug in bash which causes some of that data to be executed, but there's no way to protect against that class of bug.
This isn't a case of "you should have protected against sql injection attacks". It's a case of: there is a bug in your sql server, such that the query "select * from Users where username='rm -rf /'" will execute "rm -rf /".
The point of the OP is that if a program has chosen bash to be the handler of untrusted user data, then the program has made the wrong choice, because bash is clearly (hindsight! I'm not claiming I wouldn't have made the same choice) not designed for that purpose. A handler for untrusted user data should be a program specifically designed for that purpose, which should receive the data directly.
Similarly, if a Ruby or Perl script decides to call out to bash with untrusted user data, it's their mistake to trust bash with it, not bash's mistake that it wasn't designed for that use case.
It's perfectly possible to protect against this attack: don't call a generic program with untrusted user data.
So ruby and perl are specifically designed to be a handler of untrusted data?
How do I know what other programs are designed for such a task? What's a "generic program"? In this day and age, it is expected that pretty much all software ought to be designed with security in mind (not that it always is). Because any piece of "generic software" (or just software) is otherwise going to be exploited. Especially on platforms where double-clicking a file is the expected way to open it.
More importantly, the point we are making is that we're not expecting bash to "handle" anything. It gets some data. It's not supposed to do anything with it on its own. Period.
Yeah, I hope I never have occasion to walk through an undocumented minefield, I mean collection of "features", designed by this person.
I say this without animosity to bash devs. I think some blame can be shared. But putting it all on people you expect to understand under-documented behavior and "implementation details" in every possible version of every possible flavor of /bin/sh is madness.
On some systems, they are one and the same. /bin/sh is often symlinked to /bin/bash, which is making this so exploitable. /bin/sh is invoked by system(), popen(), etc., and referenced in script "shebangs" (#!/bin/sh at top), so I meant that nobody necessarily knows what "flavor" of /bin/sh they're going to get.
There are methods of IPC other than shell variables. The shell is a known insecure environment, which is why there are limits on setuid for shell scripts.
By letting everyone on the Internet set shell variables, Apache and whatever DHCPd (ISC?) did something they could have known would have bad consequences whether this feature/bug existed or not.
From what I understand, Apache doesn't send them to bash. It sends them to whatever binary is configured to handle the request (using CGI), which then calls bash unbeknownst to Apache (but implicitly passing the same environment variables).
Lots of functions that start another process start a shell instead and hand it a command line to be executed, e.g. system or popen. The convenience in that case is that you don't need your own handling of $PATH or wildcards or argument parsing. It's pretty standard on UNIXoid systems.
I wish for a standard #once directive. It should be very simple to implement, increase preprocessing speed and reduce the size of visually disturbing boilerplate in header files.
A combined REPL and editor for C# looks useful. I will try it next time I have some spare time.
The name is confusing. A C shell already exists (http://en.wikipedia.org/wiki/C_shell). An alternative name could be CSharpShell. Btw, while googling that name, I found CsharpRepl, which seems to be a somewhat similar tool.
These tools straddle some Venn circles, so the developer pitch can be confusing.
For example, I make use of both ipython and bpython, both of which I refer to as either shells or REPLs; though neither program is a proper shell (in the /bin/chsh sense), and though people also call them “interpreters” (technically, the python interpreter still interprets), “environments” (vague) or even “IDEs” (wat?) – the concept behind the tools is popular and well-understood.
Personally I think it’s particularly funny to call these Enhanced REPL Shell Interpreter Environments (or what have you) “IDEs” and lump them in with Eclipse or Visual Studio or those other behemoth coding tools; I like bpython and ipython for the myriad ways they are un-Eclipse-y, and if I did C# I would presumably get into CSharp for the same reasons. All of which, like many bicycle-shed innovations, are mere matters of taste.
> Most C++ code uses switch frequently, usually without taking advantage of fallthrough.
I am not so sure about this. Thinking back on my uses of switch in C and C++, I am not able to remember using switch without taking advantage of fallthrough. Maybe I am just not a typical C++ programmer...
There are two different kinds of fallthrough in C++. The most common is using the same code for multiple values. Rust already supports this by allowing multiple values and ranges of values for each pattern.
The much rarer use of fallthrough is executing code for one case, and then continuing on to execute the code for the next case. In fact, in my large Android application, I turned on warnings for this type of fallthrough, and out of hundreds of switch statements, only eight used it, and all but one were mistakes.
So his political view made him unsuitable for a job. I am really surprised about this from a company and a foundation I associated with openness and concern about freedom.
You could add the option to filter by programming language, to avoid seeing repositories using languages the potential contributor doesn't know or has no interest in.
> "What happens is that variable i is converted to unsigned integer." No: 'long i' is converted to 'unsigned long'.
Actually, unsigned long is an unsigned integer. He didn't write unsigned int.
> "Usually size_t corresponds with long of given architecture." No: For example, on Win64 size_t is 64 bits whereas long is 32 bits.
"Usually" is the keyword here. He could have said "Usually size_t has at least the same amount of bits as long" and it would be better related to the referred rule.
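For anyone who wants to check both points on their own machine, here is a small sketch using Python's ctypes as a stand-in for the C types. ctypes does no overflow checking, so constructing c_ulong from -1 wraps the same way the C conversion from long to unsigned long does. The printed sizes depend on the platform, so none are claimed here.

```python
import ctypes

# Width of unsigned long on this platform, in bits.
bits = 8 * ctypes.sizeof(ctypes.c_ulong)

# -1 wraps modulo 2**bits, mirroring C's signed-to-unsigned conversion.
wrapped = ctypes.c_ulong(-1).value
print(wrapped == 2**bits - 1)

# Compare size_t with long; on Win64 these differ, on most Unix ABIs they match.
print(ctypes.sizeof(ctypes.c_size_t), ctypes.sizeof(ctypes.c_long))
```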