Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
The death of if/else cascades, a lightweight alternative (leftrightfold.com)
85 points by kyleburton on Dec 6, 2009 | hide | past | favorite | 51 comments


This is much better handled in languages with pattern matching. And you can match on arbitrary fields.

Erlang:

  surf(Channels) ->
    lists:foreach(fun process/1, Channels).
  
  process(#channel{genre = "football"} = Channel) ->
    record(Channel);
  
  process(#channel{genre = "comedy", repeat = false} = Channel) ->
    record(Channel);
  
  process(#channel{genre = "crime", show_title = "Cops"}) ->
    skip;
  
  process(#channel{genre = "crime"} = Channel) ->
    record(Channel);
  
  process(_) ->
    skip.
Full gist: http://gist.github.com/250304


the problem with pattern matching is that its hard to dynamically change the dispatch table, in javascript I can just do

  dispatch_table[newthing].handler = function() .....


You can change the clause database in Prolog by asserta/assertz/retract, but that runs headfirst into the typical trade-offs involving mutable state, as does your Javascript example. Erlang also lets you change patterns by hot-loading code, but its design goes to great pains to do so with reasonable safety.

Also, pattern matching doesn't require all of the variables to be bound. A pattern may specify some values but just partially specify (a list with at least one remaining value) or constrain (any positive integer) others. This makes it vastly more flexible than just dispatching based on one value.


certainly, I am an erlang programmer and do love pattern matching, but as much as I pretend to hate javascript, I do love that modules/namespaces whatever are a first class citizen in a way they certainly arent with erlang, (they are at the vm level, but not so much the language, smerl is quite handy for that).

I should probably play with match specs more often


Which, if we were to take this example seriously, would be very important, since these rules would likely be user specified.


How should a function be "user" specified? (But which user?) The set of eligible options is well defined at the time of writing the program.


But the way they are composed, and where and when they apply, is most likely not!


A pattern matching solution is much more readable and it forces you to build small functions if you are pattern matching in the function declaration.


Right, AND being able to match on multiple fields and destructure them means that several awkward "which class owns the method" problems never arise in the first place.

(Multimethods are similar in many ways, though I find pattern matching far more useful.)


This is a common practice in Python and most of the implementations I have seen (and used) use it in a following way:

    handler = {
        'football': handle_football,
        'comedy': handle_comedy,
        'crime': hendle_crime
    }.get

    handler('crime')(...)


I use this all the time in C++.

  std::map<std::string, boost::function<void (int)>> handlerMap;

  boost::assign::insert(handlerMap) 
  ("football", &handleFootball)
  ("comedy", &handleComedy)
  ("crime", &handleCrime);

  handlerMap["football"](channelNumber);
Makes the code so much easier to understand. And when the handler functions start getting too complex, you can just add another level:

  handlerMap["comedy"]["situational_comedy"](channelNumber);


As a dynamic language, Python already has similar tables built in. You can use something like:

  getattr(self, 'handle_%s' % action)


If action comes from outside the program, this can open up unexpected behavior. The dispatch table is closed - there's less chance that user input will invoke something unintended.


In a real world program you would validate input and make sure you get something callable (with either technique). The only additional unexpected behavior is when there could be unexpected handle_xxx attributes. If we take for granted that the programmer defines the dict for the explicit dispatch table, though, we can also assume that the programmer defines the object for the implicit table.


IMO as code gets modified by other developers, the table is easier to keep 'safe' since it is clearer that is what it's for, the sprintf and all the other methods don't tend to be in one place - as methods get added, the reflection issue is oft overlooked.


I feel the version from the original comment is most Pythonic. The other version is close to an eval-function, which seems to be frowned up on in Python.


The second version is slower too. Here's what I use for e.g. a state machine, which follows "don't repeat yourself" a little better than the first example:

  # {state_name: state_action}
  states = {}
  
  state = ['start']
  
  def reg(func):
    states[func.__name__] = func
    return func
  
  # define and simultaneously register the states
  @reg
  def start():
    state[0] = 'second'
  
  @reg
  def second():
    state[0] = 'last'
  
  @reg
  def last():
    state[0] = None
  
  
  # run the state machine
  while state[0]:
    print state[0]
    states[state[0]]()


Is the second version slower because the string won't get intern-ed?


Yeah, and the string must be reinterpolated every time (possibly barring some of the more exotic python implementations).


Why are you using mutation?


It's especially common since Python purposefully doesn't have a fake switch/case statement -- if it can't be turned into a jumptable at compile-time, it ain't real!

The inane lisper bon mot is stated backwards: it should be "Data isn't code"


Forgive my ignorance but how is this different from a switch statement? Besides the slightly different syntax, of course. Many modern languages interpret switch statements as lookup tables. And that in compiled languages the lookup table is generated at compile time and this approach generates the lookup table at runtime, possibly wasting expensive CPU cycles.

Basically we're dealing here with a special set of condition-action pairs. The property that makes them special is that the conditions are mutually exclusive, that is, the order of condition checks is not important. Of course, most of the time, non-exclusive conditions can be restructured to exclusive conditions as long as the conditions checks themselves don't have side effects (that it, they don't modify the state). This is a solved problem in software construction. For binary conditions, you use if statements (or the equivalent in Your Favourite Language), for simple multiple conditions, you use a switch statement and for complex state-condition check you create a class/closure that takes the state and determines if the condition evaluates to true and then another function executes the action and then you loop through the condition checkers. We're talking about concepts that have been around for almost half a century. There's no need to reinvent the wheel. Instead, read some books and learn how developers solved these basic software construction problems in the past.

The problem is, many developers are too smart for their own good. They come up with new "design patterns" while ignoring the mountains of experience and knowledge past programmers have built up. More often than not these "groundshaking new ideas" lead to huge pains one way or other later down the codebase lifecycle. This is why C++ is so hard; you can come up with many "innovative" ways to do stuff only to discover a year later that your "innovation" made your codebase completely unmaintainable. Don't get me wrong, innovation is really cool and I support it but it doesn't hurt to consult the existing knowledge (either by research or asking more experienced developers) before going down your path.


Dispatch tables give you some additional flexibility. For example, say that you want to wrap each function with some statistic-gathering in debug mode. You could implement this easily with:

  if(DEBUG) {
    for(var key in actions) {
      actions[key] = wrapDebug(key, actions[key]);
    }
  }
In a switch or nested if-else, you would need to include that debug processing inline with each alternative, instead of factoring it out into its own closure-generating function.

Similarly, compilers often use this approach for instruction generation. The set of available machine instructions is stored as a table; if you move to a different architecture, you just replace the table with one appropriate to the new architecture.

This is not a new idea. SICP has a whole section on data-directed programming, and MapReduce and many other internal bits of Google infrastructure use this extensively.


Compilers are special because their input/output data is code and instructions. However the code of the compiler itself is clearly separated from the code the compiler emits. When the compiler uses the lookup table to look up the code to emit it really looks up data for output and not code to be executed (right away).


Another cool trick is runtime rebinding.

Context sensitive actions come to mind. s means save when editing, or s means stop when running. This is also a common way to let users remap keys, or build their own menus.


Eh, this isn't new and innovative. For more information, check out SICP: http://mitpress.mit.edu/sicp/full-text/book/book-Z-H-17.html...

Also, since the logic is stored in a lookup table, these can always be changed on the fly.


That's exactly what I was talking about. That's why I put these word within quotation marks.

Maybe it's just me but I don't really like the idea of changing code on the fly. It (a) makes code a lot harder to prove for correctness (b) a lot harder to debug. There are cases when it is a good idea in some very specific algorithms but in general development, I think self modifying code is a big no-no.


What about cases where you basically want the user to specify the condition at runtime, with what amounts to a Turing-complete language for writing those conditions? In the Tivo example, pretend you wanted users to be able to go up to the device and say "only record shows where one of the lexicographical permutations of the length in minutes is a prime number." The only way to avoid it is not to avoid it: to create a user-extension language with its own compiler, and a VM embedded into the program.


Oh yes allowing scripting is often an excellent feature, but I wouldn't consider a scripting interface "changing code at runtime". The boundaries of the application and the scripting environment are clearly defined most of the time. Scripting is more like receiving automated control input.

Of course the JIT of the scripting language might have self modifying parts in it but that's not the point.


I think you are extrapolating too far from a simple example and losing sight of the non-essential complexity that a switch and if-statement based approach creates. I've been there; it doesn't work very well, IMHO.


When did CPU cycles become expensive? Most available CPU cycles are wasted by the processor being idle.

I know these numbers come from VM people who are talking their book but the oft quoted number even for servers is 5% CPU utilization.

Due to the scoping rules in most imperative languages a look up table implemented in this manner will generally reduce the cyclomatic complexity of a method which is a major source of bugs, also you see this pattern implemented in most functional languages as a match statement.

Most statements about performance are by people talking their own book.

As for how this is different than a switch statement, you are failing to realize the importance of higher level concepts, do you use only NAND statements in your logic? (or the gate of your choice capable of implementing the other gates)


I've said "potentially expensive". Surely you must agree that while most of the time CPU cycles are indeed cheap in some applications (e.g. embedded systems or performance-critical loops) performance counts.

How is this any higher level than a switch statement? The syntax is slightly different but how exactly it hides complexity with abstraction?


I must be turning into a cranky oldpants programmer.

As I see it, this eliminates a large block of logic, but it doesn't actually make the program less complex.

That is, the program is still making the same decisions, and you still have the same amount of fragmented business logic handling all of your special cases; it's just no longer dropping into those special cases by way of a monolithic if/else statement.

On the other hand, this is potentially adding a dangerous bug in the program; if your dispatch table gets corrupted for some reason, or an OBO gets introduced (that never happens!), you could end up trying to jump to a random pointer address.

EDIT: Higher level languages, like Javascript, will handle it somewhat more gracefully of course. But, it would still cause an error of some kind.


Cranky pants. :)

How would the dispatch table get corrupted? If the dispatch table is compiled down into an executable in exactly the same manner an if/else would be, then I can't think of a reasonable corruption vector that would corrupt a compiled-in dispatch table, but not a similarly compiled-in if/else block of code.

As for an off-by-one, that /never/ happens in an if-else block either, and as long as you do bounds checking, you might call the wrong function, but that's not a 'random pointer address'.

You have a very good point that this technique doesn't actually lessen program complexity. It does, however, help the fragmented business logic into tables that are more easily dealt with when the business end changes.

The 'error of some kind' is just as much an error as a missing 'else' in a long if/else if/else if block or a 'default' section in a switch.


> How would the dispatch table get corrupted? If the dispatch table is compiled down into an executable in exactly the same manner an if/else would be, then I can't think of a reasonable corruption vector that would corrupt a compiled-in dispatch table, but not a similarly compiled-in if/else block of code.

Without doing any actual experiments and assuming native code, the dispatch table is liable to go into the .data section of the binary with the rest of the globals/statics, which should be read/write (if you're lucky, you made it const and it's in .rdata which is read-only). An unmitigated buffer overflow in a different item in .data is liable to corrupt the table. Typically, the .text section (the program code) is not writable.


Two points.

First, there's the distinction between essential complexity and non-essential complexity. What the OP is talking about is trying to reduce the non-essential complexity by using a more appropriate abstraction. Nested if-then-else can introduce non-essential complexity merely by having dead nested cases which are actually obviated by outer cases; looking at a complex version of this code, it can become quite difficult to see where exactly the code is supposed to flow under what circumstances, as there can seem to be conflicting assumptions in different areas.

Secondly, once upon a time I invented a scheme to solve this problem in the best way I thought possible, and called it "matrix inheritance". The problem with inheritance and subclassing is that it only handles a single dimension of customization. Suppose you have two dimensions, genre and type, such as [comedy, drama] and [movie, series]. If you were to try and classify any given thing under a classical type breakdown, you could subclass by one axes or the other, but you would need to duplicate subclassing for the remaining axes. So, you could end up with Programme, Comedy <: Programme, Drama <: Programme, but then you'd need ComedyMovie, ComedySeries, DramaMovie, DramaSeries, duplicating the kind axis in the two different branches.

The matrix inheritance concept basically takes the cartesian product of all option axes, essentially modelling it as an n-dimensional space, and then applies behaviour to sub-spaces of this space. So, you could apply conditions to [drama, ] and [, series], with these two representing slices of the example 2-dimensional space described above. The advantage of modelling things this way is that it is declarative: you can analyse overlaps and identify un-covered space.


Hehe, I implemented a scheme quite similar to your "matrix inheritance" in my (as of yet unreleased) web framework.

I have a "page variant" system where each page has a "selection" among the "choices" of its "dimensions".

For example, if a page is available in English and French on the Language dimension and for Public, Member and Admin on the Privileges dimension, then that's 6 variants. These variants get compiled separately and are generated from 1 description of the page that includes conditionals that test what variant I'm generating/compiling.

I plan to make much heavier use of this feature in the future. For example, I'd want to have a new Proficiency dimension where I'd have a version for newbies with lots of inline help and hand-holding, a regular version for average users and maybe an expert version for people with advanced knowledge and needs.

And maybe another dimension which selects whether the page should be displayed with or without "teasers", for example if other features are available on that page with a premium account, the user might want to know what he's missing OR he might want to use the free version and not be bothered with upsells for the moment.

Since all these versions are generated from 1 template, drift is very minimal. I don't want to imagine what it would be like to maintain all these separate versions without this system, though I guess I just wouldn't bother with my more advanced ideas...

(Does this feature exist in other web-frameworks? I'm a bit "in my own bubble" ;P)


Have you seen Subtext? It's an interesting semi-visual language from Jonathan Edwards at MIT.

[1] http://en.wikipedia.org/wiki/Subtext_programming_language [2] http://subtextual.org/subtext2.html (demo)

These days the idea is developed under the name Coherence. Jonathan recently released another paper on the subject at OOPSLA.

[3] http://coherence-lang.org/


Thank you for posting this! I'd seen this system a while ago and have been wishing I'd bookmarked it ever since!


Sounds like you could use multiple inheritance / mixins.


You have a configuration space, and you have a set of behaviour you want to associate with different configurations. Inheritance bundles the behaviour into something (a class) and lets you combine and override as necessary. But multiple inheritance doesn't really make your life easier - you still need to work out the order of precedence, which ancestor (trait) takes effect, etc. All this is essential complexity buried behind an ontological classification scheme that wasn't really designed for it, adding in inessential complexity.

Using a tool adapted to the job really can make life easier though.


Schematic tables [http://subtextual.org/OOPSLA07.pdf] are an extension of this idea that can handle arbitrarily complex conditionals. Unfortunately, they only work in Subtext [http://subtextual.org/], an unreleased research language. Schematic tables take advantage of the 2-dimensional visual layout of Subtext (as opposed to linear text, like most languages); it would be very interesting to see something similar in text.


> New behavior can be injected without changing any core code.

While it probably is easier to change the dispatching part of this code now, I'd hesitate to call the dispatching part the "core" code; it seems like the functions that actually do the work have a better claim to that, so what this does is to move the core code away from the calling location, and removes the name, so the core code has to be found by tracing or simulation, rather than by reading. Of course, a comment pointing at the file and line of the actions table would be nice, but might get out of date, and since the table is passed rather than global, at some point there's going to be more than one actions table that do other things as well...

This code will be a nightmare to debug when there's a few thousand lines of similar redirections. :/


> This code will be a nightmare to debug when there's a few thousand lines of similar redirections. :/

It is. The SConstruct build tool is a prime example of this technique being used so much that it hampers debugging.

Using this pattern successfully requires hard discipline; you have to concentrate to keep your code readable.


This technique is ancient for anyone that has an associative array and function objects.


The design patter of one is the syntax of another.


Erlangs pattern matching is a superior form of this concept. Clojures multimethods also implement a similar style to erlangs pattern matching dispatch. Both of those avoid some of the objections described in other comments here. The code is still in core and is easier to read than a dispatch table.


Instead of mimicing method dispatch, shouldn't this be handled by a factory that returns an appropriate action object?

Okay, the author writes:

> Skeptics may say, that problem has been solved in the Object Oriented world by [...] Wow, that is a lot of work, and think about the number of classes and lines of code you will end up with!

First, you don't have classes in a prototype-based language. Secondly, do it right or don't do it at all.


This looks like a functional (The paradigm, not the adjective) version of the Visitor pattern: http://en.wikipedia.org/wiki/Visitor_pattern


So this strikes me as a fairly common problem. If I was going to solve it I don't think i would use a "dispatch table" I wold prefer to use polymorphism. In C# maybe i would have an interface with the common method. An abstract class to provide a default implementation. Then several classes which either implement the interface directory, or inherit the abstract class.


if/else cascades will live as long as long as junior developers themselves. Jeez some of the code I've personally written when I was younger...




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: