
Why is a proper MAC preferred over simply encrypting the plaintext after appending a checksum or hash?


Among other things, because the system you just described is vulnerable to chosen ciphertext attacks.


But if someone chooses a ciphertext (without knowing the key), how could they get the checksum right (which is also encrypted)? If the checksum was wrong, the system would refuse to decrypt the corrupted message.


No. There's no such thing as "refusing to decrypt a corrupted message"; cipher cores either produce the expected plaintext or, if the message is corrupted, an unexpected one. Attackers can use how the system's behavior changes across those different unexpected plaintexts to infer the original plaintext.


My question was why a proper MAC is better than appending a checksum to the plaintext and then encrypting. With an appended and encrypted checksum, the system could easily reject corrupted messages; that's the whole point of a checksum in the first place.

So why is a proper MAC better than appending a checksum or hash to the plaintext and then encrypting? Or maybe I am misunderstanding something?


I've answered that already. The MAC allows the cryptosystem to reject messages with corrupted ciphertexts. If you don't do that, it can be possible to use controlled corrupted ciphertexts to learn things about the plaintexts of messages, and not just via CBC padding oracles.
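
To make this concrete, here is a toy C sketch (the "cipher" and "MAC" below are trivial stand-ins invented for this example, not real cryptographic primitives) of the property being described: when the MAC is taken over the ciphertext and checked first, the receiver can throw away a tampered message before any decryption happens, so there is no plaintext-dependent behavior for an attacker to observe.

  /* Toy sketch only: toy_crypt and toy_tag are made-up stand-ins,
     not real cryptographic primitives. */
  #include <stdio.h>

  #define LEN 16

  /* toy symmetric "cipher": XOR with a key-derived keystream */
  static void toy_crypt(const unsigned char *key, unsigned char *buf, int n)
      {
      for (int i = 0; i < n; i++)
          buf[i] ^= (unsigned char)(key[i % 4] + i);
      }

  /* toy keyed tag over the ciphertext (stand-in for a real MAC) */
  static unsigned char toy_tag(const unsigned char *key,
      const unsigned char *buf, int n)
      {
      unsigned char t = key[0];
      for (int i = 0; i < n; i++)
          t = (unsigned char)(t * 31 + buf[i]);
      return t;
      }

  int main(void)
      {
      unsigned char key[4] = { 7, 13, 42, 99 };
      unsigned char ct[LEN] = "attack at dawn";

      toy_crypt(key, ct, LEN);                    /* encrypt ...     */
      unsigned char tag = toy_tag(key, ct, LEN);  /* ... then MAC    */

      ct[3] ^= 0x01;                  /* attacker flips one bit in transit */

      /* The receiver checks the tag BEFORE decrypting, so the forgery
         is dropped without ever producing a plaintext whose padding,
         checksum, or parse behavior could leak anything. */
      if (toy_tag(key, ct, LEN) != tag)
          printf("rejected before decryption\n");
      else
          toy_crypt(key, ct, LEN);    /* only now would we decrypt */

      return 0;
      }

With the checksum-inside-the-encryption scheme, by contrast, the receiver has no choice but to decrypt first, and whether it then accepts or rejects depends on decrypted bytes the attacker can influence by choosing where to corrupt the ciphertext -- which is exactly the kind of oracle being described here.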


Doesn't encrypt-then-MAC allow you to reject messages prior to decryption?


Yes. That's not what we're talking about here, though.


How would the plaintexts differ?


His reasoning is actually quite flawed. Even if you believe there is somehow a black and white $7.7 trillion value difference between candidates, betting on such long odds still isn't rational.

If we assume (using numbers from the article) that a vote costs $20 (an hour of our time), the odds are 1 to 10 million, and the payoff is $7.7 trillion (even though it's the country getting the benefit, not the voter), then the Kelly criterion* implies that it's only rational to vote if your net worth is more than $200 million. Somebody making $20 an hour probably isn't worth $200 million, and so even if voting did have a positive expected value, it's not a rational strategy.

TL;DR: There is more to betting strategy than expected value, and no rational gambler takes a bet at 1-in-10-million odds for $20.

*http://en.wikipedia.org/wiki/Kelly_criterion
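
For anyone who wants to check the arithmetic behind the $200 million figure, here it is as a small C program (a sketch using the same assumed numbers: a $20 stake, 1-in-10-million odds, and a $7.7 trillion payoff):

  #include <stdio.h>

  int main(void)
      {
      double p = 1.0 / 10e6;      /* chance the single vote is decisive */
      double stake = 20.0;        /* cost of voting, in dollars         */
      double payoff = 7.7e12;     /* claimed value of the outcome       */
      double b = payoff / stake;  /* net odds received on the "bet"     */

      /* Kelly fraction: f* = (b*p - q) / b, with q = 1 - p */
      double f = (b * p - (1.0 - p)) / b;

      printf("Kelly fraction:   %g\n", f);            /* roughly 1e-7       */
      printf("implied bankroll: $%.0f\n", stake / f); /* about $200 million */
      return 0;
      }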


Is Fexl open-source, or even available for download? It looks pretty neat.


Funny, I was just thinking about that very thing when I noticed your post. Problem is, I'm a little "ashamed" to release the code right now because I'm in the middle of some major revisions:

1. Use De Bruijn notation for referring to bound lambda values positionally, instead of my current approach of actually using the lambda symbol in an associative list. (Don't laugh, it works.) A small illustration of the notation follows this list.

2. Change the C implementation so it just uses an array of longs for the whole working space, with integer offsets, instead of node structs with pointers. At that point the whole implementation will be "nothin' but int", and avoid malloc altogether.

3. Change both the Perl and C implementations so that I supply external function definitions as function pointers. Right now when a high-order function is reduced to a normal form (lambda form), I actually look at the name of the external function being called and do a big if-else on that. (Don't laugh, it works.) This will no longer be an option when I go with De Bruijn notation. Besides, the function pointer approach will be a lot more solid, serious, and extensible.
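
(Re item 1: a minimal illustration of the notation, not of Fexl syntax. In De Bruijn form each bound variable becomes an index: 1 refers to the nearest enclosing lambda, 2 to the next one out, and so on. So the named term \f \x f x is written λ λ (2 1), and no variable names or associative lists are needed.)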

Thanks for expressing an interest. When I'm a little more proud of my code and have something available for download, I'll announce it on HN.


This is probably a stupid question. If you have Fexl written in Perl now, what are the chances of just throwing a sand-boxed interpreter online for people to play with? I'm thinking just a form with an input field and a submit button which runs the script and displays any printed values.


It's a brilliant question, and it's now on my TODO list. I'll aim to get this done by the end of August.

You're right about the sandboxing too -- the Fexl interpreter runs with two limits: the maximum number of cycles, and the maximum amount of memory. If it reaches either of those limits, it halts. So it should be perfectly safe to run it on the public web.


Update: It might not be ready by the end of August. I just returned from a week-long trip, and I'm now busy rewriting the whole Fexl interpreter in terms of pure combinatorics (see http://fexl.com/#combinatorics ).

The interpreter understands only the two fundamental combinators S and C. Although it is possible to define high-order combinators in the interpreter using function pointers, I am limiting myself to S and C to minimize the code size and illustrate the principle that all computable functions can be defined as applications of only S and C.

So, for example, even the Y combinator itself is defined in terms of S and C as:

  \I = (S C C)
  \Q = (S (S (C S) C) (C (S I I)))
  \Y = (S Q Q)
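
As a quick check of the first line, using the rules C x y = x and S x y z = x z (y z):

  I x = (S C C) x = C x (C x) = x
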
I'm now doing some bootstrapping, eliminating code formerly written in Perl. I'm writing the Fexl parser in Fexl, and manually converting it to S and C. I can then eliminate the parser written in Perl.


I'm not trying to rush you or anything. Just want to let you know that at least one person is still looking forward to seeing it.


Thanks, that's very good to know. I've made excellent progress on the Perl implementation, doing everything in terms of combinatorics. Much thought has gone into this.

I think before I put up the sandboxed interpreter at fexl.com, I'll just release a tar.gz file of the Perl code. That way I can release something even earlier. Also, it will give me more time to take a look at a case that causes a segmentation fault. One of my test cases is a "metastasis" function which runs forever using an increasing amount of memory. That function is:

  Y S f g
Where Y is the fixpoint function, S is the fusion function, and f and g are any functions.

I can limit the evaluation to let's say 3,000,000 cycles, so it does terminate just fine. But at that point I've built a deeply nested data structure in Perl, and when I drop it and the reference count goes to zero, Perl begins freeing that structure recursively. Well, that causes a stack overflow and hence a segmentation fault.

My workaround was to store a pointer to the big structure in a global variable. That way the structure stays live until the program terminates, at which point Perl simply abandons the structure instead of trying to free it recursively.

Of course, I easily avoid this problem in the C implementation because I free structures lazily.

Another workaround is to write a destroy routine which frees a structure non-recursively. I did try that, but the problem is that it goes all the way down, even destroying my standard structures for the Y and I combinators. So it's only something that could be used once during a program run.
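
For what it's worth, the non-recursive approach in C usually looks something like the sketch below (hypothetical node struct, not the actual Fexl or Perl data layout; it also assumes an unshared tree, so shared or reference-counted nodes -- like the standard Y and I structures mentioned above -- would need a reference-count check before being pushed):

  #include <stdlib.h>

  /* Hypothetical pair node, just for the sketch. */
  struct node { struct node *left, *right; };

  void destroy(struct node *root)
      {
      /* Explicit stack of pending nodes, so a deeply nested structure
         cannot overflow the C call stack the way native recursion does. */
      size_t cap = 64, top = 0;
      struct node **stack = malloc(cap * sizeof *stack);
      if (!stack) return;

      if (root) stack[top++] = root;
      while (top > 0)
          {
          struct node *n = stack[--top];
          if (top + 2 > cap)
              {
              struct node **grown = realloc(stack, cap * 2 * sizeof *stack);
              if (!grown) break;        /* give up rather than recurse */
              stack = grown;
              cap *= 2;
              }
          if (n->left)  stack[top++] = n->left;
          if (n->right) stack[top++] = n->right;
          free(n);
          }
      free(stack);
      }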

In any case I'd rather not put up the sandboxed interpreter until I can get a good handle on the metastasis case. I just need to work around this one issue with Perl.

I think the code will illustrate how easy it is to augment an existing programming language such as Perl or C with a high-order functional interpreter. I'll keep you posted.


Actually there's a much simpler metastasis function, perhaps the most evil function of all, namely, the result of applying the Y combinator to the Y combinator itself:

  Y Y
If you expand this function two steps, you'll notice that it equals:

  Y Y (Y (Y Y))
And so you end up with an infinitely left-recursive thing from hell.

Fortunately my implementation of Fexl does not use built-in recursion, which would quickly throw up a segmentation fault due to stack overflow. Fexl does its own stack with a chain of nodes.

Also, Fexl runs inside a completely bounded "arena", which is a fixed-size array of machine integers. All node "pointers" are actually just integer offsets into this arena. This makes it possible to increase the size of the arena by allocating a brand new one and doing a mass copy from the old to the new. But I have not bothered with this yet, because I figure that deciding on your upper bound on memory once up front is just fine for now.

Note also that my approach allows for nested arenas, so for example you could have a Fexl function which allocates an arena and runs a Fexl interpreter inside there. The whole thing is completely reflective in the most profound way you could imagine, so the sky's the limit. The use of pure functions enables abstractions which are not leaky.
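
To make the arena idea concrete, here is a minimal sketch (hypothetical names and a plain bump allocator, not the real Fexl allocation scheme) of nodes as integer offsets into one flat array, with growth by mass copy:

  #include <stdlib.h>
  #include <string.h>

  static long *arena;       /* one flat array of machine integers   */
  static int arena_size;
  static int next_free;     /* bump allocator, for this sketch only */

  static void arena_init(int size)
      {
      arena = calloc(size, sizeof *arena);
      arena_size = size;
      next_free = 1;        /* offset 0 is reserved as "null"       */
      }

  /* Allocate a pair node; return its offset, or 0 if the arena is full. */
  static int pair(int left, int right)
      {
      if (next_free + 2 > arena_size) return 0;
      int n = next_free;
      next_free += 2;
      arena[n] = left;
      arena[n + 1] = right;
      return n;
      }

  /* Grow by allocating a fresh arena and mass-copying the old one.
     Offsets stay valid because they are relative, not raw pointers. */
  static int arena_grow(int new_size)
      {
      long *bigger = calloc(new_size, sizeof *bigger);
      if (!bigger) return 0;
      memcpy(bigger, arena, arena_size * sizeof *arena);
      free(arena);
      arena = bigger;
      arena_size = new_size;
      return 1;
      }

  int main(void)
      {
      arena_init(1000);
      int leaf = pair(0, 0);
      int app = pair(leaf, leaf);
      if (app == 0) arena_grow(arena_size * 2);
      return 0;
      }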

This leads me to my main point. I've decided to stick with the C implementation because it gives me the complete control which I need. When I want to call Fexl from Perl, I will simply fork a separate Fexl process and have Perl pump a Fexl function into its standard input. That initial function can then read any remaining input. You can do any sort of bootstrapping this way; for example, the initial function may expect another Fexl function at the head of the remaining input. Any file names on the Fexl command line would simply be treated as if their contents had appeared on standard input first.

The Perl code then reads the standard output of the Fexl process, and that's its answer. On the web, it could connect the output directly to the client socket and be done with it.
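
The plumbing for that is the usual pipe/fork/exec dance; here is a rough C sketch (hypothetical, with minimal error handling, a single read where a real caller would loop, and assuming the program text is small enough not to fill the pipe before the child starts producing output):

  #include <string.h>
  #include <unistd.h>
  #include <sys/types.h>
  #include <sys/wait.h>

  /* Feed program_text to cmd's stdin; read its stdout into out. */
  long run_filter(const char *cmd, const char *program_text,
                  char *out, long out_size)
      {
      int to_child[2], from_child[2];
      if (pipe(to_child) < 0 || pipe(from_child) < 0) return -1;

      pid_t pid = fork();
      if (pid == 0)
          {
          dup2(to_child[0], 0);     /* child reads the program on stdin  */
          dup2(from_child[1], 1);   /* child writes its answer on stdout */
          close(to_child[1]);
          close(from_child[0]);
          execlp(cmd, cmd, (char *)0);
          _exit(127);
          }

      close(to_child[0]);
      close(from_child[1]);
      write(to_child[1], program_text, strlen(program_text));
      close(to_child[1]);           /* EOF tells the filter its input is done */

      long n = read(from_child[0], out, out_size - 1);
      if (n < 0) n = 0;
      out[n] = '\0';
      close(from_child[0]);
      waitpid(pid, 0, 0);
      return n;
      }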

One nice thing about this approach is that it's utterly safe. Your code could read the evil (Y Y) function straight off a socket from a known black hat and evaluate it with confidence, knowing that it would simply reach your upper bound on memory -- or cycles, whichever comes first -- and halt in the most ordinary way.

Another nice thing is that I don't have to support an endless list of implementations in other languages, or even bindings for other languages. There is one authoritative piece of code written in C and that's that.

If you feel that this approach is too slow, then you can simply move more of your logic from Perl into the Fexl program itself. If you take this principle to its extreme, you would end up with a Perl process which does nothing except feed a program into Fexl -- at which point you would drop Perl altogether. Note that Fexl programs can be extremely compact, and you could even read auxiliary functions from files on demand.

Ultimately the entire API at https://loom.cc will be based on Fexl. Instead of doing a bunch of grungy back-and-forth API calls and doing all your branching and looping on the client side, you can just ship an entire Fexl program to the Loom server and have it run there, delivering precisely the answer you want in the format you want. You could even send back a single number such as "42" if that's all you need.

To avoid shipping the same program to Loom repeatedly, you could store the program in an Archive slot once. Then whenever you want to run that program, you only have to send the archive ID.


It sounds like you're having a lot of fun with it.

I think a C implementation would be better anyway. C is the common denominator among almost all languages. Have you already done the foreign function interface to C?

When do you think we might see either source code or a live web demo to play with?

For a web demo, couldn't you just add a quick CGI interface to your C program?


The Fexl interpreter is simple, elegant, and fast. The core "reduce" routine runs about 43.5 million cycles per second on a simple infinite loop. If I set max_cycles to a billion it stops after about 23 seconds.

To test it with something more real, I wrote the "cat" program in Fexl as follows:

  \cat = (get_char \ch eq EOF ch I; put_char ch cat)
Then I pumped 30MB of data into it, which took about 42 seconds:

  time (head -c 30000000 /dev/zero | fexl_cat >/dev/null)
Of course that's a lot slower than pushing the data through the Unix cat program, which only takes about 0.14 seconds:

  time (head -c 30000000 /dev/zero | cat >/dev/null)
But think about all the gyrations happening here:

  \cat = (get_char \ch eq EOF ch I; put_char ch cat)
That is translated into the closed (variable-free) form:

  (Y (S (C get_char) (S (C (S (S (eq EOF) (C I))))
  (S (C (S put_char)) C))))
Then that runs like the wind, applying combinator rules such as these many millions of times:

  C x y = x
  S x y z = x z (y z)
  I x = x
  Y x = x (Y x)
Not to mention that get_char and put_char are handling and building characters one at a time, and the "eq" function runs as a combinator as well.

By the way it is quite trivial to add new combinators written in C. For example, the fabled Y combinator is defined in a single file "type_Y.c" as follows:

  #include "node.h"
  #include "type_Y.h"

  /*
  Fixpoint function (Y combinator):  (Y f) = (f (Y f))
  */
  static void step(void)
      { 
      int f1 = pop();
      if (!f1) return;

      /* f1 is the application node (Y f): L(f1) = Y, R(f1) = f.
         Rewrite it in place to (f (Y f)). */
      set_pair(f1, R(f1), P(L(f1),R(f1)));
      } 

  int type_Y(void)
      { 
      static int node = 0;
      if (node == 0) node = new_combinator(step);
      return node;
      } 
The type_Y.h file just says:

  extern int type_Y(void);


The whole thing is about 800 lines of C code so far, and I aim to get that lower. :) For one thing, I'm busy bootstrapping the Fexl parser into Fexl itself. It's really neat to be able to express executable concepts simply and completely like this:

  # Parse a nested parenthesized expression.
  \parse_nested =
    (
    \yes
    \no
    skip_char ch_lparen
        (
        parse_expr
            (\expr skip_char ch_rparen (yes expr) no)
            no
        )
        no
    )
And it's simple to do either-or parsing like this:

  # Parse a factor:  either an id or a nested expression.
  \parse_factor =
      (
      \yes
      \no
      parse_id (\id yes (var id));
      parse_nested yes;
      no
      )

Even such mundane tasks as skipping white space and comments have a satisfying elegance about them:

  # Skip filler, including white space and comments.
  \skip_filler =
      (
      \done
      skip_space;
      skip_comment
          (skip_filler done)
          done
      )


OK, here's one more example of how to write a Fexl combinator. Back in 1924 the great Moses Schönfinkel demonstrated that all computable functions can be defined in terms of just two primitive functions: C (the constant function), and S (the fusion function). So here is the source code for S, the Mother of all functions:

  #include "node.h"
  #include "type_S.h"

  /*
  Fusion function (Verschmelzungsfunktion)
  S x y z = ((x z) (y z))
  */
  static void step(void)
      {
      int f1 = pop();
      int f2 = pop();
      int f3 = pop();
      if (!f3) return;

      /* With the spine (((S x) y) z): f1 = (S x), f2 = ((S x) y),
         and f3 is the whole application.  Rewrite f3 in place
         to ((x z) (y z)). */
      set_pair(f3, P(R(f1),R(f3)), P(R(f2),R(f3)));
      }

  int type_S(void)
      {
      static int node = 0;
      if (node == 0) node = new_combinator(step);
      return node;
      }


Not sure if I am understanding this correctly. By bootstrapping, do you mean C does only the most basic syntax parsing, and then a Fexl script (written in a basic Fexl subset) does the more complete parsing?

All this stuff is really neat. Your C code interface looks quite nice and clean. I hope you release some source code soon.

And thanks for taking the time to write all this. Even though a lot of it is over my head, it's quite interesting. I'm really looking forward to playing around with it a bit and learning.


By the way, the initial release of Fexl will take a very simple form: A Universal Filter. The "fexl" executable will be nothing more than a program which maps standard input to standard output. So it's a filter. But it's universal because it is Turing-equivalent, capable of performing any conceivable function from stdin to stdout.

When Fexl runs, the first thing it does is read a Fexl function from standard input. The Fexl process then "becomes" that function, processing the tail of the input accordingly. Simple!


That was the general idea. But it's taken quite an interesting turn lately! Turns out I can parse Fexl one character at a time and build the closed executable form in a single pass as I go along -- thanks to a concept which I discovered several months ago but shelved as a mere curiosity at the time. But I'm sure using it now!

I started a proper blog here: http://fexl.com/blog , but we can still use ycombinator.com for discussion.


I created a Fexl discussion group at http://groups.google.com/group/fexl .

You can reach everything from http://fexl.com, including the Blog and Discussion group.


I have decided not to store type pointers directly in the Fexl arena. Instead, I will use a type number, which will be an integer offset into a static array of function pointers in the C program.

That way I could use a memory-mapped array for the arena, making it suitable as a persistent state for a Fexl computation, which could easily be stopped and resumed.

This also avoids the need for a special "show" function pointer in a type structure. Instead, the show functions could be stored in a parallel array of function pointers -- again, indexed by type number.
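
A minimal sketch of that layout (hypothetical names, not the actual Fexl code):

  #include <stdio.h>

  typedef void (*step_fn)(void);
  typedef void (*show_fn)(void);

  /* One entry per type number; the arena stores only the small index. */
  enum { TYPE_C, TYPE_S, N_TYPES };

  static void step_C(void) { printf("step C\n"); }
  static void step_S(void) { printf("step S\n"); }
  static void show_C(void) { printf("C\n"); }
  static void show_S(void) { printf("S\n"); }

  static step_fn step_table[N_TYPES] = { step_C, step_S };
  static show_fn show_table[N_TYPES] = { show_C, show_S };

  int main(void)
      {
      int type = TYPE_S;      /* an integer read out of the arena    */
      step_table[type]();     /* dispatch without storing C pointers */
      show_table[type]();     /* the parallel "show" table           */
      return 0;
      }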


Not only have I done a foreign function interface to C; the entire system depends on it.

Every atomic item begins with a pointer to its "type". Following the type pointer is an arbitrary vector of 0 or more machine words which constitute the data for that atomic item.

The type consists of a pair of C routine pointers: a "step" routine and a "show" routine. To write a new type X, you create two C files "type_X.h" and "type_X.c". The .h file typically contains a single "extern" declaration making the type structure visible to the rest of the system. As it stands right now, you must statically link your new type into the Fexl executable. I suppose eventually I or someone else can create a type which allows access to dynamic libraries, but I'm not messing with that right now.

The step routine is called when the atomic item is used as a Fexl function. Typically it should do a small and quick bit of surgery on the stack and return fairly quickly. It should never loop or fail to terminate in a predictable time. I believe this can eventually form the basis for "threading" multiple function reductions in a single process, but for now I'm sticking with a single reduction.

The show routine is used only for introspection, allowing the display of structures for tracing purposes during development and debugging. It returns a string representation of the atomic item, in the form of a valid Fexl token which can be parsed and easily distinguished from other types of atomic items.

With this feature, one can write a Fexl function which lazily converts a raw memory structure into a functional binary tree with tokens at the leaves. Then it is easy to write a Fexl function which lazily converts that binary tree into a functional string, or simply prints it directly.

That way even when you're debugging you never have to write a recursive C routine to traverse the memory structures aggressively, or worry about buffering large string representations.

All that is required for general purpose computation is the core "reduce" routine, which handles function application, along with the atomic types for the S and C combinators. To enhance efficiency, I also provide atomic types for the I and Y combinators, which could otherwise be defined in terms of S and C.

The reduce routine itself is under 10 lines of C code, consisting of a "while (1)" loop with some simple logic in the middle. It terminates when the reduction reaches a normal form, or when it loops beyond a maximum number of cycles, or when the memory arena is exhausted -- whichever comes first. So it is guaranteed to terminate.

Regarding the web demo, I have to be very careful with that. I'll be exposing a universal Turing machine to the World Wide Web. Certainly I'll be sand-boxing it, embedding the client's Fexl function inside an environment which does not allow arbitrary operating system calls. Nevertheless, because this is a very new product, I'll need to do extensive verification and testing on it to make sure it's properly sealed off.

Until now I've been fairly cavalier about it, because I've just been using a half-assed Perl interpreter which only calls a few safe routines, and was only exposed to a few trusted users. This new thing is a far more powerful beast, but it's code I can be proud of.

As for the schedule, I'll have to shoot for end of September. Typically I must devote the first half of every month to getting the hedge fund accounting done. After that I have more time for side projects.

One thing that has slipped my schedule is the need to do my own memory allocation inside the arena (the master array of ints). I can't trust any outside memory allocators. I call calloc once to create that arena, but after that it's all my own code. I'm doing the trick where a node begins and ends with a length, and I have a doubly-linked free list, so I can easily coalesce adjacent small free nodes into larger ones.

All my function application nodes are the same size, but there is no such constraint on atomic items! So I have to allow for variably sized data.

Thanks for your interest! Now that I have an audience of at least one, it's fun "blogging" about it here at HN. Thanks Paul Graham!


Great! I'll be looking forward to it. I kind of have a thing for small languages.


If you're interested in dabbling a bit, here's the grammar I'm using:

Here is the basic Fexl language, which can express all computable functions.

   expr = \atom expr
   expr = factor
   expr = factor expr

   factor = atom
   factor = (expr)

The full language includes two additional notations for convenience.

   expr = \atom=factor expr
        = (\atom expr) factor

   expr = factor; expr
        = factor (expr)
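
For example, here is a worked instance of each of the two extra notations (the first is the same sugar used by the \cat = ... definition earlier in the thread):

   \x = A B       is read as   (\x B) A
   f; g h         is read as   f (g h)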

