ltratt's comments | Hacker News

I always try to be respectful towards other software in my writing and I fell short this time. I hope you'll accept my sincere apologies!

> the tool chain target names, which I think was your real complaint

Yes, this was solely what I was referring to (slightly thoughtlessly) as "ugly", and even then only in the sense of "where did those magic names come from?" I certainly wasn't referring to objcopy itself!


Don't worry: I'm not actually insulted. I just think it's funny that something intended for development debugging turned out to be genuinely useful (as I said, I still use them too, and not for debugging bfd).


I've been using objcopy and objdump for a _long_ time. You saved my bacon on some really obscure archs (QCOM hexagon, for instance). Thanks!


I agree that you don't want non-CHERI Rust code to have to know about capabilities in any way. However, if you are using Rust for CHERI, you need some access to capability functions: otherwise, even in pure capability mode, you can't reduce a capability's permissions. For example, if you want to write `malloc` in purecap Rust, you'll probably want to hand out capabilities that point to blocks of memory with bounds that cover only that block: you need some way to say "create a new capability which has smaller bounds than its parent capability."
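To make that concrete, here's a tiny software model of that narrowing operation (the `Capability` type and its methods are hypothetical, purely for illustration: real CHERI enforces these rules in hardware, and no such Rust API is standardised):

    // A software model of a CHERI-style capability. Hypothetical
    // names, for illustration only: real CHERI enforces these rules
    // in hardware.
    #[derive(Clone, Copy, Debug)]
    struct Capability {
        base: usize, // lowest address this capability may access
        len: usize,  // size of the accessible region
    }

    impl Capability {
        // Derive a child capability with narrower bounds. Narrowing
        // is monotonic: the child's region must lie within the
        // parent's, and widening is forbidden.
        fn set_bounds(self, base: usize, len: usize) -> Option<Capability> {
            if base >= self.base && base + len <= self.base + self.len {
                Some(Capability { base, len })
            } else {
                None
            }
        }
    }

    fn main() {
        // A malloc-like allocator starts with a capability over its
        // whole heap and hands out a narrowed capability per block.
        let heap = Capability { base: 0x1000, len: 4096 };
        let block = heap.set_bounds(0x1100, 64).unwrap();
        assert!(block.set_bounds(0x1000, 128).is_none()); // can't widen
        println!("{:?}", block);
    }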

As to the utility of hybrid mode, I politely disagree. For example, is it useful for safe Rust code to always pay the size penalty of capabilities when you've proven at compile-time that they can't misuse the underlying pointers? A very different example is that hybrid mode allows you to meaningfully impose sub-process-like compartments (in particular with the `DDC` register, which restricts non-capability code to a subset of the virtual address space; see also the `PCC` register and friends). Personally, I think that this latter technique holds a great deal of potential.


> For example, is it useful for safe Rust code to always pay the size penalty of capabilities when you've proven at compile-time that they can't misuse the underlying pointers?

This seems like it's impossible, though? How can you prove at compile time that all the software your safe Rust calls doesn't corrupt pointers? Don't you need capabilities in the Rust code to ensure that, if such software does something nefarious, the Rust code catches it before doing something untoward? (Not to mention the risk of compiler bugs undermining those compile-time guarantees.)


If you’re passing a pointer to safe Rust code, with the capability bound encoded into something “native” to the language, then you don’t need hardware capabilities at all.


You'd still need access to the capability interface to perform that checking at the unsafe/safe boundary.


Correct, but once you've done that you can strip the capability information and pass the raw address around to the safe code because the compiler runs validation.


One potential option I haven't seen mentioned is to make references (i.e. `&[mut] T`) not use capabilities, but to make raw pointers (`*(mut|const) T`) use capabilities. Since the compiler already guarantees that references are used correctly, at least theoretically this is the best of all worlds.

Now it's possible that CHERI would make this impossible, but it's definitely an angle worth recognising.


It's absolutely possible, because the hardware doesn't care about your compilation model: you can mix normal pointers and capabilities as you wish. A challenge is that it's easy to go from capability -> pointer, but harder to go from pointer -> capability -- where do the extra capability bits come from? CHERI C provides a default ("inherit capability bits from the DDC") but I'm not sure that's what I would choose to do.


Problem is, there's lots of unsafe code that casts `*mut T` to `&mut T` (usually after checking the T is valid and whatnot). If `&mut T` didn't use capabilities, this kind of unsafe code would end up not taking advantage of the CHERI capability checking, which would be unfortunate.


I don't think this is actually a problem, since when casting from `&mut T` to `*mut T`, the returned pointer can only access the data (the T value) directly behind the reference.

The raw pointer would be synthesised with the capability for only the pointee of the original reference.
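Continuing the software-model sketch from above (again, hypothetical names rather than a real CHERI Rust API), that synthesis might look like:

    use std::mem::size_of;

    // Hypothetical: a raw pointer carrying a capability that covers
    // only the pointee of the reference it was derived from.
    struct CapPtr<T> {
        addr: *mut T,
        base: usize,
        len: usize,
    }

    impl<T> CapPtr<T> {
        // Synthesise a capability-carrying pointer from a reference:
        // its bounds cover exactly one T, nothing more.
        fn from_mut(r: &mut T) -> CapPtr<T> {
            let addr = r as *mut T;
            CapPtr { addr, base: addr as usize, len: size_of::<T>() }
        }

        // A bounds-checked read, standing in for the check CHERI
        // hardware would perform on every load/store.
        fn read(&self) -> T
        where
            T: Copy,
        {
            let a = self.addr as usize;
            assert!(a >= self.base && a + size_of::<T>() <= self.base + self.len);
            unsafe { *self.addr }
        }
    }

    fn main() {
        let mut x = 42u32;
        let p = CapPtr::from_mut(&mut x);
        assert_eq!(p.read(), 42);
    }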


[Warning: over-serious response coming.] There's a big "cattle grid" sign a little way before it, so I don't think that's a realistic worry. As an aside, that road is long, but a dead end: it leads up to a beauty spot / hiking point. The road also largely lacks dividing lines (i.e. it's sort-of single track, though often wide enough for two vehicles), and the cattle grid comes not long after a sharp corner, so I can't imagine many vehicles would have been able to get to flying speeds. Besides, that neck of the woods has plenty of interesting driving roads, including the main A39 out of Porlock just a couple of miles away, which I imagine are of more interest to speed demons.


> Is this a separate thing that would be integrated into an IDE or would it make more sense as part of the compiler?

As things stand, it's most easily used in a batch setting (e.g. a compiler). I don't think it would take much, if any, work to use it in an incremental parsing setting (for an IDE), but I haven't tried that yet!


I’d be interested in helping you integrate that into an IDE if you’re open to it.


I'm not an IDE person myself (I'm a neovim person), but I'd love to see someone integrate such an approach into an IDE! The algorithm is freely available, as is the implementation in Rust (grmtools). If you want to reuse the Rust code and it needs some tweaks, then I'm sure that we can find a way to support both batch and IDE-ish use cases well.


Alas, most parser generators don't have very good error recovery (and some have such terrible error recovery that I think it's worse than not having any!).

It turns out that this isn't inevitable: there's been a long strand of research on decent error recovery for LR parsers, at least, but it needed a bit of a refresh to be practical. If you'll forgive the blatant self promotion, we tackled this problem in https://soft-dev.org/pubs/html/diekmann_tratt__dont_panic/ which is implemented in our Rust parsing system https://github.com/softdevteam/grmtools/. It won't beat the very best hand-written error recovery routines, but it's often not far behind.
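To give a feel for it, here's roughly what driving lrpar's error recovery looks like (a sketch based on the grmtools quickstart, so exact names may differ between versions; it assumes a calc.l/calc.y pair compiled at build time with lrlex/lrpar's build.rs support):

    use lrlex::lrlex_mod;
    use lrpar::lrpar_mod;

    // Modules generated at build time from calc.l / calc.y.
    lrlex_mod!("calc.l");
    lrpar_mod!("calc.y");

    fn main() {
        let lexerdef = calc_l::lexerdef();
        // "2 + + 3" contains a syntax error...
        let lexer = lexerdef.lexer("2 + + 3");
        // ...but parse() still returns a (repaired) result, plus the
        // repair sequences (e.g. "Delete +", "Insert INT") that the
        // error recovery algorithm found.
        let (res, errs) = calc_y::parse(&lexer);
        for e in &errs {
            println!("{}", e.pp(&lexer, &calc_y::token_epp));
        }
        println!("{:?}", res);
    }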


I also have a C920: it's fairly infamous for the "Smurf effect", where it tends to make everything look unnaturally blue. Setting the "white_balance_temperature" control manually improves this situation substantially! Depending on your OS, you might have to first set "white_balance_temperature_auto" to "off" and then set "white_balance_temperature" in Kelvin. [In OpenBSD, we recently added this ability, but conflated the two controls into one. YMMV by OS!]
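On Linux, for example, something along these lines should work with v4l2-ctl (the control names vary by kernel and driver version, so list them first to check):

    # List the camera's controls, their ranges, and current values.
    v4l2-ctl -d /dev/video0 -l
    # Turn off auto white balance, then set a manual temperature.
    v4l2-ctl -d /dev/video0 --set-ctrl=white_balance_temperature_auto=0
    v4l2-ctl -d /dev/video0 --set-ctrl=white_balance_temperature=4600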


PHP is an... interesting... language to implement. The de facto standard PHP interpreters have become very fast over time. It's been a long time since I measured HHVM, but it always used to have very good warm-up time. I always felt it was a bit of a pity that no-one took on HippyVM (PHP in RPython https://github.com/hippyvm/hippyvm): it implemented a very large chunk of the language, had decent performance, but still some low-hanging fruit to improve (I had some fun making fairly substantial performance improvements to some aspects of it). It would be interesting to compare HippyVM and Graalphp once they implement equivalent chunks of the language, especially if someone takes on maintainership of HippyVM!


> My understanding is that Wagner's thesis is mostly concerned with "history-sensitive" error recovery in an IDE: using earlier syntax trees as an additional input to the error recovery process in order to better understand the user's intent when editing.

That matches my understanding (note that Wagner's error recovery algorithm is probably best supplemented with Lukas Diekmann's thesis, which corrects and expands it a bit https://diekmann.co.uk/diekmann_phd.pdf).

Because Wagner's algorithm relies on history and context, it does sometimes do some unexpected (and not always useful) things. However, I don't see any fundamental reason why it couldn't be paired with a "traditional" error recovery approach (I have one to sell!) to try and get the best of both worlds.


Yeah, I agree. To really demonstrate the best possible IDE error recovery that LR/GLR can provide, pairing a robust batch error recovery strategy with Wagner's (and your and Lukas's) history-sensitive approach is probably the way to go.

At the time that I was implementing error recovery for Tree-sitter, that felt beyond my "complexity budget", since batch error recovery was already not trivial, and the batch approach was good enough for the features I was building on GitHub/Atom. Someday, I could imagine augmenting it with history-sensitive behavior. I am thankful that Lukas has helped to "pave that path" so thoroughly.


> For error recovery, do any parsers have a notion of which tokens are more likely to be wrong?

lrpar has a `%avoid_insert` declaration which allows users to say "these token types are better avoided if there are other equivalently good repair sequences". It's simple, but it works well: https://softdevteam.github.io/grmtools/master/book/errorreco...
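For example, in a grmtools-style grammar file one might write (hypothetical token names; the book link above has the real details):

    %avoid_insert "INT" "FLOAT"
    %%
    Expr: Expr "PLUS" Term
        | Term
        ;
    Term: "INT"
        | "FLOAT"
        ;

With this, if error recovery finds several equally cheap repair sequences, those that insert an INT or FLOAT token (whose value would have to be invented out of thin air) are ranked below the others.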


I like this solution, and have thought about adding something similar as an "advanced feature" in Tree-sitter grammars. I'm interested to see how you've used it in grammars.


tree-sitter is excellent stuff! It's heavily inspired by Tim Wagner's PhD thesis (original site seems to be down, but https://web.archive.org/web/20150919164029/https://www.cs.be... works). IMHO more people should know about that work, and the sequence of work from Susan Graham's lab that led up to it. We have also been heavily inspired by Tim's work and Lukas's thesis extends and updates a number of aspects of that seminal work including, in Chapter 3, error recovery (https://diekmann.co.uk/diekmann_phd.pdf).

All that said, it's surprisingly difficult to compare error recovery in an online parser (i.e. one that's parsing as you type) to a batch parser. In the worst case (e.g. loading a file with a syntax error in it), online parsers have exactly the same problems as a batch parser; however, once they've built up sufficient context, they have different, sometimes more powerful, options available to them (but they also need to be cautious about rewriting the tree too much, as that baffles users).

