ltratt's comments | Hacker News

I always try to be respectful towards other software in my writing and I fell short this time. I hope you'll accept my sincere apologies!

> the tool chain target names, which I think was your real complaint

Yes, this was solely what I was referring to (slightly thoughtlessly) as "ugly", and even then only in the sense of "where did those magic names come from?" I certainly wasn't referring to objcopy itself!


Don't worry: I'm not actually insulted. I just think it's funny that something intended for development debugging turned out to be genuinely useful (as I said, I still use them too, and not for debugging bfd).


I've been using objcopy and objdump for a _long_ time. You saved my bacon on some really obscure archs (QCOM hexagon, for instance). Thanks!


I agree that you don't want non-CHERI Rust code to have to know about capabilities in any way. However, if you are using Rust for CHERI, you need some access to capability functions: otherwise, even in pure capability mode, you can't reduce a capability's permissions. For example, if you want to write `malloc` in purecap Rust, you'll probably want to hand out capabilities that point to blocks of memory with bounds that cover only that block: you need some way to say "create a new capability which has smaller bounds than its parent capability."
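To make that concrete, here's a tiny software model of that narrowing operation (the `Capability` type and its methods are hypothetical, purely for illustration: real CHERI enforces these rules in hardware, and no such Rust API is standardised):

    // A software model of a CHERI-style capability. Hypothetical
    // names, for illustration only: real CHERI enforces these rules
    // in hardware.
    #[derive(Clone, Copy, Debug)]
    struct Capability {
        base: usize, // lowest address this capability may access
        len: usize,  // size of the accessible region
    }

    impl Capability {
        // Derive a child capability with narrower bounds. Narrowing
        // is monotonic: the child's region must lie within the
        // parent's, and widening is forbidden.
        fn set_bounds(self, base: usize, len: usize) -> Option<Capability> {
            if base >= self.base && base + len <= self.base + self.len {
                Some(Capability { base, len })
            } else {
                None
            }
        }
    }

    fn main() {
        // A malloc-like allocator starts with a capability over its
        // whole heap and hands out a narrowed capability per block.
        let heap = Capability { base: 0x1000, len: 4096 };
        let block = heap.set_bounds(0x1100, 64).unwrap();
        assert!(block.set_bounds(0x1000, 128).is_none()); // can't widen
        println!("{:?}", block);
    }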

As to the utility of hybrid mode, I politely disagree. For example, is it useful for safe Rust code to always pay the size penalty of capabilities when you've proven at compile-time that they can't misuse the underlying pointers? A very different example is that hybrid mode allows you to meaningfully impose sub-process-like compartments (in particular with the `DDC` register, which restricts non-capability code to a subset of the virtual address space; see also the `PCC` register and friends). Personally, I think that this latter technique holds a great deal of potential.


> For example, is it useful for safe Rust code to always pay the size penalty of capabilities when you've proven at compile-time that they can't misuse the underlying pointers?

This seems like it's impossible, though? How can you prove at compile time that all the software your safe Rust calls doesn't corrupt pointers? Don't you need capabilities in the Rust code to ensure that, if such software does something nefarious, the Rust code catches it before doing something untoward? (Not to mention the risk of compiler bugs undermining those compile-time guarantees.)


If you’re passing a pointer to safe Rust code, with the capability bound encoded into something “native” to the language, then you don’t need hardware capabilities at all.


You'd still need access to the capability interface to perform that checking at the unsafe/safe boundary.


Correct, but once you've done that you can strip the capability information and pass the raw address around to the safe code because the compiler runs validation.


One potential option I haven't seen mentioned is to make references (i.e. `&[mut] T`) not use capabilities, but to make raw pointers (`*(mut|const) T`) use capabilities. Since the compiler already guarantees that references are used correctly, at least theoretically this is the best of all worlds.

Now it's possible that CHERI would make this impossible, but it's definitely an angle worth recognising.


It's absolutely possible, because the hardware doesn't care about your compilation model: you can mix normal pointers and capabilities as you wish. A challenge is that it's easy to go from capability -> pointer, but harder to go from pointer -> capability -- where do the extra capability bits come from? CHERI C provides a default ("inherit capability bits from the DDC") but I'm not sure that's what I would choose to do.


Problem is, there's lots of unsafe code that casts `*mut T` to `&mut T` (usually after checking the T is valid and whatnot). If `&mut T` didn't use capabilities, this kind of unsafe code would end up not taking advantage of the CHERI capability checking, which would be unfortunate.


I don't think this is actually a problem, since when casting from `&mut T` to `*mut T`, the returned pointer can only access the data (the T value) directly behind the reference.

The raw pointer would be synthesised with the capability for only the pointee of the original reference.
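Continuing the software-model sketch from above (again, hypothetical names rather than a real CHERI Rust API), that synthesis might look like:

    use std::mem::size_of;

    // Hypothetical: a raw pointer carrying a capability that covers
    // only the pointee of the reference it was derived from.
    struct CapPtr<T> {
        addr: *mut T,
        base: usize,
        len: usize,
    }

    impl<T> CapPtr<T> {
        // Synthesise a capability-carrying pointer from a reference:
        // its bounds cover exactly one T, nothing more.
        fn from_mut(r: &mut T) -> CapPtr<T> {
            let addr = r as *mut T;
            CapPtr { addr, base: addr as usize, len: size_of::<T>() }
        }

        // A bounds-checked read, standing in for the check CHERI
        // hardware would perform on every load/store.
        fn read(&self) -> T
        where
            T: Copy,
        {
            let a = self.addr as usize;
            assert!(a >= self.base && a + size_of::<T>() <= self.base + self.len);
            unsafe { *self.addr }
        }
    }

    fn main() {
        let mut x = 42u32;
        let p = CapPtr::from_mut(&mut x);
        assert_eq!(p.read(), 42);
    }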


[Warning: over-serious response coming.] There's a big "cattle grid" sign a little way before it, so I don't think that's a realistic worry. As an aside, that road is long, but a dead end: it leads up to a beauty spot / hiking point. The road also largely lacks dividing lines (i.e. it's sort-of single track, though often wide enough for two vehicles), and the cattle grid comes not long after a sharp corner, so I can't imagine many vehicles would have been able to get to flying speeds. Besides, that neck of the woods has plenty of interesting driving roads, including the main A39 out of Porlock just a couple of miles away, which I imagine are of more interest to speed demons.


> Is this a separate thing that would be integrated into an IDE or would it make more sense as part of the compiler?

As things stand, it's most easily used in a batch setting (e.g. a compiler). I don't think it would take much, if any, work to use it in an incremental parsing setting (for an IDE), but I haven't tried that yet!


I’d be interested in helping you integrate that into an IDE if you’re open to it.


I'm not an IDE person myself (I'm a neovim person), but I'd love to see someone integrate such an approach into an IDE! The algorithm is freely available, as is the implementation in Rust (grmtools). If you want to reuse the Rust code and it needs some tweaks, then I'm sure that we can find a way to support both batch and IDE-ish use cases well.


Alas, most parser generators don't have very good error recovery (and some have such terrible error recovery that I think it's worse than not having any!).

It turns out that this isn't inevitable: there's been a long strand of research on decent error recovery for LR parsers, at least, but it needed a bit of a refresh to be practical. If you'll forgive the blatant self promotion, we tackled this problem in https://soft-dev.org/pubs/html/diekmann_tratt__dont_panic/ which is implemented in our Rust parsing system https://github.com/softdevteam/grmtools/. It won't beat the very best hand-written error recovery routines, but it's often not far behind.
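To give a feel for it, here's roughly what driving lrpar's error recovery looks like (a sketch based on the grmtools quickstart, so exact names may differ between versions; it assumes a calc.l/calc.y pair compiled at build time with lrlex/lrpar's build.rs support):

    use lrlex::lrlex_mod;
    use lrpar::lrpar_mod;

    // Modules generated at build time from calc.l / calc.y.
    lrlex_mod!("calc.l");
    lrpar_mod!("calc.y");

    fn main() {
        let lexerdef = calc_l::lexerdef();
        // "2 + + 3" contains a syntax error...
        let lexer = lexerdef.lexer("2 + + 3");
        // ...but parse() still returns a (repaired) result, plus the
        // repair sequences (e.g. "Delete +", "Insert INT") that the
        // error recovery algorithm found.
        let (res, errs) = calc_y::parse(&lexer);
        for e in &errs {
            println!("{}", e.pp(&lexer, &calc_y::token_epp));
        }
        println!("{:?}", res);
    }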


I also have a C920: it's fairly infamous for the "Smurf effect", where it tends to make everything look unnaturally blue. Setting the "white_balance_temperature" control manually improves this situation substantially! Depending on your OS, you might have to first set "white_balance_temperature_auto" to "off" and then set "white_balance_temperature" in Kelvin. [In OpenBSD, we recently added this ability, but conflated the two controls into one. YMMV by OS!]
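On Linux, for example, something along these lines should work with v4l2-ctl (the control names vary by kernel and driver version, so list them first to check):

    # List the camera's controls, their ranges, and current values.
    v4l2-ctl -d /dev/video0 -l
    # Turn off auto white balance, then set a manual temperature.
    v4l2-ctl -d /dev/video0 --set-ctrl=white_balance_temperature_auto=0
    v4l2-ctl -d /dev/video0 --set-ctrl=white_balance_temperature=4600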


PHP is an... interesting... language to implement. The de facto standard PHP interpreters have become very fast over time. It's been a long time since I measured HHVM, but it always used to have very good warm-up time. I always felt it was a bit of a pity that no-one took on HippyVM (PHP in RPython https://github.com/hippyvm/hippyvm): it implemented a very large chunk of the language, had decent performance, but still some low-hanging fruit to improve (I had some fun making fairly substantial performance improvements to some aspects of it). It would be interesting to compare HippyVM and Graalphp once they implement equivalent chunks of the language, especially if someone takes on maintainership of HippyVM!


> My understanding is that Wagner's thesis is mostly concerned with "history-sensitive" error recovery in an IDE: using earlier syntax trees as an additional input to the error recovery process in order to better understand the user's intent when editing.

That matches my understanding (note that Wagner's error recovery algorithm is probably best supplemented with Lukas Diekmann's thesis, which corrects and expands it a bit https://diekmann.co.uk/diekmann_phd.pdf).

Because Wagner's algorithm relies on history and context, it does sometimes do some unexpected (and not always useful) things. However, I don't see any fundamental reason why it couldn't be paired with a "traditional" error recovery approach (I have one to sell!) to try and get the best of both worlds.


Yeah, I agree. To really demonstrate the best possible IDE error recovery that LR/GLR can provide, pairing a robust batch error recovery strategy with Wagner's (and your and Lukas's) history-sensitive approach is probably the way to go.

At the time that I was implementing error recovery for Tree-sitter, that felt beyond my "complexity budget", since batch error recovery was already not trivial, and the batch approach was good enough for the features I was building on GitHub/Atom. Someday, I could imagine augmenting it with history-sensitive behavior. I am thankful that Lukas has helped to "pave that path" so thoroughly.


> For error recovery, do any parsers have a notion of which tokens are more likely to be wrong?

lrpar has a `%avoid_insert` declaration which allows users to say "these token types are better avoided if there are other equivalently good repair sequences". It's simple, but it works well: https://softdevteam.github.io/grmtools/master/book/errorreco...
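For example, in a grmtools-style grammar file one might write (hypothetical token names; the book link above has the real details):

    %avoid_insert "INT" "FLOAT"
    %%
    Expr: Expr "PLUS" Term
        | Term
        ;
    Term: "INT"
        | "FLOAT"
        ;

With this, if error recovery finds several equally cheap repair sequences, those that insert an INT or FLOAT token (whose value would have to be invented out of thin air) are ranked below the others.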


I like this solution, and have thought about adding something similar as an "advanced feature" in Tree-sitter grammars. I'm interested to see how you've used it in grammars.


tree-sitter is excellent stuff! It's heavily inspired by Tim Wagner's PhD thesis (original site seems to be down, but https://web.archive.org/web/20150919164029/https://www.cs.be... works). IMHO more people should know about that work, and the sequence of work from Susan Graham's lab that led up to it. We have also been heavily inspired by Tim's work and Lukas's thesis extends and updates a number of aspects of that seminal work including, in Chapter 3, error recovery (https://diekmann.co.uk/diekmann_phd.pdf).

All that said, it's surprisingly difficult to compare error recovery in an online parser (i.e. one that's parsing as you type) to a batch parser. In the worst case (e.g. loading a file with a syntax error in it), online parsers have exactly the same problems as a batch parser; however, once they've built up sufficient context, they have different, sometimes more powerful, options available to them (but they also need to be cautious about rewriting the tree too much, as that baffles users).

