There's all this stuff, but I remember when I was a junior freelancer I was analysing a calendar availability sync script for a small holiday bookings company (not the big one). The hosts would have a publicly accessible Google Calendar with their bookings on it, which the script I was fixing would pull from.
Turns out, most of the hosts stored their customers' long card numbers + expiry etc. in the comment field of the booking.
Also it's really really good. Scarily good tbh. It's making PRs that work and aren't slop-filled and it figures out problems and traces through things in a way a competent engineer would rather than just fucking about.
If you haven't read the Rust Book at least, which is effectively Rust 101, you should not be writing Rust professionally. It has a chapter explaining all of this.
> In production-quality code, most Rustaceans choose expect rather than unwrap and give more context about why the operation is expected to always succeed. That way, if your assumptions are ever proven wrong, you have more information to use in debugging.
I didn't read anything in that section saying that unwrap/expect shouldn't be used in production code. If anything, I read it as saying they're perfectly acceptable.
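For anyone skimming, here is roughly the distinction the quoted passage draws, as a minimal sketch. The function names `loopback` and `parse_user_ip` are made up for illustration; the `expect` message follows the style the Book recommends.

```rust
use std::net::IpAddr;

/// Hardcoded input: a failure here would be a programmer error, so `expect`
/// documents why the call is assumed to always succeed.
fn loopback() -> IpAddr {
    "127.0.0.1"
        .parse()
        .expect("hardcoded IP address should be valid")
}

/// Untrusted input: hand the error back to the caller instead of panicking.
fn parse_user_ip(input: &str) -> Result<IpAddr, std::net::AddrParseError> {
    input.trim().parse()
}

fn main() {
    println!("loopback = {}", loopback());
    match parse_user_ip("not-an-ip") {
        Ok(ip) => println!("parsed {ip}"),
        Err(e) => eprintln!("rejected input: {e}"),
    }
}
```

So `expect` on something you control is fine by the Book's own standard; it's input you don't control where returning a `Result` is the safer default.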
A billion alerts in DD/Sentry/whatever saying the exact problem that coincide with the exact graph of failures would probably be helpful if someone looked at them.
The unwrap should be replaced by code that creates enough alerting to make a P0 incident from their canary deployment immediately.
OR even, the bot code crashing should itself be generating alerts.
The canary deployment would be automatically rolled back until the P0 incident is resolved.
All of this could probably have happened and been contained at their scale in less than a minute, as they would likely generate enough "omg the proxy cannot handle its config" alerts off of a deployment of 0.001% near immediately.
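To make the "replace the unwrap with alerting" idea concrete, here's a rough sketch. Everything in it is hypothetical (`FeatureConfig`, `MAX_FEATURES`, `apply_update`, and the `Vec<String>` standing in for a real paging pipeline); it is not their actual code, and it keeps the last known-good config in-process rather than modelling a real canary rollback.

```rust
/// Hypothetical limit on how many features a config may carry.
const MAX_FEATURES: usize = 200;

#[derive(Debug, Clone, PartialEq)]
struct FeatureConfig {
    features: Vec<String>,
}

#[derive(Debug, PartialEq)]
enum ConfigError {
    TooManyFeatures { got: usize, max: usize },
}

/// Validate a candidate config instead of unwrapping on it.
fn validate(candidate: FeatureConfig) -> Result<FeatureConfig, ConfigError> {
    if candidate.features.len() > MAX_FEATURES {
        return Err(ConfigError::TooManyFeatures {
            got: candidate.features.len(),
            max: MAX_FEATURES,
        });
    }
    Ok(candidate)
}

/// Apply an update, or keep the last known-good config and record an alert.
/// `alerts` stands in for a real paging/metrics pipeline.
fn apply_update(
    current: FeatureConfig,
    candidate: FeatureConfig,
    alerts: &mut Vec<String>,
) -> FeatureConfig {
    match validate(candidate) {
        Ok(new_config) => new_config,
        Err(err) => {
            alerts.push(format!("P0: rejected config update: {err:?}"));
            current
        }
    }
}

fn main() {
    let good = FeatureConfig { features: vec!["a".to_string()] };
    let bad = FeatureConfig { features: vec!["x".to_string(); MAX_FEATURES + 1] };
    let mut alerts = Vec::new();
    let active = apply_update(good.clone(), bad, &mut alerts);
    assert_eq!(active, good);
    println!("alerts raised: {alerts:?}");
}
```

The point is just that a bad config becomes a loud, attributable alert while the process keeps serving on the old config, instead of a panic.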
Agreed - a big question is why the file wasn’t test-driven in staging and progressively rolled out. And also what alerting was missing within FL2 such that they couldn’t pinpoint the unwrap instantly.
I would say that whilst this is a good top-down view, that `.unwrap()` should have been caught at code review and not allowed. A Clippy rule could have saved a lot of money.
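The rule I have in mind is `clippy::unwrap_used`, which lives in Clippy's opt-in `restriction` group. A minimal sketch of turning it on for one module (the `config` module and `parse_feature_count` are made-up names, and whether it fits a given codebase is a judgment call):

```rust
// `clippy::unwrap_used` is off by default. Plain `rustc` accepts the attribute
// but does not check it; `cargo clippy` is what enforces it.
#[deny(clippy::unwrap_used)]
mod config {
    /// Parse a feature count; returning `Result` leaves the decision to the caller.
    pub fn parse_feature_count(raw: &str) -> Result<usize, std::num::ParseIntError> {
        // Writing `raw.trim().parse().unwrap()` here would fail `cargo clippy`.
        raw.trim().parse()
    }
}

fn main() {
    match config::parse_feature_count("200") {
        Ok(n) => println!("feature count: {n}"),
        Err(e) => eprintln!("bad feature count: {e}"),
    }
}
```

There is a sibling lint, `clippy::expect_used`, if you want to ban `expect` as well, though that is a stricter policy than the Book suggests.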
That, and why the hell wasn't their alerting showing the colossal number of panics in their bot manager thing?
Yes the lack of observability is really the disturbing bit here. You have panics in a bunch of your core infrastructure, you would expect there to be a big red banner on the dashboard that people look at when they first start troubleshooting an incident.
This is also a pretty good example of why having stack traces by default is great. That error could have been immediately understood just from a stack trace and a basic exception message.