
LLMs are amazing and I do seriously wonder if the singularity could happen in my lifetime ... but there are definitely people over-hyping present capabilities right now. If present models were fully human-level proper-extended-Turing-test-passing AGI, then the effects on the economy and the software ecosystem would be as immediately obvious and world-changing as a comet impact.

I don't think Rakyll or Andrej are claiming these things; I think they're assuming their readers share more context with them and that it's not necessary to re-tread that every time they post about their surprise at AI currently being better than they expected. I've had the experience multiple times now of reading posts from people like them, nodding along, and then reading breathless quote-tweets of those very same posts exclaiming about how it means that AGI is here right now.


The explanation it gives at the start appears to be on the right track but then the post has two separate incomplete/flawed attempts at coding it. (The first one doesn't actually put the expected crypt() output in the payload, and the second one puts null bytes in the password section of the payload where they can't go.)

Automatically running LLM-written code (where the LLM might be naively picking a malicious library to use, is poisoned by malicious context from the internet, or wrongly thinks it should reconfigure the host system it's executing code on) is an increasingly popular use-case where sandboxing is important.

That scenario is harder to distinguish from the adversarial case that public hosts like Cloudflare serve. I don't think it's unreasonable to say that a project like OpenWorkers can be useful without meeting the needs of that particular use-case.

Something like "all code runs with no permissions to the filesystem or external IO by default, you have to do this to add fine-grained permissions for IO, the code runs within an unprivileged process sandboxed using standard APIs as defense in depth against possible V8 vulnerabilities, here's how this system protects against obvious possible attacks..." would be pretty good. Obviously it's not proof it's all implemented perfectly, but it would be a quick sign that the project is miles ahead of a naive implementation, and it would give someone interested some good pointers on what parts to start reviewing.
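
The default-deny part is simple to state even as a sketch. All of the names below are invented for illustration (this isn't OpenWorkers' actual API), but it's the shape of thing the docs could point at:

  // Hypothetical sketch of a default-deny permission model; names are
  // invented for illustration, not a real project's API.
  #[derive(Default)]
  struct Permissions {
      // Empty by default: no filesystem paths, no outbound hosts.
      readable_paths: Vec<std::path::PathBuf>,
      allowed_hosts: Vec<String>,
  }

  impl Permissions {
      fn allow_host(mut self, host: &str) -> Self {
          self.allowed_hosts.push(host.to_string());
          self
      }

      fn check_outbound(&self, host: &str) -> Result<(), String> {
          if self.allowed_hosts.iter().any(|h| h == host) {
              Ok(())
          } else {
              Err(format!("outbound request to {host} denied: not in allowlist"))
          }
      }
  }

  fn main() {
      // Nothing is reachable unless it was explicitly granted.
      let perms = Permissions::default().allow_host("api.example.com");
      assert!(perms.check_outbound("api.example.com").is_ok());
      assert!(perms.check_outbound("evil.example.net").is_err());
  }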

This is exactly where we see things heading. The trust model is shifting - code isn't written by humans you trust anymore, it's generated by models that can be poisoned, confused, or just pick the wrong library.

We're thinking about OpenWorkers less as "self-hosted Cloudflare Workers" and more as a containment layer for code you don't fully control. V8 isolates, CPU/memory limits, no filesystem access, network via controlled bindings only.

We're also exploring execution recording - capture all I/O so you can replay and audit exactly what the code did.

Production bug -> replay -> AI fix -> verified -> deployed.


This is assuming the process could have done anything sensible while it had the malformed feature file. It might be that in this case this was one configuration file of several, and maybe the program could have been built to run with some defaults when it finds this specific configuration invalid, but in the general case, if a program expects a configuration file and can't do anything without it, panicking is a normal thing to do. There's no graceful handling (beyond a nice error message) a program like Nginx could do on a syntax error in its config.

The real issue is further up the chain where the malformed feature file got created and deployed without better checks.


> panicking is a normal thing to do

I do not think that if the bot detection model inside your big web proxy has a configuration error it should panic and kill the entire proxy and take 20% of the internet with it. This is a system that should fail gracefully and it didn't.

> The real issue

Are there single "real issues" with systems this large? There are issues being created constantly (say, unwraps where there shouldn't be, assumptions about the consumers of the database schema) that only become apparent when they line up.


I don't know too much about how the feature file distribution works, but in the event of a failure to read a new file, wouldn't logging the failure and sticking with the previous version of the file be preferable?
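
Roughly this shape, as a sketch (the types and parsing here are invented stand-ins, not Cloudflare's actual loader):

  use std::fs;

  // Stand-in for whatever the real bot-management loader parses.
  struct Features(Vec<String>);

  fn parse_features(text: &str) -> Result<Features, String> {
      if text.trim().is_empty() {
          return Err("empty feature file".to_string());
      }
      Ok(Features(text.lines().map(str::to_owned).collect()))
  }

  // On any failure, log it and keep serving with the last good version
  // instead of panicking and taking the whole proxy down.
  fn reload(current: Features, path: &str) -> Features {
      let parsed = fs::read_to_string(path)
          .map_err(|e| e.to_string())
          .and_then(|text| parse_features(&text));
      match parsed {
          Ok(new) => new,
          Err(e) => {
              eprintln!("feature reload from {path} failed ({e}); keeping previous version");
              current
          }
      }
  }

  fn main() {
      let current = Features(vec!["feature_a".to_string()]);
      let after = reload(current, "/nonexistent/features.new");
      println!("still serving with {} features", after.0.len());
  }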


That's exactly the point (i.e. just prior to distribution) where a simple sanity check should have been run and the config replacement/update pipeline stopped on failure. When they introduced the 200-entry-limit, memory-optimised feature loader, it should have been a no-brainer to insert that sanity check into the config production pipeline.
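
Something as small as this would have done it, as a sketch (the limit, file name, and format are placeholders for whatever the real pipeline uses):

  use std::process::ExitCode;

  // Hypothetical pre-distribution check: refuse to publish a feature file
  // that the consumers' entry limit would reject downstream.
  const FEATURE_LIMIT: usize = 200;

  fn validate(contents: &str) -> Result<(), String> {
      let count = contents.lines().filter(|l| !l.trim().is_empty()).count();
      if count > FEATURE_LIMIT {
          return Err(format!("{} features exceeds the limit of {}", count, FEATURE_LIMIT));
      }
      Ok(())
  }

  fn main() -> ExitCode {
      let contents = std::fs::read_to_string("features.candidate").unwrap_or_default();
      match validate(&contents) {
          // Safe to hand off to the distribution pipeline.
          Ok(()) => ExitCode::SUCCESS,
          Err(e) => {
              eprintln!("refusing to distribute feature file: {e}");
              // Non-zero exit stops the rollout here instead of shipping a bad config.
              ExitCode::FAILURE
          }
      }
  }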


Or even truncating the features to their limit and alerting through logs that there is likely performance degradation in their Bot Management.

I'm really confused how so many people are finding it acceptable to bring down your entire reverse proxy because the feature set for the ML model in one of your components was longer than expected.


One feature failing like this should probably log the error and fail closed. It shouldn't take down everything else in your big proxy that sits in front of your entire business.


Yea, Rust is safe but it’s not magic. However, Nginx doesn’t panic on malformed config. It exits with a (hopefully helpful) error code and message. The question then is whether the Cloudflare code could have exited cleanly in a way that made recovery easier instead of just straight panicking.
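
The difference is small but real. As a sketch (the file name and format are placeholders):

  fn main() {
      let raw = std::fs::read_to_string("proxy.conf");

      // Panicking style: `raw.unwrap()` would abort with a generic panic
      // message that says little about what failed or where.

      // Nginx-style clean exit: say exactly what failed, then exit non-zero
      // so the operator/supervisor knows the old process should keep running.
      let config = match raw {
          Ok(text) => text,
          Err(e) => {
              eprintln!("could not read proxy.conf: {e}");
              std::process::exit(1);
          }
      };
      println!("loaded {} bytes of config", config.len());
  }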


Would .expect() with a message meet that criterion of exiting with a more helpful error message? From the postmortem it seems to me like they just didn’t know it was even panicking.


> However Nginx doesn’t panic on malformed config. It exits with hopefully a helpful error code and message.

The thing I dislike most about Nginx is that if you are using it as a reverse proxy for like 20 containers and one of them isn't up, the whole web server will refuse to start up:

  nginx: [emerg] host not found in upstream "my-app"
Obviously making 19 sites also unavailable just because one of them is caught in a crash loop isn't ideal. There is a workaround involving specifying variables, like so (non-Kubernetes example, regular Nginx web server running in a container, talking to other containers over an internal network, like Docker Compose or Docker Swarm):

  location / {
      resolver 127.0.0.11 valid=30s; # Docker DNS
      set $proxy_server my-app;
      proxy_pass http://$proxy_server:8080/;
      proxy_redirect default;
  }
Sadly, if you try to use that approach, then you just get:

  nginx: [emerg] "proxy_redirect default" cannot be used with "proxy_pass" directive with variables
Unfortunately, switching the redirect configuration away from the default makes some apps go into a redirect loop and fail to load: mostly legacy ones, where Firefox shows something along the lines of "The page isn't redirecting properly". It sucks especially badly if you can't change the software that you just need to run, and suddenly your whole Nginx setup is brittle. Apache2 and Caddy don't have such an issue.

That's to say that all software out there has some really annoying failure modes, even if Nginx is pretty cool otherwise.


Exactly! Sometimes exploding is simply the least bad option, and is an entirely sensible approach.


In this case it definitely wasn’t the least bad option though.


Falling back to a generic base configuration in the presence of an incoming invalid config file would probably be a sensible thing to do.


I'm tired of every other discussion about EA online assuming that SBF is representative of the average EA member, instead of being an infamous outlier.


What reasons at all do you have?


What reasons do you have to assume EA = SBF?


The reported case about water wells running dry had to do with issues in construction rather than anything about the data center's regular operation:

> But the reason their taps ran dry (which the article itself says) was entirely because of sediment buildup in groundwater from construction. It had nothing to do with the data center’s normal operations (it hadn’t begun operating yet, and doesn’t even draw from local groundwater). The residents were wronged by Meta here and deserve compensation, but this is not an example of a data center’s water demand harming a local population.

https://andymasley.substack.com/p/the-ai-water-issue-is-fake...


> Almonds are pretty cherry picked here as notorious for their high water use.

If water use were such a dire issue that we needed to start cutting down on it, then we should absolutely cherry-pick the biggest usages and start there. (Or we should just apply a Pigouvian tax across all water use, which will naturally affect the biggest consumers.)


Yes, that's roughly what I said in my post. If we're doing a controlled economy and triaging for the health of the ecosystem, we'd start with feed for cattle, and almonds wouldn't be much further down on the list.

The contention with AI water use is that something like this is currently happening as local water supplies are being diverted for data-centers.


I have >1 gbps service from them.


The future isn't evenly distributed. I recently discovered an actively developed software project that had a ton of helper functions based on the design of `gets`, with the same vulnerability. Surprisingly, not all C/C++ developers have yet learned to recoil in horror at seeing a buffer pointer being passed around without a length. (C++'s std::span was very convenient for fixing the issue by letting the buffer pointer and length be kept together, exactly like Go and Rust slices.)
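
For anyone who hasn't seen the pattern: the fix is just to make the length travel with the pointer. In Rust-slice terms (the direct analogue of what std::span gives C++), a sketch:

  // The length travels with the pointer, so the callee can't overrun the
  // caller's buffer by accident the way a gets-style helper can.
  fn fill_greeting(buf: &mut [u8]) -> usize {
      let msg = b"hello, world";
      let n = msg.len().min(buf.len()); // the buffer's length is right there
      buf[..n].copy_from_slice(&msg[..n]);
      n
  }

  fn main() {
      let mut small = [0u8; 5];
      let n = fill_greeting(&mut small);
      println!("wrote {} bytes: {:?}", n, &small[..n]);
  }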


> Surprisingly not all C/C++ developers have learned yet to recoil in horror at seeing a buffer pointer being passed around without a length.

As someone who wasn't taught better (partly due to not picking CS as a career stream), are there any languages which avoid such vulnerability issues? Does something like Rust help with this?


Almost everything else, besides any language that is copy-paste compatible with C, including systems languages that predate C by a decade, like JOVIAL, ESPOL, NEWP, PL/I and other ALGOL-inspired systems languages.

Xerox PARC started with BCPL for their systems, but eventually created Mesa exactly for safe systems programming.

https://en.wikipedia.org/wiki/Mesa_(programming_language)

http://toastytech.com/guis/star.html

"The Mesa Programming Environment" - very first IDE for a systems language

https://www.digibarn.com/friends/curbow/star/XDEPaper.pdf

While Pascal as originally designed wasn't suitable for systems programming, various dialects sprang out of it, with Object Pascal from Apple/Borland being the most famous one. By 1978 the first standard for Modula-2 was released, a language inspired by Mesa after Niklaus Wirth spent a sabbatical year at Xerox PARC. Years later, through a similar experience, the evolution of Mesa (Cedar) would influence him to come up with Oberon.

https://en.wikipedia.org/wiki/Modula-2

https://www.modula2.org/modula2-history.php

Then there was Ada, although compilers were too expensive and its hardware requirements too high for 1980s computers.

Also, all the BASIC compilers on 8- and 16-bit home computers had support for low-level systems programming.

Among recent programming languages, something like Zig would be the closest to what those languages were offering: safety without having a GC of some form.

Naturally this takes care of most C flaws, minus use-after-free; however, due to their type systems, one tends to use heap allocations less than in C, although it remains an issue.


Yes, Rust protects against this, and so does almost every language with garbage collection (Java, C#, Python, JS/TS, etc). C/C++ are pretty unique in being some of the only popular languages remaining that don't protect you from memory-safety issues, which often cause exploitable vulnerabilities.
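
As a tiny illustration (just the language behavior, not anyone's real code): an out-of-range write in safe Rust is caught at runtime instead of silently overwriting adjacent memory.

  fn main() {
      let mut buf = [0u8; 4];
      let i = 7; // pretend this index came from untrusted input

      // Checked access: handle the out-of-range case explicitly.
      if let Some(slot) = buf.get_mut(i) {
          *slot = 42;
      } else {
          eprintln!("index {} is out of bounds for a {}-byte buffer", i, buf.len());
      }

      // buf[i] = 42; // indexing directly would panic here rather than corrupt memory
  }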

