Amazing work. I'm interested in the choice of WASM - presumably any target that can run DOOM could've been used? Of which there are innumerable choices I assume. Was it for symbolic reasons or genuinely the most useful target?
love this feedback - will definitely talk about it in the next videos.
you're gonna laugh.. but the answer is "ignorance". I had no idea what I was doing and had literally never touched WebAssembly before but thought it'd be a good place to start. Then it just stuck.
Hilariously, later a friend explained to me "Dimitri, this would have been a LOT easier if you had just targeted ASSEMBLY. IT WAS RIGHT THERE IN THE NAME". haha. oh well! ignorance is bliss
Ah nice! Well, hats off, this is really impressive. As other commenters mentioned, the extent to which it's documented and the restricted scope probably helped.
WASM is one of the easier platforms to port as the Virtual Machine is well documented and there are actual implementations in many languages that can be used for debugging and comparing the results.
But that's all CPU -- he doesn't have to emulate the rest of the computer (video card, IO systems, etc). You provide WASM with your own interface to the outside world.
(not arguing, really just want to hear your thoughts on this!) so re: video card - I did write something I don't know what else to call other than "a graphics driver" (i.e. it takes Doom palette pixel values and converts them to something the user sees on their screen as ASCII art). what else would you call that? or are you saying a video card would have to be at the level of VSCode or my operating system, which actually lights up physical pixels on my screen?
If instead of WASM you decided to emulate a different DOOM target like a PC then you'd have to emulate the actual VGA graphics hardware and enough of the other PC hardware to run the game. That level of emulation is a difficult project on its own.
got it, ok sweet. thank you so much for explaining. I don't know a ton about these programming circles, so I don't want to say the wrong thing if I can avoid it. Sounds like you're saying that no WASM runtime is, in this sense, qualifying - which makes sense!
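For anyone following along, here is roughly what that kind of "graphics driver" boils down to. This is a minimal sketch, not the author's code: it assumes an 8-bit indexed framebuffer and an RGB palette, and the names (`RAMP`, `luminance`, `frame_to_ascii`) and the character ramp are made up for illustration.

```python
# Hypothetical sketch: turn palette-indexed pixels into ASCII art by brightness.
RAMP = " .:-=+*#%@"  # dark -> bright; an arbitrary choice of characters


def luminance(rgb):
    """Approximate perceived brightness (0..255) using Rec. 601 weights."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def frame_to_ascii(pixels, width, palette):
    """pixels: flat sequence of palette indices; palette: list of (r, g, b)."""
    rows = []
    for start in range(0, len(pixels), width):
        chars = []
        for index in pixels[start:start + width]:
            level = luminance(palette[index]) / 255  # scale to 0.0..1.0
            # Clamp so full brightness maps to the last ramp character.
            chars.append(RAMP[min(int(level * len(RAMP)), len(RAMP) - 1)])
        rows.append("".join(chars))
    return "\n".join(rows)
```

Note that nothing here touches hardware: the host just hands the WASM module an interface and decides what "a pixel" means, which is the contrast with emulating a real VGA card.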
FWIW, llama.cpp has always had a JSON schema -> GBNF converter, although it launched as a companion script. Now I think it's more integrated in the CLI and server.
But yeah I mean, GBNF or other structured output solutions would of course allow you to supply formats other than JSON schema. It sounds conceivable, though, that OpenAI could expose the grammars directly in the future.
I think for certain tasks it's still easier to write the grammar directly. Does converting from JSON to a CFG limit the capabilities of the grammar? i.e., are there things JSON can't represent that a context free grammar can?
You might be right that they're similarly powerful. In some cases, an arbitrary output format might in and of itself be desirable. Like it might result in token savings or be more natural for the LLM. For instance, generating code snippets to an API or plain text with constraints.
And this is more esoteric, but technically in the case of JSON I suppose you could embed a grammar inside a JSON string, which I'm not sure JSON schema can express.
Similar approach to llama.cpp under the hood - they convert the schema to a grammar. Llama.cpp's implementation was specific to the ggml stack, but what they've built sounds similar to Outlines, which they acknowledged.
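To illustrate the schema-to-grammar idea in general terms, here is a toy sketch. It is not llama.cpp's converter or Outlines: it handles only a tiny subset of JSON schema (flat or nested objects with string, integer, and boolean properties), treats every property as required and in a fixed order, ignores whitespace and string escapes, and emits rules in a simplified GBNF-like notation. The function name `schema_to_grammar` is hypothetical.

```python
import json

# Primitive rules in a simplified GBNF-like notation (an illustrative subset).
PRIMITIVES = {
    "string": r'"\"" [^"\\]* "\""',
    "integer": '"-"? [0-9]+',
    "boolean": '"true" | "false"',
}


def schema_to_grammar(schema):
    """Convert a small JSON-schema subset into grammar rules, one per line."""
    rules = {}

    def visit(node, name):
        kind = node["type"]
        if kind in PRIMITIVES:
            rules[kind] = PRIMITIVES[kind]
            return kind
        if kind == "object":
            parts = []
            for key, sub in node["properties"].items():
                child = visit(sub, f"{name}-{key}")
                # Double-encode the key: once as JSON, once as a grammar literal.
                parts.append(f'{json.dumps(json.dumps(key))} ":" {child}')
            rules[name] = '"{" ' + ' "," '.join(parts) + ' "}"'
            return name
        raise ValueError(f"unsupported type: {kind}")

    visit(schema, "root")
    return "\n".join(f"{name} ::= {body}" for name, body in rules.items())
```

The point is that the schema is just a more restricted front end: everything it produces is a context-free grammar, while a hand-written grammar can describe outputs that are not JSON at all.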
Awesome to see work in the DB wire compatible space. On the MySQL side, there was MySQL Proxy (https://github.com/mysql/mysql-proxy), which was scriptable with Lua, with which you could create your own MySQL wire compatible connections. Unfortunately it appears to have been abandoned by Oracle and IIRC doesn't work with 5.7 and beyond. I used it in the past to hack together a MySQL wire adapter for Interana (https://scuba.io/).
I guess these days the best approach for connecting arbitrary data sources to existing drivers, at least for OLAP, is Apache Calcite (https://calcite.apache.org/). Unfortunately that feels a little more involved.
> Applications that are unable to allow drivers to pop up dialogs can call SQLBrowseConnect to connect to the service.
> SQLBrowseConnect provides an iterative dialog between the driver and the application where the application passes in an initial input connection string. If the connection string contains sufficient information to connect, the driver responds with SQL_SUCCESS and an output connection string containing the complete set of connection attributes used.
> If the initial input connection string does not contain sufficient information to connect to the source, the driver responds with SQL_NEED_DATA and an output connection string specifying informational attributes for the application (such as the authorization url) as well as required attributes to be specified in a subsequent call to SQLBrowseConnect. Attribute names returned by SQLBrowseConnect may include a colon followed by a localized identifier, and the value of the requested attribute is either a single question mark or a comma-separated list of valid values (optionally including localized identifiers) enclosed in curly braces. Optional attributes are returned preceded with an asterisk (*).
> In a Web-based authentication scenario, if SQLBrowseConnect is called with an input connection string containing an access token that has not expired, along with any other required properties, no additional information should be required. If the access token has expired and the connection string contains a refresh token, the driver attempts to refresh the connection using the supplied refresh token.
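As a rough illustration of the output connection string format described in the quote, here is a sketch of how an application might pick it apart. It only does string parsing and does not call any ODBC API; the `BrowseAttribute` type, the `parse_browse_result` function, and the sample string in the usage note are hypothetical. It naively splits on `;` and `,`, so it would mishandle values that contain those characters, and it does not separate localized identifiers inside a value list.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BrowseAttribute:
    keyword: str                 # e.g. "UID"
    label: Optional[str]         # localized identifier after the colon, if any
    optional: bool               # True when the attribute was prefixed with "*"
    value: str                   # raw value: "?", "{a,b}", or informational text
    choices: List[str] = field(default_factory=list)  # parsed "{a,b}" list


def parse_browse_result(out_conn_str):
    """Split an output connection string into its requested attributes."""
    attrs = []
    for part in out_conn_str.split(";"):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition("=")  # split on the first "=" only
        optional = name.startswith("*")
        keyword, _, label = name.lstrip("*").partition(":")
        choices = []
        if value.startswith("{") and value.endswith("}"):
            choices = [c.strip() for c in value[1:-1].split(",")]
        attrs.append(BrowseAttribute(keyword, label or None, optional, value, choices))
    return attrs
```

For example, `parse_browse_result("UID:Login ID=?;*DATABASE:Database={master,model}")` would yield a required `UID` attribute with a free-form value and an optional `DATABASE` attribute with two choices; the application fills those in and calls SQLBrowseConnect again.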
but not what you'd write within this pipe syntax as
foo(42, ^) + 1
The tradeoff is that it doesn't need some extra delimiter since the function call is what delimits it. Perhaps that's a better tradeoff, I'm not sure; but for sure we shouldn't have both.
The snapshot will include every repo with any commits
between the announcement at GitHub Universe on November 13th
and 02/02/2020, every repo with at least 1 star and any
commits from the year before the snapshot (02/03/2019 -
02/02/2020), and every repo with at least 250 stars.
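Spelled out as a predicate, the quoted criteria look something like the sketch below. The dates are read as MM/DD/YYYY, the year of the announcement (2019) is my assumption since the quote doesn't state it, and `in_snapshot` and its parameters are hypothetical names, not anything from GitHub.

```python
from datetime import date

ANNOUNCEMENT = date(2019, 11, 13)  # assumed year; the quote only says "November 13th"
SNAPSHOT = date(2020, 2, 2)        # 02/02/2020
YEAR_BEFORE = date(2019, 2, 3)     # 02/03/2019


def in_snapshot(stars, commit_dates):
    """Return True if a repo meets any of the three quoted criteria."""

    def any_commit_between(start, end):
        return any(start <= d <= end for d in commit_dates)

    return (
        any_commit_between(ANNOUNCEMENT, SNAPSHOT)                       # recent activity
        or (stars >= 1 and any_commit_between(YEAR_BEFORE, SNAPSHOT))    # 1+ star, active in the last year
        or stars >= 250                                                  # popular, regardless of activity
    )
```

The three conditions are joined by OR, which is why the net is so wide: a single commit in the final weeks is enough even with zero stars.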
This would make sense, as it looks like money wasn't a huge factor. If money wasn't an issue, time and effort would be, and you can easily reduce effort by storing pretty much everything.
Apparently the Saguache Crescent in Colorado is (as of this video in 2016) still using the linotype, the last known newspaper in the US to do so: https://www.youtube.com/watch?v=DNa9XRoNRUM