I describe my experience in writing a SQL transpiler using AI tools and a PEG parser in C++.
Curious to hear what you think!
Introduction
I wanted to demonstrate some of my ideas for SQL transpilation (conversion between the Postgres and T-SQL dialects). I chose an AI from one of the most reputable vendors: gemini.google.com. I limited the goals of my project to:
Implement a parser that understands a couple of SQL dialects (Postgres and T-SQL).
Implement a builder that converts AST into an internal representation of a SQL statement.
Implement the output statement generator in the desired dialect.
For example, the Transpiler should be able to convert a T-SQL statement such as
SELECT TOP 10 max([dbo].[col1]) FROM [dbo].[tbl]
into an equivalent Postgres statement
SELECT max("dbo"."col1") FROM "dbo"."tbl" LIMIT 10
and vice-versa.
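To make the second and third goals more concrete, here is a minimal sketch of what a dialect-neutral internal representation and an output generator could look like. It is not taken from the project's source: the names Dialect, SelectItem, SelectStatement, quotePart, quoteName, and generate are all hypothetical, and the sketch only covers a single-table SELECT with an optional row limit.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// All names below are hypothetical; they illustrate the idea only.
enum class Dialect { TSql, Postgres };

// A multi-part identifier such as dbo.tbl, stored without quoting.
using QualifiedName = std::vector<std::string>;

// One entry of the select list: an optional function applied to a column.
struct SelectItem {
    std::string function;  // empty when the column is selected directly
    QualifiedName column;
};

// Dialect-neutral representation of a single-table SELECT.
struct SelectStatement {
    std::vector<SelectItem> items;
    QualifiedName table;
    std::optional<long> rowLimit;  // TOP n in T-SQL, LIMIT n in Postgres
};

// Quote one identifier part: [name] for T-SQL, "name" for Postgres.
// A closing delimiter inside the name is escaped by doubling it.
std::string quotePart(const std::string& part, Dialect d) {
    const char open = d == Dialect::TSql ? '[' : '"';
    const char close = d == Dialect::TSql ? ']' : '"';
    std::string out(1, open);
    for (char c : part) {
        out += c;
        if (c == close) out += c;
    }
    out += close;
    return out;
}

// Quote every part of a qualified name and join the parts with dots.
std::string quoteName(const QualifiedName& name, Dialect d) {
    std::string out;
    for (std::size_t i = 0; i < name.size(); ++i) {
        if (i > 0) out += '.';
        out += quotePart(name[i], d);
    }
    return out;
}

// Emit the statement in the requested dialect. The row limit is the only
// clause whose position differs: after SELECT in T-SQL, at the end in Postgres.
std::string generate(const SelectStatement& s, Dialect d) {
    assert(!s.items.empty() && !s.table.empty());
    std::string sql = "SELECT ";
    if (s.rowLimit && d == Dialect::TSql)
        sql += "TOP " + std::to_string(*s.rowLimit) + " ";
    for (std::size_t i = 0; i < s.items.size(); ++i) {
        if (i > 0) sql += ", ";
        const SelectItem& item = s.items[i];
        const std::string col = quoteName(item.column, d);
        sql += item.function.empty() ? col : item.function + "(" + col + ")";
    }
    sql += " FROM " + quoteName(s.table, d);
    if (s.rowLimit && d == Dialect::Postgres)
        sql += " LIMIT " + std::to_string(*s.rowLimit);
    return sql;
}
```

Filling a SelectStatement with the max function over dbo.col1, the table dbo.tbl, and a row limit of 10, then calling generate once per dialect, yields the two statements from the example above. Because the representation stores identifiers unquoted and the limit as a plain number, the same object can serve both conversion directions.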
As an additional benefit of this exercise, I wanted to learn about a modern parsing system. After learning about lex/yacc in college many years ago, my only other experience was with the GOLD parser in the early 2000s. I also knew from a previous attempt to learn ANTLR4 that it has unpredictable parsing time and numerous other problems, including “reduce-reduce” conflicts that are very hard to resolve. I googled around and found a modern parsing approach called PEG (parsing expression grammar), along with a C++ implementation, cpp-peglib, on GitHub.
Conclusion
Gemini made numerous assumptions about my intentions and generated something entirely different from what I wanted. But I definitely got better at telling Gemini what I wanted by giving it very primitive assignments. In a few cases, the AI saved me a bit of time, e.g., “write procedure to split text in vector of lines” or “write procedure to iterate all files in specified path and read the files as text.” The AI also helped teach me about the PEG parser, the Visitor pattern, etc.
The idea behind the SQL Transpiler was to start with a very basic parser that extracts identifiers, numbers, and string literals, and then add various rules, such as LIMIT/TOP. This idea could be fruitful, and I would appreciate your feedback (my contact is phoenicyan at gmail dot com). I would like to hear about alternative approaches to building a transpiler, especially from people who have built one before.
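As a rough illustration of that starting point, the sketch below extracts identifiers, numbers, and string literals from a statement. It is a hand-written scanner using only the standard library, not the PEG grammar used in the project, and the names TokenKind, Token, and tokenize are hypothetical. It assumes that delimited identifiers use [..] or ".." and that a doubled delimiter inside a quoted token is an escape.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token model; keywords such as SELECT are reported as
// identifiers here, leaving their recognition to later rules.
enum class TokenKind { Identifier, Number, String, Symbol };

struct Token {
    TokenKind kind;
    std::string text;  // delimited identifiers and strings hold the unquoted value
};

std::vector<Token> tokenize(const std::string& sql) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    const std::size_t n = sql.size();

    auto isIdentChar = [](unsigned char c) { return std::isalnum(c) || c == '_'; };

    // Reads text up to the closing delimiter; a doubled delimiter is an escape.
    // An unterminated token simply runs to the end of the input.
    auto readDelimited = [&](char close) {
        std::string value;
        ++i;  // skip the opening delimiter
        while (i < n) {
            if (sql[i] == close) {
                if (i + 1 < n && sql[i + 1] == close) {
                    value += close;
                    i += 2;
                    continue;
                }
                ++i;  // skip the closing delimiter
                break;
            }
            value += sql[i++];
        }
        return value;
    };

    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(sql[i]);
        if (std::isspace(c)) {
            ++i;
        } else if (c == '[') {
            tokens.push_back({TokenKind::Identifier, readDelimited(']')});
        } else if (c == '"') {
            tokens.push_back({TokenKind::Identifier, readDelimited('"')});
        } else if (c == '\'') {
            tokens.push_back({TokenKind::String, readDelimited('\'')});
        } else if (std::isdigit(c)) {
            const std::size_t start = i;
            while (i < n && std::isdigit(static_cast<unsigned char>(sql[i]))) ++i;
            tokens.push_back({TokenKind::Number, sql.substr(start, i - start)});
        } else if (std::isalpha(c) || c == '_') {
            const std::size_t start = i;
            while (i < n && isIdentChar(static_cast<unsigned char>(sql[i]))) ++i;
            tokens.push_back({TokenKind::Identifier, sql.substr(start, i - start)});
        } else {
            // Anything else (parentheses, dots, operators) is a one-character symbol.
            tokens.push_back({TokenKind::Symbol, std::string(1, sql[i])});
            ++i;
        }
    }
    return tokens;
}
```

Rules such as LIMIT/TOP could then be layered on top of this token stream, for example by treating an identifier token with the text TOP followed by a number token as a row limit.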
In Part 2, I’m excited to share with you a fully functional transpiler that brings this idea to life.
Well-done post, I'd like to read more of their work and it's exciting to see these new ideas. Though as other people have said, the one set of empirical results that they present is a bit... confusing? I'd think they'd have some more compelling examples to present given all the pretty math.
Their modular norm paper (https://arxiv.org/abs/2405.14813) has several more examples; see their appendix D in particular, but these are also mystifying. Yes they're interested in how things scale but am I the only one to whom it seems that the training losses they report are just not competitive with things that are currently being used?
Curious since AlphaFold got released: have classical molecular dynamics sims in this area become obsolete, at least for protein folding? How does the research coming out of venues like DESRES compare? Are they working on more specific problems in the same area or are they in a different business altogether?
No. AlphaFold doesn't do dynamics; it does end-state snapshots only. It does not do anything about the motion of the atoms, which is the core functionality of MD.
MD was never really a viable way to do structure prediction, so it didn't become obsolete with AlphaFold. Instead, MD is more useful for studying the physical process of protein folding (before the protein folds to its final structure, as well as once it has reached its final structure and sort of jiggles and wiggles around that).
MD simulations typically aren’t run for time scales that tell you anything about the folding process. Most people are looking at motion after the protein has folded.
What if that assumption is not true? I have the opposite problem - I forget my iPhone somewhere in the house for 2-3 days, and then have a hard time finding it and convincing myself that I still need a smartphone.
Source code for this project is available here: https://github.com/phoenicyan/sql_transpiler.