> For example, in the prompt for this experiment, the model is bootstrapped with the correct Form 1040 lines and short instructions as part of its context.
Given that only short instructions are in context, I would not have expected even a frontier model to score well on this benchmark. For better results, I'd think that giving the model access to the entire tax code is required (which likely requires RAG due to its sheer size).
We tested models with knowledge cutoffs in 2025, so we expect them to have knowledge of Tax Year 2024 forms in their weights. We also tested models with the ability to do web search to get any other forms they think are necessary: https://github.com/column-tax/tax-calc-bench
All that said, we agree, and that is what we've built with our internal tax coding agent, Iris: https://www.columntax.com/blog/introducing-iris-our-ai-tax-d... (it can pull just the right tax form context on a per-line basis to turn the tax law into code).
Column Tax is on a mission to help Americans gain financial independence.
Our first product is an API that brings the power of a high-end accountant to users who otherwise wouldn’t be able to access financial advice. Our users are individuals and families living paycheck-to-paycheck who use Column to get out of debt, move into their own homes, and save up for emergency bills.
Things we value:
- Focus: we know that doing great work takes long blocks of uninterrupted time
- Ownership: everyone at Column is empowered to make big decisions
- Transparency: expect honest & vulnerable communication
- Doing the right thing: tax is an industry with a history of predatory practices; we’re determined to fix that
To apply, or if you have any questions, see the full job description at https://jobs.columntax.com/ or email me (CTO) directly at michael@columntax.com.
Pioneer | San Francisco, CA | Full-Time, ONSITE | Software Engineer, Business Operations | https://pioneer.app
We’re building software to scalably find & support the world’s creative outsiders — the next generation of researchers, entrepreneurs, artists, and engineers.
In the ~9 months since our launch, we’ve seen thousands of applicants from over 100 countries, working on projects spread across almost every industry (check out our recent winners here: https://pioneer.app/blog/meet-the-pioneers-take-3).
We’re a small team of 5 led by our founder Daniel Gross (founder of Cue, acquired by Apple, former YC AI Partner), and are looking to hire our 3rd engineer & someone to run business operations.
I'm a fan of https://www.vamo.com/ for multi-city trips. You put in your point of origin and all the cities you want to visit (and for how long), and it routes you between the cities on the best type of transportation and offers accommodation suggestions as well.