I'm using TimescaleDB to manage 450 GB of stock and options data from Massive (formerly polygon.io), and I've been having LLM agents work through academic research and backtest the ideas to see whether any of them improve trading.
It's an addictive slot machine: I pull the lever and watch the dials spin, hoping for the sound of a jackpot. 999 out of 1000 winning models win because of look-ahead bias, which makes a bad model look great. For example, one didn't convert timestamps from UTC to EST, so five hours of future knowledge got baked into the model. Another used `SELECT DISTINCT`, which picked a value at random from a 0–5 hour window, so up to five hours of future knowledge got baked in. That one was somehow related to Timescale hypertables.
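For anyone hitting the same time zone bug, here is a minimal sketch of the kind of guard that would catch it. It is not what my agents actually run. The function names and `LookAheadError` are made up for illustration, and I'm assuming `America/New_York` as the exchange zone so daylight saving time is handled instead of hard-coding a five-hour offset.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Assumption: US equities, so exchange time is America/New_York (EST or EDT).
NEW_YORK = ZoneInfo("America/New_York")


class LookAheadError(ValueError):
    """Raised when a feature timestamp is later than the decision timestamp."""


def to_exchange_time(ts_utc: datetime) -> datetime:
    """Convert a timezone-aware timestamp to exchange-local time."""
    if ts_utc.tzinfo is None:
        # A naive timestamp is exactly how the silent five-hour shift sneaks in.
        raise ValueError("naive timestamp: refusing to guess its time zone")
    return ts_utc.astimezone(NEW_YORK)


def assert_no_lookahead(decision_time: datetime, feature_times: list[datetime]) -> None:
    """Fail loudly if any feature was observed after the decision was made."""
    for t in feature_times:
        if t > decision_time:
            raise LookAheadError(
                f"feature at {t.isoformat()} is after decision at {decision_time.isoformat()}"
            )


if __name__ == "__main__":
    bar = datetime(2024, 1, 16, 14, 30, tzinfo=timezone.utc)
    print(to_exchange_time(bar).isoformat())
```

Calling `assert_no_lookahead` on every row that feeds the model is slow, but it turns a suspiciously good backtest into an immediate exception.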
Now I'm applying the VIX formula to TSLA options trades to see if I can take research papers about trading with VIX and apply them to TSLA.
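For reference, the single-expiry variance from the published Cboe VIX methodology is sigma^2 = (2/T) * sum_i(dK_i / K_i^2 * e^(RT) * Q(K_i)) - (1/T) * (F/K_0 - 1)^2. Below is a rough sketch of that one term in plain Python. The function name and argument layout are my own, it assumes the out-of-the-money midquotes have already been selected per strike, and it skips the two-expiry interpolation to 30 days that the real index does. The methodology was written for European-style SPX options and TSLA options are American-style, so treat the output as an approximation.

```python
import math


def single_expiry_variance(strikes, quotes, forward, k0, rate, t_years):
    """Variance for one expiry, following the Cboe VIX formula.

    strikes: strikes in ascending order
    quotes:  midquote of the out-of-the-money option at each strike
             (put below k0, call above k0, put/call average at k0)
    forward: forward level F implied by put-call parity
    k0:      first strike at or below the forward
    rate:    risk-free rate R to expiry
    t_years: time to expiry T in years
    """
    n = len(strikes)
    if n != len(quotes) or n < 2:
        raise ValueError("need matching strikes and quotes, at least two of each")

    total = 0.0
    for i, (k, q) in enumerate(zip(strikes, quotes)):
        # dK is half the distance between neighbouring strikes; at either
        # edge it is the distance to the single adjacent strike.
        if i == 0:
            dk = strikes[1] - strikes[0]
        elif i == n - 1:
            dk = strikes[-1] - strikes[-2]
        else:
            dk = (strikes[i + 1] - strikes[i - 1]) / 2.0
        total += dk / (k * k) * math.exp(rate * t_years) * q

    return (2.0 / t_years) * total - (1.0 / t_years) * (forward / k0 - 1.0) ** 2


if __name__ == "__main__":
    # Made-up quotes, only to show the call shape.
    var = single_expiry_variance([90, 100, 110], [1.0, 5.0, 1.0], 100.0, 100, 0.0, 0.25)
    print(100 * math.sqrt(var))
```

The `100 * sqrt(variance)` at the end mimics how the index is quoted, but with one expiry it is not the same number the 30-day blend would give.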
Whatever the case, I've learned a lot about working with LLM agents and time-series data, and very little about actually trading equities and derivatives.
(I did 100% beat SPY with a train/out-of-sample test, though not by much. I'll likely share it here in a couple weeks. It automates trading on Robinhood, which is pretty cool.)
Nice. I played with this a bit. Agents are very good at Rust and CUDA, so massively parallelizing compute for things like options chains may give you an edge. Also, you may have a hard time getting a very low-latency connection, one low enough in milliseconds that you still have an edge once you factor in the other delays. So one approach is to accept that as a hobbyist you can't compete on lowest latency and instead compete on two other fronts: the most effective algorithm, and the ability to massively parallelize on a consumer GPU what would take others longer to calculate.
Best of luck. Super fun!
PS: Just a follow-up. There was a post here a few days ago about a research breakthrough where they literally just had the agent iterate on a single planning doc over and over. I think pushing chain of thought for SOTA foundation models is fertile ground. That may lead to an algorithmic breakthrough if you start with some solid academic research.
Interesting. I'm not familiar with ClickHouse. I've been manually triggering compression and continuous aggregates have been a huge boon. The database has been the least of my concerns. Can you tell me more about it?
Fun fact: some of that data may be a subset of the full feed with outlying points trimmed, so stop-loss conditions that get tripped in the real world are never tripped by your dataset. Get data from my sources.
I'll notice that the trading model filters out bearish downtrends, which is very, very helpful, but it doesn't trade short. I'll ask the coding agent to find several academic research papers about trading once intraday during a downtrend -- a single scalp trade. It will return with ~10 references. It will recreate the model, do statistical analysis, and create a grid-search backtest. This immediately shows whether there is any alpha. If there is, it will iterate on integrating the concept into the existing trading model.
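To make "grid-search backtest" concrete, here is a stripped-down sketch of the shape of that step. It is not the agent's code: the moving-average trend filter is a stand-in strategy I picked for illustration, and every function name and parameter is hypothetical. The two points it is meant to show are that the signal at bar t only sees prices through t and is applied to the following bar, and that parameters are picked on the training slice only and then scored once out of sample.

```python
from itertools import product


def sma(values, window, end):
    """Mean of the `window` values ending at index `end` (inclusive)."""
    return sum(values[end - window + 1 : end + 1]) / window


def backtest(prices, fast, slow):
    """Total return of a long-only trend filter.

    The signal at bar t uses prices up to and including t, and is applied
    to the t -> t+1 move, so no future price leaks into the decision.
    """
    equity = 1.0
    for t in range(slow - 1, len(prices) - 1):
        if sma(prices, fast, t) > sma(prices, slow, t):  # uptrend: stay long
            equity *= prices[t + 1] / prices[t]
    return equity - 1.0


def grid_search(prices, fast_grid, slow_grid, train_fraction=0.7):
    """Pick the best (fast, slow) on the training slice, then score it once
    on the held-out slice. Returns (params, train_return, oos_return)."""
    split = int(len(prices) * train_fraction)
    train, test = prices[:split], prices[split:]

    results = []
    for fast, slow in product(fast_grid, slow_grid):
        # Skip combinations that are invalid or too long for either slice.
        if fast >= slow or slow >= len(train) or slow >= len(test):
            continue
        results.append(((fast, slow), backtest(train, fast, slow)))
    if not results:
        raise ValueError("no valid parameter combination for this data length")

    best_params, train_return = max(results, key=lambda r: r[1])
    return best_params, train_return, backtest(test, *best_params)


if __name__ == "__main__":
    # Synthetic prices, only to show the call shape.
    prices = [100 * 1.01 ** i for i in range(100)]
    print(grid_search(prices, fast_grid=[2, 3], slow_grid=[5, 10]))
```

The out-of-sample number is the only one worth looking at, and it stops meaning anything if the agent is allowed to rerun the grid after seeing it.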
It has enough information that it will continue to iterate for the next several hours.
It's all happening in a black box. I have no idea. My concern isn't trading but getting it to improve continuously, unsupervised, without lying or hallucinating.
I developed a Claude skill that interacts with a website, presses every button, and intercepts every request/response to build a TypeScript API. I only have $10 in that account, so there isn't much damage it can do. It will probably get me banned, but I don't use Robinhood for real trading.