
I got tired of spending hours digging through Reddit threads and YouTube comments when trying to figure out if a product was worth buying. So I built a thing.

It's basically an AI-powered scraper that pulls reviews from Reddit and YouTube, then structures all that messy data into something actually useful - pros, cons, sentiment analysis, and an overall "should you buy this?" recommendation. Built it with FastAPI, Next.js and PostgreSQL, all wrapped up in Docker.
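
To give a sense of what "structures all that messy data" means in practice, the output is roughly a typed summary object. Here's a minimal sketch using Pydantic (which FastAPI already depends on); the field names are illustrative, not the exact schema in the repo:

    # Minimal sketch of the structured review summary (Pydantic v2).
    # Field names here are illustrative, not the repo's exact schema.
    from pydantic import BaseModel

    class ReviewSummary(BaseModel):
        product: str
        pros: list[str]
        cons: list[str]
        sentiment_score: float  # e.g. -1.0 (very negative) to 1.0 (very positive)
        recommendation: str     # the "should you buy this?" verdict

    summary = ReviewSummary(
        product="Example Headphones",
        pros=["comfortable", "long battery life"],
        cons=["weak microphone"],
        sentiment_score=0.6,
        recommendation="Buy if you mostly care about comfort and battery life.",
    )
    print(summary.model_dump_json(indent=2))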

Looking back, I probably should've just made it a simple CLI tool instead of a whole web app. Would've saved me a lot of time...

Still rough around the edges, but the core works. Code's on GitHub if you want to tinker.


Great question!

Unlike Jupyter or Deepnote where you write code directly, Probly is primarily prompt-based - you describe what analysis you want in natural language, and the AI generates and executes the appropriate Python code behind the scenes using Pyodide.

The key difference is that Probly runs Python entirely in your browser using WASM, while Jupyter/Deepnote run code on servers.
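
Concretely, the generated code is just ordinary Python that Pyodide evaluates in the page. A minimal sketch of the kind of snippet that ends up running in the browser (illustrative only; the actual pipeline wraps this in its own tooling):

    # Runs in the browser via Pyodide; runPythonAsync allows top-level await,
    # so micropip can fetch the pandas wheel on demand.
    import micropip
    await micropip.install("pandas")

    import pandas as pd

    df = pd.DataFrame({"region": ["EU", "US", "EU"], "revenue": [120, 340, 90]})
    summary = df.groupby("region")["revenue"].sum().to_dict()
    summary  # the value of the last expression is handed back to the host page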


Disclosure: I'm a founder in the data space [1].

Have you thought about how you would handle much larger datasets? Or is the idea that, since this is a spreadsheet, the 10M cell limit is plenty?

I find WASM really interesting, but I can't wrap my head around how this scales in the enterprise. But I figure it probably just comes down to the use cases and personas you're targeting.

[1] https://www.fabi.ai/


I am also very deeply invested in this question. It seems like the go-to path for very large datasets is text-to-SQL (ClickHouse, Snowflake, etc.). But all these juicy Python data science libraries require code execution on the much smaller data payloads that come back from the SQL results. Feel free to reach out; what you are trying to achieve seems very similar to what I am trying to do in a completely different industry/use case.
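
The pattern I keep landing on: push the aggregation into the warehouse and only pull the small result into Python. A rough sketch, with sqlite3 standing in for a ClickHouse/Snowflake client (the real connectors differ, but the shape is the same):

    import sqlite3
    import pandas as pd

    # Hypothetical local stand-in for a warehouse connection.
    conn = sqlite3.connect("warehouse.db")

    # The heavy scan over millions of rows happens in SQL...
    query = """
        SELECT region, SUM(revenue) AS total_revenue
        FROM sales
        GROUP BY region
    """

    # ...and only the small aggregated payload crosses into Python,
    # where the pandas/scikit-learn style analysis takes over.
    df = pd.read_sql_query(query, conn)
    print(df.describe())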


Docker config is already implemented—you can check out the repo now. Adding support for other LLM providers like Ollama is something to consider for future development. Thanks for the suggestion!


Thanks for taking the time to try it out and share your thoughts. I really appreciate the detailed feedback from a real-world use case.

Glad to hear the setup was smooth and that the chat box + import/export features worked well for you. Noted on the UI tweaks; I'll look into making them more intuitive.

On the categorization issue, yeah, LLMs can struggle with nuanced transaction labeling, especially without proper context or examples. Structured prompting could help, which ties into your second point -- having a library of refined prompts that can be reused for repetitive tasks would be really valuable.

I love your feedback -- it's exactly what helps improve the tool. And again, thanks for testing it out!


The prompt library has also been implemented and can be opened with Ctrl+Shift+L, or Cmd+Shift+L on Mac.


Having thought about this a bit, my new conjecture is that if I had a way to feed in an example map of transaction payee => category as one of the prompts, and a way to incrementally add prompts for outliers, then the AI _might_ be able to do a reasonable job. I am planning to mess with Raku LLM::Functions to see if I can get this to work; a rough sketch of the prompt I have in mind is below.
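
Sketched in Python rather than Raku, and the payees/categories are made up, but this is the shape I mean: the known mapping goes in as few-shot examples, with outlier rules appended incrementally.

    known = {
        "SHELL 1234": "fuel",
        "TESCO STORES": "groceries",
        "NETFLIX.COM": "subscriptions",
    }
    outlier_rules = [
        "Anything mentioning 'HMRC' is 'taxes', even if it looks like a transfer.",
    ]

    def build_prompt(payee: str) -> str:
        # Few-shot examples from the known mapping, plus incremental rules.
        examples = "\n".join(f"{p} -> {c}" for p, c in known.items())
        rules = "\n".join(f"- {r}" for r in outlier_rules)
        return (
            "Categorize this transaction payee using the examples and rules.\n"
            f"Examples:\n{examples}\n"
            f"Rules:\n{rules}\n"
            f"Payee: {payee}\n"
            "Category:"
        )

    print(build_prompt("SHELL 98765"))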


Hi, I've been thinking about the same thing in the context of beancount / plain text accounting.

https://www.reddit.com/r/plaintextaccounting/s/BKsaLrfy3A

I already have thousands of labeled examples and a list of valid categories. I'm also hoping an LLM will do a reasonable job.

At the moment I'm wondering what to do with all the example transaction data, as it's likely larger than the context window. I guess I could take a random downsample, but perhaps there's a more effective way to summarize it.
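
One idea instead of a random downsample: retrieve only the labeled examples most similar to the transaction being categorized, so each prompt stays small. A stdlib-only sketch (difflib is a crude stand-in for an embedding lookup, and the data is made up):

    from difflib import get_close_matches

    labeled = {
        "AMZN Mktp US": "Expenses:Shopping",
        "Shell Oil 57444": "Expenses:Auto:Fuel",
        "Whole Foods Market": "Expenses:Groceries",
        # ...thousands more in practice
    }

    def nearest_examples(payee: str, k: int = 5) -> dict[str, str]:
        # Only the k closest labeled payees go into the prompt,
        # not the full transaction history.
        matches = get_close_matches(payee, labeled.keys(), n=k, cutoff=0.0)
        return {m: labeled[m] for m in matches}

    print(nearest_examples("Shell Oil 99110"))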


I'm actually a fan of what the team at Quadratic is building. It's definitely the more mature product with a robust Python implementation and a well-designed interface that bridges spreadsheets and code. Their stack appears to be built on Rust, which likely gives them performance advantages.

Probly is earlier stage and more minimalist, but we're tackling the same fundamental problem. Our specific focus is on making data analysis a fully autonomous process powered by AI - where you can describe what you want to learn from your data and have the system figure out the rest.


Probly doesn't have built-in test assertion functionality yet, but since it runs Python (via Pyodide) directly in the browser, you can write test functions with assertions in your Python code. The execute_python_code tool in our system can run any valid Python code, including test functions.
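
For instance, something like this already runs as-is through that path (illustrative; there's no special testing API involved):

    # Plain Python assertions work today because the code runs unmodified
    # in Pyodide; this is just an illustration, not a built-in feature.
    def moving_average(values, window):
        return [
            sum(values[i : i + window]) / window
            for i in range(len(values) - window + 1)
        ]

    def test_moving_average():
        assert moving_average([1, 2, 3, 4], 2) == [1.5, 2.5, 3.5]
        assert moving_average([5], 1) == [5.0]

    test_moving_average()
    print("all assertions passed")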

This is something we're considering for future development, so this is a great shout!


Having tests that can be copied or exported from a notebook into a .py module is advantageous for prototyping and reusability.

There are exploratory/discovery and explanatory forms and workflows for notebooks.

A typical notebook workflow: get it working with Ctrl-Enter and manual output checks, wrap it in a function (or functions) with defined variable scopes and few module/notebook globals, write a test function that checks the output every time, write markdown and/or docstrings, and then decide what of this can be reused from regular modules.

nbdev has an 'export a notebook input cell to a .py module' feature, and formatted docstrings like sphinx-apidoc but in notebooks. IPython has `%pycat module.py` for pygments-style syntax highlighting of external .py files and `%save output.py 1-5` for saving input cells to a file, but there are not yet IPython magics to read from or write to specific lines within a file the way nbdev does.

To run the chmp/ipytest %%ipytest cell magic with line or branch coverage, it's necessary to `%pip install ipytest pytest-cov` (or `%conda install ipytest pytest-cov`).
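
After that, the setup is roughly two cells: one to install and autoconfigure ipytest, and one whose first line is the %%ipytest magic. Extra arguments are passed through to pytest, so pytest-cov's --cov flag works; the module name below is hypothetical:

    # notebook cell 1: one-time setup per kernel
    %pip install ipytest pytest-cov
    import ipytest
    ipytest.autoconfig()

    # notebook cell 2: in the real cell, %%ipytest is the first line
    %%ipytest --cov=mymodule
    def test_parse_amount():
        assert float("3.50") == 3.5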

jupyter-xeus supports environment.yml in JupyterLite with packages from emscripten-forge: https://jupyterlite-xeus.readthedocs.io/en/latest/environmen...

emscripten-forge src: https://github.com/emscripten-forge/recipes/tree/main/recipe... ; web: https://repo.mamba.pm/emscripten-forge


Posted one about a week or so ago on my LinkedIn:

https://www.linkedin.com/posts/oluwatobiadefami_vibe-coding-...


excellent - thanks!


Or the AI is a Man United fan and is hopeful for a top 4 finish this season :D


Glad you like it!


From the Miami colloquialism, "Supposably" could be good for an advanced stats add-on! ;)


Haha! I'll consider it.


Thank you!

Right now, you can clone the repo and follow the instructions to run it locally with your own OpenAI key. I'm working on a hosted demo that will let you try it out directly without any setup. Stay tuned :)

