
captaintobs · on Oct 15, 2024

There are many teams using SQLMesh in production. Fivetran, Harness, Hopper, Pitchbook to name a few.

You can read some case studies here https://tobikodata.com/harness.html or join Slack to meet with folks to learn more about their experiences.

abtinf · on Oct 16, 2024

How does Fivetran use SQLMesh?

captaintobs · on Oct 16, 2024

They're using it for data transformation. They're long-time dbt users, but are switching to SQLMesh because it's extremely efficient, provides a better development experience, and can help them become warehouse agnostic.

captaintobs · on Oct 15, 2024

Thanks! Yes, it's a much requested feature but it's difficult to get right!

captaintobs · on Aug 11, 2023

Why is this faster than the stdlib? What does it do to achieve better performance?

uniqueuid · on Aug 11, 2023

It's in the readme of the github project.

> In short, the main reasons why MPIRE is faster are:

    When fork is available we can make use of copy-on-write shared objects, which reduces the need to copy objects that need to be shared over child processes

    Workers can hold state over multiple tasks. Therefore you can choose to load a big file or send resources over only once per worker

    Automatic task chunking
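The second and third points have direct counterparts in the standard library. Below is a minimal sketch using `multiprocessing.Pool`'s `initializer` for per-worker state and `chunksize` for batching. It is not MPIRE's own API, and `_resource`, `_load_resource`, `task`, and `main` are hypothetical names. Note that stdlib `Pool.map` already computes a chunk size automatically when none is passed.

```python
import multiprocessing as mp
import os

# Per-worker state: filled in once by the initializer when each worker
# process starts, then reused by every task that worker runs.
_resource = None


def _load_resource():
    # Hypothetical stand-in for expensive one-time setup (e.g. loading a big file).
    global _resource
    _resource = {"offset": 1000, "loaded_in": os.getpid()}


def task(x):
    # Uses the resource this worker already loaded; nothing is reloaded per task.
    return x + _resource["offset"], _resource["loaded_in"]


def main():
    with mp.Pool(processes=2, initializer=_load_resource) as pool:
        # chunksize=25 sends 25 items per message instead of one, cutting
        # queue overhead; omit it and Pool.map picks a chunk size itself.
        results = pool.map(task, range(100), chunksize=25)
    values = [value for value, _ in results]
    worker_pids = {pid for _, pid in results}
    return values, worker_pids


if __name__ == "__main__":
    values, worker_pids = main()
    print(values[:3], "workers used:", len(worker_pids))
```

The per-worker load happens at most once per process rather than once per task, which is where the savings come from when the setup is expensive.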

niemandhier · on Aug 11, 2023

COW can come back and bite you by making runtime hard to predict.

Your code goes down a rarely used branch and suddenly a large object gets copied.

misnome · on Aug 11, 2023

Isn’t this given “for free” by the fact that it’s fork, even in standard multiprocessing? What does the library do extra?

Galanwe · on Aug 12, 2023

It doesn't do much extra I guess.

In standard multiprocessing, all arguments are pickled and pushed to a queue for processes in the pool to use.

To pass heavy arguments, the trick for exploiting CoW was to store them in global variables before calling map.

My understanding of MPIRE is that it does the same thing, but exposes a `shared_objects` parameter to make it less hacky than global variables.

I guess their benchmarks compare against pickling arguments, not against using global variables/CoW, which is why they boast a performance increase.
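The global-variable trick described above can be sketched with the standard library alone. This assumes a POSIX system where the `fork` start method is available; `BIG_TABLE`, `lookup`, and `run` are hypothetical names.

```python
import multiprocessing as mp

# Hypothetical large, read-only object. It is assigned at module level before
# the pool is created so forked workers inherit it instead of receiving a copy.
BIG_TABLE = None


def lookup(key):
    # Reads the global inherited through fork; BIG_TABLE itself is never
    # pickled or pushed through the task queue, only `key` and the result are.
    return BIG_TABLE[key]


def run(keys):
    global BIG_TABLE
    BIG_TABLE = {i: i * i for i in range(100_000)}  # must be set BEFORE forking
    ctx = mp.get_context("fork")  # POSIX only; raises ValueError where fork is unavailable
    with ctx.Pool(processes=4) as pool:
        return pool.map(lookup, keys)


if __name__ == "__main__":
    print(run([2, 3, 10]))  # prints [4, 9, 100]
```

In CPython, even reading an object updates its reference count, which is a write, so memory pages a worker touches can still end up copied. That is one concrete way copy-on-write costs become hard to predict.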

indeedmug · on Aug 11, 2023

Yeah, I'm struggling to figure out what the secret sauce of this library is, and whether that sauce introduces footguns down the line.

The standard multiprocessing module already uses fork on Linux. I once ran the same multiprocessing code on Linux and Windows, and it performed significantly better on Linux.

mufti_menk · on Aug 11, 2023

They're moving away from fork as the default start method in the next version or two. One of the main issues with it is that it copies locks across processes, which can cause deadlocks.
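If you would rather not depend on whatever the default start method is, the standard library lets you choose one explicitly per pool. A minimal sketch; `run_with` is a hypothetical helper name.

```python
import math
import multiprocessing as mp


def run_with(method, items):
    # get_context returns a context bound to one start method without changing
    # the process-wide default set by mp.set_start_method().
    ctx = mp.get_context(method)
    with ctx.Pool(processes=2) as pool:
        # math.factorial is picklable by reference, so it works under any start
        # method; with "spawn" your own functions must be importable by the child.
        return pool.map(math.factorial, items)


if __name__ == "__main__":
    # Start methods this platform supports, e.g. fork/spawn/forkserver on Linux.
    print(mp.get_all_start_methods())
    # "spawn" starts a fresh interpreter, so no locks are inherited from the parent.
    print(run_with("spawn", [1, 2, 3, 4, 5]))  # prints [1, 2, 6, 24, 120]
```

The trade-off: `spawn` avoids inherited lock state but pays interpreter startup and pickling costs, which is exactly what the fork-based tricks above were avoiding.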

captaintobs · on July 3, 2023

Very cool work!

captaintobs · on May 16, 2023

It's mostly affecting the amount of AI content I have to wade through on these news aggregators...

captaintobs · on May 1, 2023

There's a lot of hype around ruff, but I've been doing fine with black and autoflake. I have a pretty sizeable project and have never once felt their speed was a problem.

captaintobs · on May 1, 2023

We used both SVB and First Republic!

captaintobs · on April 28, 2023

This is a garbage/sensationalist/clickbait article. They source Blind conversations.

lisasays · on April 28, 2023

Like almost everything on BI, it seems.

It's just an awful, spammy clickbait-y site, all the way down. Almost by definition in violation of HN guidelines.

captaintobs · on April 28, 2023

Monorepos at scale only really work if you have the tooling and infra to support them. Otherwise, you're going to be miserable with slow builds, pushes, and pulls, and impossible merge conflicts.

captaintobs · on April 27, 2023

It's fun having the ritual of grinding coffee. Maybe I can't tell the difference but I enjoy it!