tomkwong · on June 10, 2025

I have done something slightly different. I would ask the LLM to prepare a design doc, not code, and iterate on that doc before asking it to start coding. That seems to have worked a little better, as it’s less likely to go rogue.

tomkwong · on Feb 11, 2023

I have learned not to worry about coding style anymore, not because I don’t care, but because too many people care about it. Having worked with one of the largest monorepos, I’m happy enough to align with others on a standard coding style that is supported by a formatter and just let the formatter do its job. Humans are amazing when it comes to learning new patterns, and coding style is one that is easy to adapt to.

tomkwong · on June 8, 2022

> you just have to keep in your head all the methods that are expected to exist for a given type.

Technically, you don't need to keep that in your head :-) The general approach is to define generic functions and also write docs about how to extend those functions to satisfy interface requirements.

Perhaps not too surprisingly, many people in the Julia community also want official interface support directly in the language. In the meantime, several open-source projects have been spawned to address that gap. Here is a shameless plug for my package:

https://github.com/tk3369/BinaryTraits.jl

tomkwong · on May 31, 2022

There’s no privacy agreement to sign when setting up the fridge? I had to click on the Agree button several times when I started with my LG TV!

tomkwong · on May 27, 2022

100% clickbait title and opening statement. Is journalism that bad these days? Ironically, I quite like some of the points in the main content.

MafellUser · on May 27, 2022

substack is basically a blogging platform. Are bloggers journalists? Are cats actually dogs?

tomkwong · on May 5, 2022

First, I want to say that this is a great post. You always grow stronger when you make mistakes. Writing it up solidifies understanding in the learning process.

This story resonates with many people here because many experienced engineers have done something similar before. For me, destructive batch operations like this would be two distinct steps:

1. Identify files that need to be deleted; 2. Loop through the list and delete them one by one.

These steps are decoupled so that the list can be validated. Each step can be tested independently. And the scripts are idempotent and can be reused.
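A minimal sketch of that two-step shape in Python, under some assumptions: the function names, the suffix-based selection rule, and the one-path-per-line manifest format are all hypothetical choices made for illustration, and paths are assumed not to contain newlines. Step 1 only writes a list; step 2 only consumes it, so the list can be inspected in between.

```python
import os
import tempfile
from pathlib import Path


def build_manifest(root: Path, suffix: str, manifest: Path) -> int:
    """Step 1: write deletion candidates to a manifest file. Deletes nothing."""
    candidates = sorted(str(p) for p in root.rglob(f"*{suffix}") if p.is_file())
    manifest.write_text("".join(f"{c}\n" for c in candidates))
    return len(candidates)


def delete_from_manifest(manifest: Path) -> int:
    """Step 2: delete each listed path one by one.

    Files that are already gone are skipped, so re-running is harmless.
    """
    deleted = 0
    for line in manifest.read_text().splitlines():
        if not line:
            continue
        try:
            os.remove(line)
            deleted += 1
        except FileNotFoundError:
            pass  # already deleted by an earlier run
    return deleted


if __name__ == "__main__":
    # Demo on a throwaway directory so nothing real is touched.
    root = Path(tempfile.mkdtemp())
    (root / "old.tmp").write_text("scratch")
    (root / "keep.txt").write_text("important")
    manifest = root / "to_delete.list"

    print(build_manifest(root, ".tmp", manifest), "candidate(s) listed")
    print(manifest.read_text(), end="")  # review the list before step 2
    print(delete_from_manifest(manifest), "file(s) deleted")
```

The review happens between the two calls: the manifest is a plain file that can be read, diffed, or sent to a peer before anything is removed.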

Production operations are always risky. A good practice is to always prepare an execution plan with detailed steps, a validation plan, and a rollback plan. And, review the plan with peers before the operation.

notyourday · on May 5, 2022

> 1. Identify files that need to be deleted; 2. Loop through the list and delete them one by one.

> These steps are decoupled so that the list can be validated. Each step can be tested independently. And the scripts are idempotent and can be reused.

This is the most underrated comment.

I'm saying this as someone who had ultimate oversight of deleting hundreds of TBs per day, spread across billions of files on different clouds and local storage.

spiffytech · on May 6, 2022

I've never regretted treating tasks like this as a pipeline of discrete steps with explicit outputs and inputs. Sending output to a file, viewing it, then having something process the file is such a great safety net.

tomkwong · on June 7, 2021

I agree for the most part. However, IMHO using these words (just, simply, etc.) occasionally can make the doc more lively and fun to read.

justin_oaks · on June 7, 2021

Do you have an example of such writing? Or produce an example?

I'm having a hard time understanding how such language would make documentation more "lively and fun to read".

tomkwong · on Oct 22, 2020

I started walking for at least 30 minutes every morning before work, and it has become a habit during this pandemic. I realized that my best ideas for solving problems come from this little exercise. Since then, I’ve learned to tap them into my phone right away so I can capture these ideas before I lose them, since there are too many to remember well.

tomkwong · on Oct 11, 2020

Storing larger data sets in CSV format is a recipe for disaster. As a tech industry, we should really come together on a standard binary format for data exchange. Maybe Arrow?

manigandham · on Oct 11, 2020

Arrow is designed for in-memory processing. It can be saved on disk so you can open it directly (memory map) but it's not a great storage format. Parquet or ORC is a better choice, but they don't have as much tooling for import/export. CSV is just the simplest way to transfer data.

You might be interested in DuckDB, though, which is trying to create a new standard for passing datasets: https://duckdb.org/

antb123 · on Oct 12, 2020

why not just pass around sqlite databases?
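For illustration, a rough sketch of what passing a SQLite file around could look like with Python's standard `sqlite3` module. The `readings` table, its columns, and the helper names are hypothetical. The point is only that column types and awkward values (such as text containing commas) survive the round trip without any quoting rules.

```python
import sqlite3
import tempfile
from pathlib import Path


def export_rows(db_path, rows):
    """Producer side: write (sensor, value) rows into a single SQLite file."""
    conn = sqlite3.connect(str(db_path))
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS readings "
                "(sensor TEXT NOT NULL, value REAL NOT NULL)"
            )
            conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
    finally:
        conn.close()


def import_rows(db_path):
    """Consumer side: open the same file and read the rows back."""
    conn = sqlite3.connect(str(db_path))
    try:
        cursor = conn.execute("SELECT sensor, value FROM readings ORDER BY rowid")
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    path = Path(tempfile.mkdtemp()) / "exchange.sqlite"
    export_rows(path, [("probe, north", 1.5), ("probe-south", 2.0)])
    print(import_rows(path))
```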

tomkwong · on Sept 1, 2020

Building queries in a programming language is more flexible and the data operation is explicit. You rely on good programmers to do it right.

By contrast, SQL is more constrained about what it can do. It's declarative and you rely on the database engine to optimize the procedure.

Neither is perfect.
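A small sketch of the contrast, using made-up order data and hypothetical function names. The first function spells out every step of the aggregation in Python; the second states the desired result in SQL and leaves the execution strategy to the engine (SQLite here, via the standard `sqlite3` module).

```python
import sqlite3

# Hypothetical sample data: (customer, amount)
ORDERS = [("alice", 30.0), ("bob", 12.5), ("alice", 7.5), ("carol", 50.0)]


def totals_explicit(rows, minimum):
    """Explicit: the programmer decides how to group, sum, filter, and sort."""
    totals = {}
    for customer, amount in rows:
        totals[customer] = totals.get(customer, 0.0) + amount
    return sorted((c, t) for c, t in totals.items() if t >= minimum)


def totals_sql(rows, minimum):
    """Declarative: describe the result; the database picks the plan."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        query = """
            SELECT customer, SUM(amount) AS total
            FROM orders
            GROUP BY customer
            HAVING SUM(amount) >= ?
            ORDER BY customer
        """
        return conn.execute(query, (minimum,)).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(totals_explicit(ORDERS, 20.0))
    print(totals_sql(ORDERS, 20.0))
```

Both functions return the same rows for this data; the difference is who is responsible for the procedure.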

SigmundA · on Sept 1, 2020

Linq can be used in a declarative way, such that the AST is available and can be rewritten for optimization or transformation, say, to SQL [1].

Not that this is easy, but many projects utilize it, and the relinq project tries to give you a more usable starting point [2].

After having done SQL for years and then Linq, I actually prefer the more explicit operation of Linq by default: I know the order of execution is the one I specify. Being able to use a full, rich imperative language in the query is also very useful.

[1] https://docs.microsoft.com/en-us/dotnet/api/system.linq.ique...

[2] https://github.com/re-motion/Relinq