The thing that's missing from this article (and most like it) about these kinds of code optimizations is context: when is it appropriate to make such changes, and when is it not?
> [with] this change ... the speed improved significantly [but] the readability and ease of coding decreased quite a bit
The speed of this specific function, measured with the described (micro) benchmark, improved, yes, no doubt. But that's not (by itself) relevant! If you want to use an optimized implementation of some code, it has to represent a significant improvement in the overall system as measured by meaningful, agreed-upon, high-level performance metrics.
If this function is called many times per user request, and the optimized version has a measurable impact on request latency, then sure, go for it! But, otherwise, making this kind of change is the textbook definition of premature optimization, never a good idea.
When I see these kinds of readability-decreasing optimizations in code reviews, I always push back. Almost always, that same Go code queries some other service over the network or loads data from a database, or similar, and nobody will ever notice the speed improvement of Index over Split on the bottom line. But everyone working with that code later will spend extra time (sometimes hours) just trying to understand what's going on. That's much more expensive than a few extra CPU cycles here and there.
Sure, if you're writing an inner loop for some heavy workload, e.g. in graphics or maybe parsing terabytes of JSON, it makes sense. But outside of these rarely occurring situations, you should always prefer readability over low-level optimizations.
OTOH note that this optimisation can be bundled into a helper function so it's not quite as bad as some others. In fact the helper function is in the stdlib: https://pkg.go.dev/strings#Cut
This is most useful for the "split once" case, which is genuinely improved.
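For the split-once shape, the difference is roughly this (the variable names and separator are just illustrative, not taken from the article):

// Before: Split allocates a []string only to pull two pieces out of it.
parts := strings.Split(line, ": ")
prefix, rest := parts[0], parts[1]

// After: Cut returns both halves directly, with no intermediate slice.
prefix, rest, found := strings.Cut(line, ": ")
if !found {
    // handle a line without the separator
}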
The iteration case is worse, but not to an irredeemable degree. The original:
rounds := strings.Split(game, "; ")
isPossible := false
for _, round := range rounds {
    red, green, blue := findScores(round)
    if isRoundPossible(red, green, blue) {
        isPossible = true
    } else {
        isPossible = false
        // If only one round is not possible, there is
        // no need to process the other ones.
        break
    }
}
becomes:
isPossible := false
for {
    // Use the same "; " separator as the Split version above.
    round, tail, hasNext := strings.Cut(game, "; ")
    game = tail
    red, green, blue := findScores(round)
    if isRoundPossible(red, green, blue) {
        isPossible = true
    } else {
        isPossible = false
        // If only one round is not possible, there is
        // no need to process the other ones.
        break
    }
    if !hasNext {
        break
    }
}
Nice practical article. I tried to rewrite the code in it to use byte slices and calls to the bytes package instead of strings while keeping the calls to Split, thinking it would be faster due to less copying. The allocations were actually about 1.5x higher than in the original string version. It turns out a slice header in Go is 24 bytes versus 16 for a string header, and 24/16 = 1.5.
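You can see the header sizes directly with unsafe.Sizeof (values shown are for a 64-bit platform; assumes fmt and unsafe are imported):

var s string
var b []byte
fmt.Println(unsafe.Sizeof(s)) // 16: pointer + length
fmt.Println(unsafe.Sizeof(b)) // 24: pointer + length + capacity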
Tangentially related:
Not really necessary here, but lately I've been using buffer pools like the one by the fasthttp author to avoid allocating in performance-sensitive places. The code is sometimes pretty ugly, though:
// get buffer from pool
body := bytebufferpool.Get()
// create request body
body.WriteString(`{"config_id":"`)
for i := range configID {
    c := configID[i]
    if c != '/' {
        body.WriteByte(c)
    } else {
        // escape slash
        body.WriteByte('\\')
        body.WriteByte('/')
    }
}
body.WriteString(`","commit":`)
if commit {
    body.WriteByte('1')
} else {
    body.WriteByte('0')
}
body.WriteString(`,"struct_items":[]}`)
// send request
...
// add back to pool
bytebufferpool.Put(body)
It's part of an API client I made for Resy, so there are some guarantees on what characters will appear in configID, which is why I don't escape everything. I actually have a separate pool, but I used bytebufferpool.Get instead of client.pool.Get when writing it here so the intent would be clearer. The code has gotten me Friday 7 PM reservations at Tatiana, so it works, though it's probably overkill: network latency is by far the biggest cost in terms of time. I probably should just have a JSON encoder write to the buffer.
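That last idea would look roughly like the sketch below. The struct and its JSON tags are guesses reconstructed from the hand-built body above, and it relies on *bytebufferpool.ByteBuffer implementing io.Writer; note that encoding/json applies its own escaping and appends a trailing newline.

type reservationRequest struct {
    ConfigID    string   `json:"config_id"`
    Commit      int      `json:"commit"`
    StructItems []string `json:"struct_items"`
}

body := bytebufferpool.Get()
defer bytebufferpool.Put(body)

commitVal := 0
if commit {
    commitVal = 1
}
// Encode straight into the pooled buffer instead of hand-writing JSON.
if err := json.NewEncoder(body).Encode(reservationRequest{
    ConfigID:    configID,
    Commit:      commitVal,
    StructItems: []string{},
}); err != nil {
    // handle the error
}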
I'm doing Advent on Discord with a former coworker this year.
This morning we compared the performance of his solution in Rust to my solution in Go. His was running 2-4x faster than mine, but we couldn't figure out why, and it seemed so straightforward that we assumed Rust was just that much faster.
I had assumed strings.Split would return sub-slices of the original string until I read this post. Sure enough, I rewrote it using strings.Index and manually created the sub-slices, and now it's comparable, albeit less readable.
Looking at the standard library source doesn't make it clear to me why this even works. strings.Split is appending sub-slices of the input string to the return value. I don't understand where the allocations are coming from, but I'm just not knowledgeable enough about Go slices.
strings.Split does return subslices of the original string; it's not making a copy of the string data. But an allocation still has to be made for the []string that holds the result of strings.Split, and that's where the allocations are coming from.
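That's also why the manual strings.Index approach mentioned above helps: you can walk the string with sub-slices and never build the []string at all. A rough sketch (variable names are illustrative):

for len(game) > 0 {
    var round string
    if i := strings.Index(game, "; "); i >= 0 {
        round, game = game[:i], game[i+len("; "):]
    } else {
        round, game = game, ""
    }
    // round is a view into the original string; no per-piece copy, no slice of results.
    _ = round
}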
From looking into Rust's implementation of the iterator that's returned when splitting a string (which takes a bit of mental parsing, given the heavy use of macros: https://doc.rust-lang.org/src/core/str/iter.rs.html#731-747), it seems like it has a reference to the original string and an index that it increments until it finds the next pattern to "split" on. This seems like something that would be pretty easy to do in Go as well, except Go doesn't really have any standard iterator API from what I can tell. This seems like a pretty clear argument for why having one is preferable to just returning slices everywhere, though; even if you end up wanting to iterate through everything immediately, you avoid allocating unless you actually have to. In Rust, you can do this by just calling `collect` on the iterator, and although I suspect whoever implemented this API for Go wouldn't like the "magic" of it generically supporting collecting into whatever container you want, it feels like even just having an iterator with a method to collect into a slice and nothing else would be preferable.
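For comparison, a hand-rolled Go version of that keep-a-cursor idea might look like this (none of it is a standard API, it's just a sketch):

type splitIter struct {
    rest string
    sep  string
    done bool
}

// next returns the next piece and reports whether one was produced.
func (it *splitIter) next() (string, bool) {
    if it.done {
        return "", false
    }
    if i := strings.Index(it.rest, it.sep); i >= 0 {
        part := it.rest[:i]
        it.rest = it.rest[i+len(it.sep):]
        return part, true
    }
    it.done = true
    part := it.rest
    it.rest = ""
    return part, true
}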
> it feels like even just having an iterator with a method to collect into a slice and nothing else would be preferable.
Most languages just have constructors or factory functions from iterator (or even iterable) to container. It's less generic (especially if constructors and factory functions are segregated) and/or less efficient, but otherwise works fine. There's no need for the iterator abstraction (protocol, interface, trait) to have specific knowledge of individual collections.
Being that it's rsc's pet project, I'm confident that it will. It will also bring in coroutines, so this should make processing large streams of data more performant in an idiomatic way.
I love me some Go, but this is a gripe I have with the language: slice (and, by extension, string) allocation is opaque. There's no way of determining whether two slices share the same underlying array, and no easy way of determining whether an operation on a slice will alter the underlying array or cause the runtime to allocate a new one.
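A small example of the kind of thing that's invisible in the types (assumes fmt is imported):

a := []int{1, 2, 3, 4}
b := a[:2]        // b shares a's backing array, but nothing in the code says so
b = append(b, 99) // fits within cap, so it writes into a's array: no allocation
fmt.Println(a)    // [1 2 99 4]

c := a[:2:2]      // same data, but capacity clipped to 2
c = append(c, 5)  // cap exceeded: the runtime allocates a new array, a is untouched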
Strings in Go are immutable byte sequences. Every time you make one, it's an allocation and it costs time. You asked about speed differences between Rust and Go and about not understanding where allocations come from, yet you're doing mostly unnecessary string allocation work for what has been the first 3 days of this year's AoC. At no point have I used strings this year so far.
FWIW, I moved away from bufio.Scanner and now I just do one os.ReadFile() call to get the whole thing as bytes and wrote a little function to break it into lines. I'm not using strings anymore and I eked out a bit of performance on all three days.
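The helper isn't shown in the comment, but the general shape is probably something like this: index for newlines and hand back sub-slices of the original buffer, so no per-line copies are made (assumes the bytes package is imported).

func splitLines(data []byte) [][]byte {
    var lines [][]byte
    for len(data) > 0 {
        i := bytes.IndexByte(data, '\n')
        if i < 0 {
            lines = append(lines, data)
            break
        }
        lines = append(lines, data[:i])
        data = data[i+1:]
    }
    return lines
}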
We're using similar techniques to parse and merge JSON objects, which could be applied to this problem as well. Essentially, you index the whole input into an AST and then traverse the AST to find the data you're looking for. In our case, we're not just doing that, but also recursively merging JSON objects at the AST level, which is a lot faster than merging the actual data. The full blog post on the problem and solution can be found here: https://wundergraph.com/blog/astjson_high_performance_json_t...
Each iteration of this benchmark measures the aggregate performance of
- 1x ParseObject
- 3x AppendObject
- 3x MergeNodesWithPath
- 1x PrintNode
- 1x bytes.Equal comparison of two byte slices
So the benchmark isn't really measuring MergeNodesWithPath, it's measuring a much larger composite operation, which includes (multiple) calls to MergeNodesWithPath but also all of the above listed calls as well.
If you want to measure MergeNodesWithPath, you would need to have each iteration of the loop do a single MergeNodesWithPath call, on the same JSON method receiver, and with the same input parameters.
edit: similar issues exist in many/most of the other benchmark functions as well
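For illustration, isolating a single call per iteration would look roughly like this. The names follow the parent comment, but the setup helpers, signatures, and inputs are assumptions, not the actual astjson API:

func BenchmarkMergeNodesWithPath(b *testing.B) {
    // Hypothetical setup: parse the fixture once, outside the timed loop.
    doc := parseTestObject(b, inputJSON)
    b.ReportAllocs()
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        // One MergeNodesWithPath call per iteration, same receiver, same arguments.
        doc.MergeNodesWithPath(patchNode, mergePath)
    }
}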
Interesting article! Another improvement (unrelated to strings) would be to exit the function as soon as an impossible RGB value is reached (instead of processing everything).