The thing that's missing from this article (and most like it) about these kinds of code optimizations is context: when is it appropriate to make such changes, and when is it not?
> [with] this change ... the speed improved significantly [but] the readability and ease of coding decreased quite a bit
The speed of this specific function, measured with the described (micro) benchmark, improved, yes, no doubt. But that's not (by itself) relevant! If you want to use an optimized implementation of some code, it has to represent a significant improvement in the overall system as measured by meaningful, agreed-upon, high-level performance metrics.
If this function is called many times per user request, and the optimized version has a measurable impact on request latency, then sure, go for it! But, otherwise, making this kind of change is the textbook definition of premature optimization, never a good idea.
When I see these kinds of readability-decreasing optimizations in code reviews, I always push back. Almost always, that same Go code queries some other service over the network or loads data from a database, or similar, and nobody will ever notice the speed improvement of Index over Split on the bottom line. But everyone working with that code later will spend extra time (sometimes hours) just trying to understand what's going on. That's much more expensive than a few extra CPU cycles here and there.
Sure, if you're writing an inner loop for some heavy workload, e.g. in graphics or maybe parsing terabytes of JSON, it makes sense. But outside of these rarely occurring situations, you should always prefer readability over low-level optimizations.
OTOH note that this optimisation can be bundled into a helper function so it's not quite as bad as some others. In fact the helper function is in the stdlib: https://pkg.go.dev/strings#Cut
This is most useful for the "split once" case, which is genuinely improved.
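For the split-once shape, the difference is roughly this (the variable names and separator are just illustrative, not taken from the article):

// Before: Split allocates a []string only to pull two pieces out of it.
parts := strings.Split(line, ": ")
prefix, rest := parts[0], parts[1]

// After: Cut returns both halves directly, with no intermediate slice.
prefix, rest, found := strings.Cut(line, ": ")
if !found {
    // handle a line without the separator
}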
The iteration case is worse, but not to an irredeemable degree. The original:
rounds := strings.Split(game, "; ")
isPossible := false
for _, round := range rounds {
    red, green, blue := findScores(round)
    if isRoundPossible(red, green, blue) {
        isPossible = true
    } else {
        isPossible = false
        // If only one round is not possible, there is
        // no need to process the other ones.
        break
    }
}
becomes:
isPossible := false
for {
    // Use the same "; " separator as the Split version above.
    round, tail, hasNext := strings.Cut(game, "; ")
    game = tail
    red, green, blue := findScores(round)
    if isRoundPossible(red, green, blue) {
        isPossible = true
    } else {
        isPossible = false
        // If only one round is not possible, there is
        // no need to process the other ones.
        break
    }
    if !hasNext {
        break
    }
}
Nice practical article. I tried to rewrite the code in it to use byte slices and calls to the bytes package instead of strings while keeping the calls to Split, thinking it would be faster due to less copying. The allocations were actually about 1.5x higher than in the original string version. It turns out a slice header in Go is 24 bytes versus 16 for a string header, and 24/16 = 1.5.
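You can see the header sizes directly with unsafe.Sizeof (values shown are for a 64-bit platform; assumes fmt and unsafe are imported):

var s string
var b []byte
fmt.Println(unsafe.Sizeof(s)) // 16: pointer + length
fmt.Println(unsafe.Sizeof(b)) // 24: pointer + length + capacity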
Tangentially related:
Not really necessary here, but lately I've been using buffer pools like the one by the fasthttp author to avoid allocating in performance-sensitive places. The code is sometimes pretty ugly, though:
// get buffer from pool
body := bytebufferpool.Get()
// create request body
body.WriteString(`{"config_id":"`)
for i := range configID {
    c := configID[i]
    if c != '/' {
        body.WriteByte(c)
    } else {
        // escape slash
        body.WriteByte('\\')
        body.WriteByte('/')
    }
}
body.WriteString(`","commit":`)
if commit {
    body.WriteByte('1')
} else {
    body.WriteByte('0')
}
body.WriteString(`,"struct_items":[]}`)
// send request
...
// add back to pool
bytebufferpool.Put(body)
It's part of an API client I made for Resy, so there are some guarantees on what characters will appear in configID, which is why I don't escape everything. I actually have a separate pool, but I used bytebufferpool.Get instead of client.pool.Get when writing it here so the intent would be clearer. The code has gotten me Friday 7 PM reservations at Tatiana, so it works, though it's probably overkill: network latency is by far the biggest cost in terms of time. I probably should just have a JSON encoder write to the buffer.
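That last idea would look roughly like the sketch below. The struct and its JSON tags are guesses reconstructed from the hand-built body above, and it relies on *bytebufferpool.ByteBuffer implementing io.Writer; note that encoding/json applies its own escaping and appends a trailing newline.

type reservationRequest struct {
    ConfigID    string   `json:"config_id"`
    Commit      int      `json:"commit"`
    StructItems []string `json:"struct_items"`
}

body := bytebufferpool.Get()
defer bytebufferpool.Put(body)

commitVal := 0
if commit {
    commitVal = 1
}
// Encode straight into the pooled buffer instead of hand-writing JSON.
if err := json.NewEncoder(body).Encode(reservationRequest{
    ConfigID:    configID,
    Commit:      commitVal,
    StructItems: []string{},
}); err != nil {
    // handle the error
}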
I'm doing Advent on Discord with a former coworker this year.
This morning we compared the performance of his solution in Rust to my solution in Go. His was running 2-4x faster than mine, but we couldn't figure out why, and it seemed so straightforward that we assumed Rust was just that much faster.
I had assumed strings.Split would return sub-slices of the original string until I read this post. Sure enough, I rewrote it using strings.Index and manually created the sub-slices, and now it's comparable, albeit less readable.
Looking at the standard library source doesn't make it clear to me why this even works. strings.Split is appending sub-slices of the input string to the return value. I don't understand where the allocations are coming from, but I'm just not knowledgeable enough about Go slices.
strings.Split does return subslices of the original string; it's not making a copy of the string data. But an allocation still has to be made for the []string that holds the result of strings.Split, and that's where the allocations are coming from.
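That's also why the manual strings.Index approach mentioned above helps: you can walk the string with sub-slices and never build the []string at all. A rough sketch (variable names are illustrative):

for len(game) > 0 {
    var round string
    if i := strings.Index(game, "; "); i >= 0 {
        round, game = game[:i], game[i+len("; "):]
    } else {
        round, game = game, ""
    }
    // round is a view into the original string; no per-piece copy, no slice of results.
    _ = round
}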
From looking into Rust's implementation of the iterator that's returned when splitting a string (which takes a bit of mental parsing, given the heavy use of macros: https://doc.rust-lang.org/src/core/str/iter.rs.html#731-747), it seems like it has a reference to the original string and an index that it increments until it finds the next pattern to "split" on. This seems like something that would be pretty easy to do in Go as well, except Go doesn't really have any standard iterator API from what I can tell. This seems like a pretty clear argument for why having one is preferable to just returning slices everywhere, though; even if you end up wanting to iterate through everything immediately, you avoid allocating unless you actually have to. In Rust, you can do this by just calling `collect` on the iterator, and although I suspect whoever implemented this API for Go wouldn't like the "magic" of it generically supporting collecting into whatever container you want, it feels like even just having an iterator with a method to collect into a slice and nothing else would be preferable.
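For comparison, a hand-rolled Go version of that keep-a-cursor idea might look like this (none of it is a standard API, it's just a sketch):

type splitIter struct {
    rest string
    sep  string
    done bool
}

// next returns the next piece and reports whether one was produced.
func (it *splitIter) next() (string, bool) {
    if it.done {
        return "", false
    }
    if i := strings.Index(it.rest, it.sep); i >= 0 {
        part := it.rest[:i]
        it.rest = it.rest[i+len(it.sep):]
        return part, true
    }
    it.done = true
    part := it.rest
    it.rest = ""
    return part, true
}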
> it feels like even just having an iterator with a method to collect into a slice and nothing else would be preferable.
Most languages just have constructors or factory functions from iterator (or even iterable) to container. It's less generic (especially if constructors and factory functions are segregated) and/or less efficient, but otherwise works fine. There's no need for the iterator abstraction (protocol, interface, trait) to have specific knowledge of individual collections.
Being that it's rsc's pet project, I'm confident that it will. It will also bring in coroutines, so this should make processing large streams of data more performant in an idiomatic way.
I love me some Go, but this is a gripe I have with the language: slice (and, by extension, string) allocation is opaque. There's no way of determining whether two slices share the same underlying array, and no easy way of determining whether an operation on a slice will alter the underlying array or cause the runtime to allocate a new one.
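A small example of the kind of thing that's invisible in the types (assumes fmt is imported):

a := []int{1, 2, 3, 4}
b := a[:2]        // b shares a's backing array, but nothing in the code says so
b = append(b, 99) // fits within cap, so it writes into a's array: no allocation
fmt.Println(a)    // [1 2 99 4]

c := a[:2:2]      // same data, but capacity clipped to 2
c = append(c, 5)  // cap exceeded: the runtime allocates a new array, a is untouched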
Strings in Go are immutable byte sequences. Every time you make one, it's an allocation and it costs time. You asked about speed differences between Rust and Go and about not understanding where allocations come from, yet you're doing mostly unnecessary string allocation work for what has been the first 3 days of this year's AoC. At no point have I used strings this year so far.
FWIW, I moved away from bufio.Scanner and now I just do one os.ReadFile() call to get the whole thing as bytes and wrote a little function to break it into lines. I'm not using strings anymore and I eked out a bit of performance on all three days.
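The helper isn't shown in the comment, but the general shape is probably something like this: index for newlines and hand back sub-slices of the original buffer, so no per-line copies are made (assumes the bytes package is imported).

func splitLines(data []byte) [][]byte {
    var lines [][]byte
    for len(data) > 0 {
        i := bytes.IndexByte(data, '\n')
        if i < 0 {
            lines = append(lines, data)
            break
        }
        lines = append(lines, data[:i])
        data = data[i+1:]
    }
    return lines
}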
We're using similar techniques to parse and merge JSON objects, which could be applied to this problem as well. Essentially, you index the whole input into an AST and then traverse the AST to find the data you're looking for. In our case, we're not just doing that, but also recursively merging JSON objects at the AST level, which is a lot faster than merging the actual data. The full blog post on the problem and solution can be found here: https://wundergraph.com/blog/astjson_high_performance_json_t...
Each iteration of this benchmark measures the aggregate performance of
- 1x ParseObject
- 3x AppendObject
- 3x MergeNodesWithPath
- 1x PrintNode
- 1x bytes.Equal comparison of two byte slices
So the benchmark isn't really measuring MergeNodesWithPath, it's measuring a much larger composite operation, which includes (multiple) calls to MergeNodesWithPath but also all of the above listed calls as well.
If you want to measure MergeNodesWithPath, you would need to have each iteration of the loop do a single MergeNodesWithPath call, on the same JSON method receiver, and with the same input parameters.
edit: similar issues exist in many/most of the other benchmark functions as well
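For illustration, isolating a single call per iteration would look roughly like this. The names follow the parent comment, but the setup helpers, signatures, and inputs are assumptions, not the actual astjson API:

func BenchmarkMergeNodesWithPath(b *testing.B) {
    // Hypothetical setup: parse the fixture once, outside the timed loop.
    doc := parseTestObject(b, inputJSON)
    b.ReportAllocs()
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        // One MergeNodesWithPath call per iteration, same receiver, same arguments.
        doc.MergeNodesWithPath(patchNode, mergePath)
    }
}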
Interesting article! Another improvement (unrelated to strings) would be to exit the function as soon as an impossible RGB value is reached (instead of processing everything).