If someone completely new to my codebase came along and tried to evaluate my code, I'd laugh. You can't jump in and "evaluate" without knowing context and being familiar with the features.
What’s funnier is how commonly people believe their code is so special or different from everyone else’s. Reverse engineering a large code base is nowhere near as hard as most software engineers somehow believe. I was tasked with analyzing a large code base from a company that my employer had invested in and, as part of that investment, had been given IP rights to. The company's employees were floored when our team, led by me, ripped their system apart into multiple components and reused them in ways they hadn’t. Trust me, if one is experienced enough they can understand your code perfectly fine. It’s definitely not as special as you think it is.
Really interesting story, approximately how many lines of code were in that codebase?
It's hard for me to imagine someone grokking a 10M+ line codebase without external help, but I've never tried it. I do agree with the assertion that most codebases are not as _special_ as they like to think.
This was just over 600k lines of mostly C++ code. It’s certainly true that it helped I was familiar with the domain and the various technologies they had used, like CORBA and XML. This was the late 90s.
10M is a pretty massive codebase; the entire Linux kernel with all drivers is somewhere around that size. Most corporate systems aren’t that big, and even for Linux you wouldn’t need to understand all the drivers to understand the core kernel. I suspect the core kernel is maybe 1M at most.
That is interesting. I wonder how the 15 people who created the ~600K-line C++ codebase I’m talking about compare to the FB headcount on Android Messenger. Does anyone know how big that team is? LOC/head is a curious measure.
Regardless, it’s a bit concerning that it takes 10M LOC for a messaging app.
Not forgetting, of course, that Twitter almost certainly uses several languages on the backend, all entwined with their infrastructure. As TFA says:
"One former Tesla engineer, who spoke on the condition of anonymity to candidly describe the matter but was not involved, said Tesla engineers would have trouble capably assessing Twitter’s code. Distributed systems, the large-scale and spread-out network that Twitter is composed of, are not the automaker’s specialty, the person said."
I'm not doubting your story, but this is not the norm in my experience.
> It’s certainly true that it helped I was familiar with the domain
Technical prowess and domain knowledge are excellent assets, obviously, but in my experience they're often not enough.
The big tangled enterprise codebases I've dealt with (insurance companies, fintech, construction, etc) involved absolute metric tons of undocumented domain knowledge and lots of company-specific "tribal knowledge." Some tribal knowledge was embedded in the code in undocumented or semi-documented form, and much existed outside the codebase entirely... all kinds of custom infrastructure, etc.
I don't care how sharp and domain-familiar a team is. That sort of situation is not easily tameable.
I would say it's pretty "special" and well written code if outsiders can come in and quickly understand it. You should have commended them for their work.
I was coming here to say this. I have definitely found myself reading a codebase that is easy to read and easy to follow, and mistakenly concluding therefore that its developers were not solving complicated problems.
That is true. They were definitely a skilled team that had written the system. However, I still think most large systems can be reverse “design” engineered fairly easily by someone who is experienced.
Agreed. Most code at the top tech companies isn't interesting or even necessarily good. The hardest part of jumping into a new code base is almost always understanding the problem it's solving rather than the technology used to solve it.
My last employer before I retired (not a tech company, but one that used a lot of tech) has a very unique (and way larger than Twitter) complex set of businesses. They change at an insane pace and often involve things that in the end don't ship, resulting in crazy complex code base networks. We also had hundreds of teams building every kind of software imaginable (server APIs, web apps, mobile apps, internal apps, hardware with embedded code, etc.). Anyone from the outside coming in cold to examine the code would have no idea where to even start, much less be able to evaluate anything. It's not that any individual thing was necessarily complex, but there were so many interconnected business practices and related businesses that understanding how they relate is very hard even for someone who has been there for years, much less someone from an unrelated industry.
For example, you could look at my team's mobile codebases and probably figure out what was going on, but understanding all the services we consumed, and what they consumed, etc. (given the deep mix of microservices and macroservices) would make understanding the why of the entire system impossible.
Depends on the codebase. Go read the source code for GHC and tell me how quickly you could add a new primitive type that consists of all two's-complement 7-bit numbers.
A pretty trivial change for someone who's steeped in the codebase, likely impossible without a few weeks (or even months) of effort for anyone else. Of course, this all becomes exponentially easier if you have an author of the code to point you in the right direction.
And how long exactly did that take? I'm going to cast doubt on your story because, in my experience, people who claim to understand how a codebase works in a short amount of time often tend to be full of it and end up blowing things up when they try to change or modify things.
The codebase often isn't the issue, it's the use cases and the reasons why it evolved into the form it did.
I guess large is a relative term. For me ~600K lines of C++ is a pretty large code base. According to another comment, Facebook Messenger on Android alone is 10M lines of Kotlin. To me that seems like they are probably doing something wrong.
But to address your question, here is my recollection of what happened, now more than 20 years ago fwiw. We were given the code and I spent maybe 6-7 days, all day, reading it and analyzing it with this tool I had, called Source Navigator [1]. Then we spent 1 full work week at the other company's HQ, mostly in meetings asking questions about different modules and classes. Then when we returned to our offices it took me another 2 weeks of work to get the system set up and deploy a few components inside our own middleware system. I was the primary C++ expert; there was another business analyst, and I had a more junior developer who worked with me. So in comparison to Twitter, certainly a much, much smaller scale situation. The team that had written the system was around 15 people.
We definitely used the code and I don’t recall it being much of a problem at all that other people had written it. Plenty of open source projects have random contributors show up and work fine in their code base.
Based on my own professional experience, I think a lot of SWEs have pretty big egos and tend to overestimate how special or unique their particular projects are. This particular situation was one example, but there have been plenty of others. When I fix or find bugs in others’ code, sometimes they are surprised, which for me is always surprising. Why are you so surprised I can debug your code?
> Trust me if one is experienced enough they can understand your code perfectly fine, it’s definitely not as special as you think it is.
What if a large portion of the codebase is, for example, shader code? I chose this example because coding for the GPU isn't the same as coding for the CPU. Do you think that's a scenario in which you'd require more study, or are you confident your experience would spill over into this new domain, no study required?
This I agree with. I was very familiar with the business domain and the technology they had been using. So yes I agree if either of those were radically different it would be much harder. Good point.
You're missing their point. They're using that as an example of how domain knowledge can be important. It's a really naive take to think all/most of the code written for a social media platform will be immediately accessible to people writing code for a machine learning+robotics platform or for people writing code for an entertainment console. In fact, it's entirely possible to have created these things without any overlap in technology.
Those people will have all the time they need to talk to as many engineers as they need to understand the code base. Maybe a person hasn’t worked with shaders before, but then it’s a great time for the engineer who writes the shaders to teach them how the GPU works with some code examples.
This operation is not just interviewing people, it’s a kind of knowledge transfer and finding the people who are the best at explaining how the code works and can answer deep technical questions. This is how Elon works generally (if you look at the SpaceX interview, you can see that he just goes to people and asks them questions about the parts of the rocket while he’s doing the interview).
Yea. I think that’s the opposite of the point you’re trying to make. That’s what would make them less qualified as reviewers, since the domain is very different.
It is about coming up with BS false positives, pointing them out and saying this code is crap.
Of course, if someone is professional, understands that there was a different context and that all they have is the code, and writes down the possible false positives and discusses them, it is OK.
Reusing code in unexpected ways seems like what would be predicted in this scenario, if you agree with the position Naur advances in Programming as Theory Building (which I mostly do).
> You can't jump in and "evaluate" without knowing context and being familiar with the features.
Yes you can, it's called an audit and there is nothing wrong with that. The company you work for should have regular security audits for instance, ideally done by a third party rather than internally to eliminate bias. This isn't a "code review".
New people regularly join our company and take a few weeks to become productive at making small changes to one or two modules out of 5,000+ in the enterprise.
About 6 months. That doesn’t mean they don’t instantly have a general idea of the quality of the codebase. The number of WTFs decreases significantly after the first 2 months (when they give up and accept that that is just how it is).
Eh, if you're competent, then sure. But some people have obvious code smells. You'd be surprised. A quick glance and it's obvious they're not competent. 6 layers of inheritance. Composition loops everywhere (A is in B, B is in A, A and B are in C, C is in A and B).
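For what it's worth, the "A is in B, B is in A" kind of composition loop is something you can check mechanically rather than by glancing. A minimal sketch, assuming you've already extracted a "who holds whom" map from the code (the `contains` dict, the class names, and `find_cycle` are all made up for illustration, not taken from any real tool):

```python
def find_cycle(contains):
    """Return one containment cycle as a list of names, or None.

    `contains` maps a class name to the names of classes it holds as members.
    Uses a depth-first search with the usual white/gray/black marking.
    """
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {}
    stack = []  # current DFS path

    def visit(node):
        color[node] = GRAY
        stack.append(node)
        for child in contains.get(node, ()):
            state = color.get(child, WHITE)
            if state == GRAY:
                # child is already on the current path: that's a loop
                return stack[stack.index(child):] + [child]
            if state == WHITE:
                found = visit(child)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for name in list(contains):
        if color.get(name, WHITE) == WHITE:
            found = visit(name)
            if found:
                return found
    return None


# Hypothetical example mirroring the comment: A and B hold each other,
# C holds both, and A holds C.
tangled = {"A": ["B", "C"], "B": ["A"], "C": ["A", "B"]}
clean = {"A": ["B"], "B": ["C"], "C": []}

print(find_cycle(tangled))
print(find_cycle(clean))
```

This only reports the first loop it runs into, which is usually enough to start a conversation. It says nothing about whether the loop is justified in context.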
I don't think the goal here is code review. They're trying to gauge a few things: 1) are you competent? 2) are you coasting or genuinely contributing? 3) are you actually dedicated to improving the product or more concerned with office politics and inserting your ideology?
A quick interview and a little demonstration of contribution can help assess these things significantly; you don't have to understand the codebase that much to do it.
Of course I first read the documentation to understand a code base, but then I usually just jump into the part that I’m interested in. If it’s not a spaghetti code base, it’s not that hard to do that.
In this case, it would be your new boss asking you to make a short presentation of your work, and he has trusted software devs who can smell bs a mile away.