That solves the "garbage training data" problem, but it doesn't solve the "it's just a language model" problem.
If you fine-tuned ChatGPT on all the sources you mention, you now have a model that produces results that could plausibly have been written by a domain expert on the Linux kernel, but you don't have a domain expert. It will still hallucinate, because that's a fundamental feature of generative AI; it will just hallucinate much more convincingly.
I get what you're saying, but I'm just not convinced it will continue to be a huge problem. In some sense, if the state of the art is where we are today with language models, then sure. But I think it'll get better -- in part because I'm not sure humans aren't just souped-up language models with some weird optimization functions...