[Not him of course] — My understanding is that instead of generating a ton of (LLVM) IR, you generate a little (MLIR) IR, because MLIR lets you define operations at higher levels of abstraction, suited to particular tasks. For instance, if you're doing loop optimizations, instead of crawling through a sea of compare and branch and arithmetic operations, you'd just use a higher-level ‘for’ operation¹. Only after you've done everything you can at the high level do you move down to a more concrete representation, so you hope to end up with both less LLVM IR and less work to do on it.
¹ e.g. https://mlir.llvm.org/docs/Dialects/Affine/#affinefor-mliraf...