I feel like both this comment and the parent comment highlight how RL has been going through a cycle of misunderstanding recently from another one of its popularity booms due to being used to train LLMs
While collecting data according to policy is part of RL, 'reductive' is an understatement. It's like saying algebra is all about scalar products. Well yes, 1%