It has been widely speculated that the primary reason OpenAI never disclosed the full training dataset for GPT-3 or GPT-4 was to avoid potential legal backlash.
I prefer to think of it as the Uber/Airbnb model. Just do illegal things at such a scale that you clog the enforcement mechanisms. Then enforcement becomes such an unreasonable burden that lawmakers change the laws in your favor.