It generally happens because you don’t follow a defined state machine. Here’s an example of how it might happen: when the call starts, the microphone isn’t opened yet. Then you add someone else to the Group FaceTime, and the event handler for that never stops to check whether the call is actually active (it just assumes it is). So the handler opens a new port to the microphone so that it can encrypt the audio stream differently for that recipient.
Super easy and not remotely malicious. It’s a failed state check.
The actual bug here might be different, but that’s an easy example. It may also effectively be the bug, since all the examples mention adding yourself to the call.
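To make the failed state check concrete, here is a rough sketch of the scenario described above. Everything in it is hypothetical: the `Call` class, the state names, and both handlers are made up for illustration and say nothing about how FaceTime is actually written. It only shows how a handler that assumes the call is active can open the microphone early, and how an explicit state check avoids it.

```typescript
// Hypothetical call states; illustrative names only, not FaceTime's design.
type CallState = "idle" | "ringing" | "active" | "ended";

// Allowed transitions. Anything not listed here is rejected.
const transitions: Record<CallState, CallState[]> = {
  idle: ["ringing"],
  ringing: ["active", "ended"],
  active: ["ended"],
  ended: [],
};

class Call {
  state: CallState = "idle";
  micOpen = false;
  participants: string[] = [];

  private moveTo(next: CallState): void {
    if (transitions[this.state].indexOf(next) === -1) {
      throw new Error(`illegal transition: ${this.state} -> ${next}`);
    }
    this.state = next;
  }

  ring(): void {
    this.moveTo("ringing");
  }

  answer(): void {
    this.moveTo("active");
    this.micOpen = true; // the microphone should only open here
  }

  hangUp(): void {
    this.moveTo("ended");
    this.micOpen = false;
  }

  // Buggy handler: assumes the call is active and opens the microphone
  // to set up a stream for the new recipient.
  addParticipantUnchecked(id: string): void {
    this.participants.push(id);
    this.micOpen = true;
  }

  // Guarded handler: records the participant, but leaves the microphone
  // alone unless the call has actually been answered.
  addParticipant(id: string): void {
    this.participants.push(id);
    if (this.state !== "active") {
      return; // stream setup is deferred until answer()
    }
    // Per-recipient stream setup would go here; the mic is already open.
  }
}

// Usage: add a participant while the call is still ringing.
const buggy = new Call();
buggy.ring();
buggy.addParticipantUnchecked("me");
// buggy.micOpen is now true even though nobody answered.

const guarded = new Call();
guarded.ring();
guarded.addParticipant("me");
// guarded.micOpen stays false until answer() is called.
```

The transition table is the "defined state machine" part: every handler either goes through `moveTo` or checks `state` before touching the microphone, so nothing gets to assume what state the call is in.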
I did a bit of WebRTC development for video chat, and the state machine for that is one of the most complicated I've ever dealt with. Even household name-brand commercial providers don't handle all of the edge cases. It took me about a week to get it right. Session negotiation gone wrong can easily cause audio to be heard before the call is established (and this will certainly happen with the naive implementation -- even Google's own reference implementation had issues).
If you're curious, check out this flowchart slide from a Google I/O WebRTC talk: