Semi-related curiosity question: When viewing Twitch chat as a logged-out user, if someone enters a banned word, it shows me "<message deleted>." However, when there are dozens of messages per second, as in the Bob Ross chat, the deletion seems delayed, because you can sometimes catch a glimpse of the bad words before they are "removed". I assume the removal is done with JavaScript? Are you seeing the non-censored version of the chat when scraping?
Messages are first sent through unmodified over Twitch's IRC-ish protocol. You can connect to Twitch chat with a simple IRC client, or through the browser. Twitch's own browser chat connects to a WebSocket server that passes IRC commands to the browser, where they are parsed & executed.
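To make the "parsed & executed" part a bit more concrete, here is a minimal sketch of splitting one raw IRC-style line into its pieces. It only handles the classic layout (optional ":prefix", a command, space-separated parameters, and an optional trailing parameter introduced by " :"). The name `parse_irc_line`, the `IrcLine` type, and the sample user and host are made up for illustration; this is not Twitch's actual client code.

```python
from typing import List, NamedTuple, Optional


class IrcLine(NamedTuple):
    prefix: Optional[str]  # sender, e.g. "nick!nick@host"; None if the line has no prefix
    command: str           # e.g. "PRIVMSG" or "CLEARCHAT"
    params: List[str]      # middle parameters, plus the trailing text if present


def parse_irc_line(line: str) -> IrcLine:
    """Split one raw IRC-style line into prefix, command and parameters."""
    line = line.rstrip("\r\n")
    prefix = None
    if line.startswith(":"):
        # A leading ':' marks the optional sender prefix.
        prefix, _, line = line[1:].partition(" ")
    # Everything after the first " :" is a single trailing parameter
    # that may itself contain spaces (the chat text, for PRIVMSG).
    head, sep, trailing = line.partition(" :")
    parts = head.split()
    if not parts:
        raise ValueError("line has no command")
    command, params = parts[0].upper(), parts[1:]
    if sep:
        params.append(trailing)
    return IrcLine(prefix, command, params)


if __name__ == "__main__":
    # Hypothetical sample line; the user and host are placeholders.
    raw = ":someuser!someuser@example.invalid PRIVMSG #bobross :happy little trees :)"
    print(parse_irc_line(raw))
```

A browser client would do the same kind of split on each line received over the WebSocket and then dispatch on the command name.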
Twitch has a few additional IRC-ish commands like CLEARCHAT, which deletes messages by a given user. Most IRC clients don't support this, but Twitch's browser client of course does :) In larger streams, spammy messages are usually removed by bots like http://www.nightbot.tv/ or http://twitch.moobot.tv/. That's where the delay comes from: a message has to arrive at the moderator before the moderator can react to it.
Interestingly, CLEARCHAT can only delete _all_ messages by a given user (as far as I know), so non-offending messages are also removed. The removal itself is done by the client; the chat servers only pass "CLEARCHAT #channel_name user_name".
(Edit: it's all messages by a given user, or all messages in the channel if no username is given.)
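A rough sketch of what the client side of that could look like, assuming the command arrives in the bare form quoted above. `ChatBuffer`, `ChatMessage`, and `handle_command` are hypothetical names of mine, and keeping the original text around after deletion is my assumption about how a client might do it, not something taken from Twitch's code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ChatMessage:
    user: str
    text: str
    deleted: bool = False  # set by CLEARCHAT; the original text is kept (assumption)

    def render(self) -> str:
        shown = "<message deleted>" if self.deleted else self.text
        return f"{self.user}: {shown}"


class ChatBuffer:
    """Hypothetical client-side list of the messages currently on screen."""

    def __init__(self) -> None:
        self.messages: List[ChatMessage] = []

    def add(self, user: str, text: str) -> None:
        self.messages.append(ChatMessage(user, text))

    def clear_chat(self, user: Optional[str] = None) -> int:
        """Mark all messages by `user` as deleted, or every message if user is None."""
        count = 0
        for message in self.messages:
            if not message.deleted and (user is None or message.user == user):
                message.deleted = True
                count += 1
        return count

    def handle_command(self, line: str) -> None:
        """Apply a bare 'CLEARCHAT #channel_name [user_name]' line."""
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "CLEARCHAT":
            # Tolerate a leading ':' in case the name is sent as a trailing parameter.
            user = parts[2].lstrip(":") if len(parts) > 2 else None
            self.clear_chat(user)


if __name__ == "__main__":
    chat = ChatBuffer()
    chat.add("alice", "hello")
    chat.add("bob", "some banned word")
    chat.add("bob", "nice painting")
    chat.handle_command("CLEARCHAT #channel_name bob")
    for message in chat.messages:
        print(message.render())
```

Note that bob's harmless second message gets hidden too, which is exactly the "all messages by a given user" behavior. The server never says which message was the offending one.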
In the Twitch browser client, you can double-click the <message deleted> text to show the original text if you really want to! :)
I'm not 100% sure how they're doing it. It seems to be both bad words and spammy stuff (e.g. a line of just "KappaRoss KappaRoss KappaRoss..." a hundred times over will get removed too).
The IRC gateway passes everything through, so I'm seeing it all unfiltered (as far as I can tell).
I believe the message removal is done by a moderator. This may be a human being or a bot. In either case the message has to appear in the chat before either party can decide whether to leave or remove it.
Twitch's chat protocol is compatible with IRC, but plain IRC clients don't support all of Twitch's newer features, so things like replacing banned words with <message deleted> won't show up there.
In the desktop version of Twitch chat, the deleted lines are crossed out but still visible. So I guess it's just some kind of markup that marks a line as deleted.