I am a lay person. As I understand it, a trained model describes transitions from one symbol to the next, with probabilities on the edges between nodes. There is a structure to this graph — after all, if there weren't, then training would be impossible — but this structure is as if it were all written on one sheet of paper, with the definitions of each node inked on top of each other in different colors.
This research (and its parent and sibling papers, from the LW article) seems to be about picking out those colored graph components from the floating-point soup?