
The team spirit value is just a hyperparameter that describes how greedy the individual agents are. At the start of training, the team spirit is 0 and the bots are only rewarded for their own actions. This encourages them to learn basic micro skills, like last hitting. As training progresses, the team spirit is increased. When it finally reaches 1, the bots value a reward for a teammate as highly as a reward for themselves.
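
To make that concrete, here is a rough sketch of how such a blend could work. It assumes the mixed reward is a linear interpolation between a bot's own reward and the team average, and that the parameter ramps up linearly. The function names and the linear schedule are my own illustration, not anything taken from the actual training code.

```python
def blend_rewards(individual_rewards, team_spirit):
    """Mix each agent's own reward with the team average.

    team_spirit = 0 -> every agent keeps only its own reward.
    team_spirit = 1 -> every agent receives the team average.
    (Assumed formula, for illustration only.)
    """
    if not 0.0 <= team_spirit <= 1.0:
        raise ValueError("team_spirit must be in [0, 1]")
    team_mean = sum(individual_rewards) / len(individual_rewards)
    return [
        (1.0 - team_spirit) * reward + team_spirit * team_mean
        for reward in individual_rewards
    ]


def annealed_team_spirit(step, total_steps):
    """Hypothetical linear ramp from 0 to 1 over training."""
    return min(1.0, max(0.0, step / total_steps))


# One bot gets a last hit worth 1.0; the other four get nothing.
rewards = [1.0, 0.0, 0.0, 0.0, 0.0]
early = blend_rewards(rewards, annealed_team_spirit(0, 100))
late = blend_rewards(rewards, annealed_team_spirit(100, 100))
```

With team spirit at 0, only the bot that got the last hit is rewarded. At 1, all five bots receive the same share, so a teammate's last hit is worth exactly as much as your own.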

The actual source of the "communication" is not the team spirit parameter, but the basic fact that the bots have been trained together and receive the same inputs when making decisions. Unlike humans, who have a limited focus of attention, the bots can look at the whole map at once. They don't need to communicate because they already "know" what their allies will do when given the same input.
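
A toy illustration of the idea, assuming a deterministic policy shared by every bot. The policy rules, observation fields, and role names here are all made up for the example and have nothing to do with the real network.

```python
def shared_policy(observation, role):
    """Toy deterministic policy: the same inputs always give the same action."""
    if observation["enemies_near_tower"] > observation["allies_near_tower"]:
        return "retreat"
    return "push" if role == "carry" else "follow_carry"


def predict_allies(observation, roles):
    """Any bot can work out what each ally will do by evaluating the
    same policy on the same observation. No messages are exchanged."""
    return {role: shared_policy(observation, role) for role in roles}


observation = {"enemies_near_tower": 1, "allies_near_tower": 3}
seen_by_carry = predict_allies(observation, ["carry", "support"])
seen_by_support = predict_allies(observation, ["carry", "support"])
```

Both bots arrive at the same plan independently, which from the outside looks like coordination.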


