A colleague of mine has written up a quick explainer on the key features (https://encord.com/blog/segment-anything-model-2-sam-2/). The memory attention module for keeping track of objects throughout a video is very clever. Tracking is one of the trickiest problems to solve, alongside occlusion. We've spent so much time trying to fix these issues in our CV projects, and now it looks like Meta has done the work for us :-)