The blog post (https://ai.meta.com/blog/segment-anything-2/) mentions tracking as a use case. Tracking similar-looking objects is known to be challenging, and they mention it in the Limitations section. In that video I only used one frame, but in some other tests it still didn't really work, even when I prompted in several frames as recommended.
Yeah, it's a reasonable expectation since the blog highlights it. I just figure it's worth calling out that SOTA trackers deal with object disappearance well enough that, when used together with this, they would handle those cases. I'd venture to say that most people doing any kind of tracking aren't relying on their segmentation process for that.
I'm not sure what exactly you are looking for a reference to, but segmentation as a preprocessing step for tracking has been one of the most typical workflows, if not the primary one, for decades.