Train an ML model as a classified-document classifier and scan everything through that... now they just need to dig through their archives for the training set from all the past leaks.
I mean, I was partially joking about the origin of the dataset, but after as many leaks as they have had, they could likely work with the DoD at this point to get an acceptable model put in place.
I believe they already have things that track how documents are accessed and copied across their network. There are definitely flaws in it, but it’s not completely open.