Convex sums of different models are also supported, provided the models share the same vocabulary and tokenizer. When the tokenizers differ, the models' output logits cannot be summed directly unless the two vocabularies are first mapped into a common one. Applications of these convex sums include (1) model ensembling and (2) Contrastive Decoding (https://arxiv.org/abs/2309.09117), which uses a formula very similar to TopPTopK(M_large, top_p=0.9) - 0.5 * M_small.
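A minimal sketch of such a combination over a shared vocabulary, assuming the models' next-token logits are available as NumPy arrays of equal length (the names `top_p_filter` and `contrastive_combine` are illustrative, not part of any library):

```python
import numpy as np

def top_p_filter(logits: np.ndarray, top_p: float = 0.9) -> np.ndarray:
    """Keep the smallest set of tokens whose cumulative probability
    reaches top_p; set all other logits to -inf."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    order = np.argsort(-probs)                      # tokens by descending prob
    cum = np.cumsum(probs[order])
    cutoff = np.searchsorted(cum, top_p) + 1        # number of tokens to keep
    filtered = np.full_like(logits, -np.inf)
    keep = order[:cutoff]
    filtered[keep] = logits[keep]
    return filtered

def contrastive_combine(logits_large: np.ndarray,
                        logits_small: np.ndarray,
                        alpha: float = 0.5,
                        top_p: float = 0.9) -> np.ndarray:
    """TopP(M_large) - alpha * M_small over a shared vocabulary.
    Tokens masked out by the top-p filter stay at -inf."""
    return top_p_filter(logits_large, top_p) - alpha * logits_small

# Toy example with a 4-token vocabulary:
logits_large = np.array([3.0, 1.0, 0.0, -1.0])
logits_small = np.array([1.0, 2.0, 0.0, 0.0])
combined = contrastive_combine(logits_large, logits_small)
```

Both inputs must index the same vocabulary; the subtraction is positionwise, which is exactly why mismatched tokenizers break this scheme.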