It's not open source, we have no idea about the data used to train the model, an...

pedalpete · on Jan 29, 2025

Is this an important consideration in open sourcing an AI model?

I would think the code to build your own is what's open sourced, and you can feed it any data you'd like. That's the open source part, not the part where they are running the model.

Have I misunderstood this?

kelipso · on Jan 29, 2025

It’s a common complaint about open-source ML models that they don’t provide or describe the data used to train them. Sometimes it’s a valid complaint, since it may not be clear what kind of data was used, and sometimes it isn’t, because the data is clear enough.

I think it’s kind of an overdone complaint and I usually ignore it. Besides, it looks like there’s an ongoing Hugging Face project that’s trying to replicate the training process for this model anyway.