
I haven't read the text, but data drift refers to how, after you deploy a machine learning model, the input data changes over time into something the model was never trained or tested on. For instance, say you build a gradient boosting forecasting model that does a great job of predicting tomorrow's earnings. At training time, the earnings might be in the $1000-per-day range. A year later, they might be in the $100k range. The model has never seen numbers that high, so it doesn't handle them well. That is data drift.
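This bites tree-based models in particular, because their predictions are averages of training targets in the leaves, so they can't extrapolate past the range they were fit on. A minimal sketch (the numbers are made up for illustration):

    # A gradient boosting model can't extrapolate beyond the
    # target range it saw during training.
    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    # Training period: earnings around $1000/day, slowly rising.
    X_train = np.arange(365).reshape(-1, 1)   # day index
    y_train = 1000.0 + 2.0 * X_train.ravel()  # roughly $1000-$1730/day

    model = GradientBoostingRegressor().fit(X_train, y_train)

    # A year later, true earnings are ~$100k/day, but the prediction
    # stays capped near the maximum seen in training.
    print(model.predict([[730]]))  # ~1700, nowhere near 100000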


Right. Can you share how such issues are handled in the ML pipeline?


The most common solution is to retrain frequently on the latest data. A forecasting model might retrain every week on a window that includes the last week's data, and might even drop older data, for instance anything older than a year.
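A minimal sketch of that rolling-window retrain, assuming a pandas DataFrame indexed by date and a hypothetical train_model() routine of your own:

    # Weekly job: refit on a one-year rolling window of the data.
    import pandas as pd

    def retrain(df: pd.DataFrame, today: pd.Timestamp):
        # Drop training data older than a year, keep the rest.
        window = df[df.index >= today - pd.Timedelta(days=365)]
        return train_model(window)  # train_model is your own fit routine

Run it from whatever scheduler you already have (cron, Airflow, etc.) so the model always reflects recent data.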

It's best to transform your target variable, like "number of orders", into something like "number of orders per customer per day". Then, in your pipeline, you feed in the latest estimate of your number of customers (e.g. the average over the last two weeks). That's way more robust over time.
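Something like this, as a sketch (column names are just placeholders, and the per-customer forecast is whatever your model outputs):

    import pandas as pd

    df = pd.DataFrame({
        "orders":    [5000, 5200, 11000],
        "customers": [1000, 1040,  2200],
    })

    # Train on a scale-free target instead of raw order counts.
    df["orders_per_customer"] = df["orders"] / df["customers"]

    # At prediction time, scale back up using a recent estimate of
    # the customer count, e.g. the average of the last two weeks.
    recent_customers = df["customers"].tail(14).mean()
    forecast_per_customer = 5.1  # whatever the model predicts
    forecast_orders = forecast_per_customer * recent_customers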


Makes sense. We need to continuously monitor the performance of the deployed model in the field against our preexisting statistical knowledge of the data, and schedule regular "model updates" accordingly.
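One way to automate that monitoring is a two-sample test comparing live inputs against the training distribution. A minimal sketch with SciPy (the alert threshold is arbitrary):

    # Flag drift in one input feature with a two-sample KS test.
    import numpy as np
    from scipy.stats import ks_2samp

    rng = np.random.default_rng(0)
    train_feature = rng.normal(1000, 100, size=5000)  # training-time values
    live_feature = rng.normal(1500, 100, size=500)    # recent production values

    stat, p_value = ks_2samp(train_feature, live_feature)
    if p_value < 0.01:  # arbitrary alert threshold
        print("drift detected: schedule a retrain")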



