Continual Learning in Practice

In a previous article, we discussed the benefits of continual learning for the European steel industry. We motivated the need for continual learning to help machine learning models keep up with changes in the real world and maintain an acceptable level of performance over time. However, we did not describe what continual learning looks like in practice. In this article, we’ll look at two preliminary questions that should be answered to better understand how our model will behave in production: “How much data does our model require?” and “How long does it take for our model’s performance to decline?”. Continual learning is not an exact science, so answering these questions does not guarantee a well-performing model. But it will set us on the right track and decrease the chances that our model suffers from the effects of data drift in the future.

The first question to answer is how much data our model requires to perform well; specifically, we want to understand whether our model has reached its capacity to learn, or if there is still room for improvement. This can be evaluated during the training phase by splitting the data into several parts, depending on the size of our dataset. We then train the model on larger and larger subsets of the complete dataset, testing it on the same test set each time and evaluating its performance. If we have enough data, at some point we should observe a plateau – this particular model architecture has extracted all the information it can from the dataset. We now have a sense of what a meaningful amount of data is for our model: the amount required to maximise its performance. If the model does not plateau, however, then there is little point in considering continual learning techniques just yet; we will inevitably have to retrain the model as more data arrives simply to eke out the maximum performance it is capable of.
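As a rough illustration, here is a minimal sketch of this first experiment using scikit-learn. The dataset, the RandomForestClassifier model, the accuracy metric, and the subset fractions are all placeholder assumptions – substitute whatever your own project uses.

```python
# Sketch of the data-size experiment: train on growing subsets of the
# training data, score each model on the same held-out test set, and
# look for a plateau in the resulting curve.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def learning_curve_scores(X, y, fractions=(0.1, 0.25, 0.5, 0.75, 1.0), seed=42):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    scores = []
    for frac in fractions:
        n = int(len(X_train) * frac)
        model = RandomForestClassifier(random_state=seed)  # placeholder model
        model.fit(X_train[:n], y_train[:n])
        scores.append((n, accuracy_score(y_test, model.predict(X_test))))
    return scores  # plot score against n: a flattening curve suggests a plateau
```

If the score is still climbing at the largest subset, the model has not yet reached its capacity and more data (or retraining on new data) is the first priority.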

Once we have a sense of how much data our model needs, we can begin to evaluate how long it takes before the performance of the model starts to decay. To do this, we sort our dataset in the order the data arrived and split it once again. We use the older subset of data to train a model; this subset should be at least as large as the meaningful data size we discovered in the first experiment. Then we split the other, more recent subset of data into meaningful intervals, depending on our use case (e.g., days, weeks, or months). We then evaluate the performance of our model on each of these subsets – which are further and further in the future with respect to the data our model was trained on – and plot the results. This plot will give us an idea of how the model will perform in production as time goes by. At some point, we may notice that the performance of the model starts to decline. This may happen quickly or slowly depending on the domain and the use case, and gives us an idea of how long we can expect our model to last before we observe a degradation in its performance.
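A similarly hedged sketch of this second experiment is shown below, assuming the data lives in a pandas DataFrame with a "timestamp" column and a "target" column, with monthly evaluation intervals; the cut-off date, model, and metric are again placeholders.

```python
# Sketch of the decay experiment: train on the older data, then score the
# model on each successive time interval after the training cut-off.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

def decay_over_time(df, feature_cols, target_col="target",
                    train_end="2022-06-30", freq="M", seed=42):
    df = df.sort_values("timestamp")
    cutoff = pd.Timestamp(train_end)
    train = df[df["timestamp"] <= cutoff]
    future = df[df["timestamp"] > cutoff]

    model = RandomForestClassifier(random_state=seed)  # placeholder model
    model.fit(train[feature_cols], train[target_col])

    results = []
    # Group the post-cutoff data into intervals (here: calendar months).
    for period, chunk in future.groupby(future["timestamp"].dt.to_period(freq)):
        score = accuracy_score(chunk[target_col],
                               model.predict(chunk[feature_cols]))
        results.append((str(period), len(chunk), score))
    return results  # plot score against period to see when performance declines
```

The shape of the resulting curve – flat, gently sloping, or dropping off a cliff – is what tells us how much time we have before retraining becomes necessary.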

Answering these questions provides us with valuable information about our model and how we can expect it to perform going forward. It’s important to keep in mind that these results will vary; our model may decay faster or slower than expected, and we may require more or less data depending on how “interesting” the past period has been. This is to be expected, and only increases our understanding of the model’s behaviour. There are many more experiments like the two described above that we might carry out to ensure our model maintains its performance, such as understanding how often to retrain the model and whether to keep old data when new data arrives. We strongly recommend reading this article from Evidently AI, which inspired this blog post: https://www.evidentlyai.com/blog/retrain-or-not-retrain