The benefits of Transfer Learning for the European steel industry

Transfer learning is a widely used technique in machine learning and artificial intelligence, and for good reason. In today’s article, we will look at how transfer learning works, what makes it so successful, and where the technique is applied today. We will then see how transfer learning will be deployed in the ALCHIMIA project.

Transfer learning, as the name implies, involves leveraging the knowledge a machine learning model has gained in one domain (the “source domain”) by transferring it to another similar yet distinct domain (the “target domain”). There are various ways to achieve this transfer, such as feature extraction (reusing the representations the trained model has learned as fixed inputs to a new model) or finetuning (continuing to train the model on data from the target domain), but the motivating idea behind the technique is that it’s better to start with some sort of foundation than to start from scratch.
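To make the idea of “starting from a foundation” tangible, here is a minimal sketch of finetuning in plain Python. It is a toy under stated assumptions, not a realistic pipeline: the model is a one-feature linear model, and the datasets, learning rate, and step counts are all invented for illustration. The model is first trained on a plentiful source dataset, then given the same small training budget on three target samples, once starting from the source parameters and once starting from scratch.

```python
# Toy illustration only: a one-feature linear model y = w*x + b trained by
# gradient descent. All datasets and hyperparameters below are made up.

def train(data, w=0.0, b=0.0, lr=0.05, steps=20):
    """Run `steps` gradient-descent updates on mean squared error from (w, b)."""
    n = len(data)
    for _ in range(steps):
        grad_w = 2 / n * sum((w * x + b - y) * x for x, y in data)
        grad_b = 2 / n * sum((w * x + b - y) for x, y in data)
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b


def mse(data, w, b):
    """Mean squared error of the model (w, b) on a list of (x, y) pairs."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


# Source domain: plenty of data following y = 2x + 1.
source = [(i / 10, 2 * (i / 10) + 1) for i in range(-20, 21)]

# Target domain: similar but distinct (y = 2.2x + 0.8), only three samples.
target_train = [(x, 2.2 * x + 0.8) for x in (-1.0, 0.5, 1.5)]
target_test = [(i / 10, 2.2 * (i / 10) + 0.8) for i in range(-20, 21)]

# Step 1: train on the source domain.
w_src, b_src = train(source, steps=500)

# Step 2a: finetune, i.e. continue training from the source parameters.
w_ft, b_ft = train(target_train, w=w_src, b=b_src, steps=20)

# Step 2b: baseline with the same small budget, starting from scratch.
w_sc, b_sc = train(target_train, steps=20)

print("finetuned test MSE:   ", mse(target_test, w_ft, b_ft))
print("from-scratch test MSE:", mse(target_test, w_sc, b_sc))
```

With the same handful of updates on the same three samples, the finetuned model ends up closer to the target relationship than the model trained from scratch, because it only has to correct the small difference between the two domains.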

To make this concrete, consider an example of transfer learning that almost everyone will be familiar with: ChatGPT. Most large language models (LLMs), of which ChatGPT is just one example, leverage transfer learning at some point in their training. In the case of ChatGPT, the underlying model was first trained on a very large body of general text, including web pages, books, and Wikipedia, the well-known online encyclopedia. There is no shortage of such text to train an LLM, and it gives the model a knowledge of grammar and sentence structure, broad factual knowledge, and the ability to string sentences together into coherent paragraphs.

ChatGPT wouldn’t have been so successful if its creators had stopped there, however. What makes the chatbot so compelling is that, well, it’s a chatbot and not just an encyclopedia. To achieve this, the creators finetuned the model on conversational question-and-answer data, tailored to give ChatGPT its enthusiastic tone of voice and ready-to-help attitude. Understandably, this specific dataset was much smaller than the general text the model was first trained on, and we would never have the ChatGPT we do now had it been trained on questions and answers alone. Transfer learning helped the model retain the knowledge it gained in that first stage of training while tailoring its responses to resemble an actual conversation. It’s thanks to transfer learning that ChatGPT can be both incredibly knowledgeable and a helpful assistant.

From this example, we can see some of the characteristics that make transfer learning the ideal solution in certain scenarios. First, the two domains (source and target) must be similar, otherwise the knowledge gained in one would not be useful in the other. Second, the data available in the target domain is generally insufficient on its own to train a powerful or reliable model. This could be because the data is expensive to obtain (such as ChatGPT’s question-and-answer dataset, which must be manually created), or is simply limited. The foundation provided by the data in the source domain gives the machine learning model the leg up it needs to perform well in the target domain.

We can now turn our attention to the ALCHIMIA project. Transfer learning will be used to finetune the “global model” that results from federated learning, a technique also applied in the project (to learn more about federated learning and how it will be used in the ALCHIMIA project, please see our earlier post). In transfer learning terms, the global model provides the foundation, and each steel plant’s own data is the target domain. It’s a win-win situation: each steel plant gets a model finetuned to its particular circumstances and data, while still benefitting from what was learned from the other steel plants in the consortium. The result is a machine learning model more useful and powerful than what each steel plant could achieve by working with only its own data. And that is the power of transfer learning.
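The combination of a shared global model and per-plant finetuning can be sketched in a few lines of plain Python. This is a hypothetical illustration and not the ALCHIMIA implementation: the plant names, the data, the one-feature linear model, and the single round of parameter averaging (a simplified, FedAvg-style step) are all assumptions made for the example.

```python
# Hypothetical sketch, not the ALCHIMIA implementation. Plant names, data,
# and the one-feature linear model y = w*x + b are invented for illustration.

def gradient_steps(w, b, data, lr=0.05, steps=200):
    """Run gradient descent on mean squared error, starting from (w, b)."""
    n = len(data)
    for _ in range(steps):
        grad_w = 2 / n * sum((w * x + b - y) * x for x, y in data)
        grad_b = 2 / n * sum((w * x + b - y) for x, y in data)
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b


def mse(w, b, data):
    """Mean squared error of the model (w, b) on a list of (x, y) pairs."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def federated_average(models, sizes):
    """Average model parameters, weighting each plant by its sample count."""
    total = sum(sizes)
    w = sum(m[0] * n for m, n in zip(models, sizes)) / total
    b = sum(m[1] * n for m, n in zip(models, sizes)) / total
    return w, b


# Each (made-up) plant follows a slightly different relationship.
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
true_params = {"plant_a": (1.8, 0.5), "plant_b": (2.0, 1.0), "plant_c": (2.3, 1.2)}
plants = {
    name: [(x, w * x + b) for x in xs] for name, (w, b) in true_params.items()
}

# Federated step (single simplified round): every plant trains locally and
# only the resulting parameters, not the raw data, are combined.
local_models = [gradient_steps(0.0, 0.0, data) for data in plants.values()]
global_w, global_b = federated_average(
    local_models, [len(data) for data in plants.values()]
)

# Transfer step: each plant finetunes a copy of the global model on its own data.
finetuned = {
    name: gradient_steps(global_w, global_b, data, steps=50)
    for name, data in plants.items()
}

for name, data in plants.items():
    w, b = finetuned[name]
    print(name, "global MSE:", mse(global_w, global_b, data),
          "finetuned MSE:", mse(w, b, data))
```

In this toy setting the global model is a reasonable compromise across the three plants, and a short finetuning run lowers each plant’s error on its own data. Real federated training typically runs many rounds and uses far richer models, so the sketch only shows the order of the two steps.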