GPT-SW3 Pre-release

Written by Daniel Gillblad | Jan 23, 2023 10:53:00 AM

Evaluating large, generative language models for Swedish and next-generation AI applications

Since the publication of the Transformer architecture, and starting with the GPT and BERT models in 2018, there has been a revolution in the way we model and build applications for natural language. Today, large generative language models such as variants of GPT-3 and ChatGPT show capabilities that were almost inconceivable just a few years ago. There has been an explosion of architectures and models, some publicly available and some not. It is clear that these types of models will be of fundamental importance for AI, and that models capable of representing the Swedish language will be essential for AI applications in Sweden.

GPT-SW3 — A generative language model for Swedish

As we have discussed in previous posts, the GPT-SW3 initiative develops a large GPT model for Swedish. It is driven by AI Sweden (the National Center for Applied AI in Sweden), together with RISE (the Research Institutes of Sweden) and the WASP WARA for Media and Language; these are all organizations with a main focus on promoting open research and use of AI. Our motivation for developing GPT-SW3 is to create a foundational resource for Swedish NLP that represents the entire Swedish language, and by extension, the entire Swedish population. We aim to do this as openly, transparently, and collaboratively as possible, with the goal of making the model available to all sectors in Sweden that may have a need for NLP solutions. (Further reading: “Why do we need a large GPT for Swedish?”)

The GPT-SW3 pre-release

While we want GPT-SW3 and similar models to be a foundational resource for everyone developing AI applications or doing research within AI, sharing such models comes with a number of challenges. Although they represent tremendous potential value, the models could be used for adversarial purposes, e.g. to generate hateful, offensive, or misleading text; they may contain sensitive information and biases from the text they have been trained on; and they require substantial compute resources, which limits who can actually use them. However, the only way to realize the value of these models and to better understand how to mitigate these issues is to perform continued research and experimentation on real-world problems, share the learnings, and use these to continue to develop models and applications responsibly.

Therefore, we are now taking the first step in sharing the GPT-SW3 models. In this pre-release, we will share the models with those who commit to not using the models in ways that may cause harm, and who are committed to sharing the learnings of their research and applications with AI Sweden and the broader community.

If you are interested in accessing and developing these models with us, don’t hesitate to apply for access.
