Keynote: Lithuanian language and artificial intelligence: current situation and future prospects
In the second and third decades of the 21st century, language technology underwent a true revolution: machine learning evolved into systems that can converse with humans, generate text, translate, solve complex tasks, and help people in their professional and everyday activities. Today, at the heart of all these developments is deep machine learning based on neural networks, which makes it possible to build powerful cognitive platforms: large language models. The ChatGPT chatbot launched by OpenAI at the end of 2022 showed the world the vast potential of this technology. Not surprisingly, it generated a huge amount of interest among the public, in academia and in business: soon, even larger and more capable language models were being developed, and investment in the field increased drastically. Many countries have started to develop their own national large language models; besides the large countries, Bulgaria, Finland, Greece, Hungary, Poland, Norway, Slovenia and others are already doing so. In my talk, I will explain what large language models are, the stages of their development, and the challenges that lie ahead for their developers. The uncontrolled development of large language models is known to bring negative consequences and risks, so I will also touch on these topics.
What about the Lithuanian language? We still do not have an open large language model for Lithuanian. However, a major breakthrough is expected in the near future. In Lithuania, as elsewhere, the state, the research community and business are showing growing interest in these technologies and in creating the preconditions for a large Lithuanian language model: we observe increased investment in the collection of Lithuanian language data and more active participation in international initiatives and projects. In my talk, I will try to answer the question of why these technologies are so important for small, low-resource languages such as Lithuanian, and I will argue why society, government and business should be interested in developing them.
About the speaker:
Andrius Utka is an associate professor at the Department of Lithuanian Studies and a senior researcher at the Institute of Digital Resources and Interdisciplinary Research (SITTI), Vytautas Magnus University (Kaunas). He defended his doctoral dissertation, Statistical Identification of Text Functions, in 2004 (VMU, Kaunas). He was the head of the Centre of Computational Linguistics from 2010 to 2022 and has coordinated a number of national and international research projects. A. Utka is currently the head of the Language use research, resources and technology group at SITTI. His research interests include statistical text analysis, language resources, machine learning, computer-assisted translation, terminology extraction, and the language of disinformation.