AI: More Welsh data needed to improve accuracy, says business

Tom Burke, co-founder of Haia, says a lack of Welsh data means translations and transcriptions are often inaccurate

Tech developers say better cooperation in Wales is needed to ensure artificial intelligence (AI) functions in Welsh.

Chatbot ChatGPT's ability to understand and communicate in Welsh has impressed researchers, with some saying the language was "part of the AI revolution".

But they said Welsh language material under copyright needed to be made available to train computer software.

The Welsh government said its strategy would be renewed soon.

One business already using artificial intelligence to provide bilingual services is Anglesey-based Haia.

The online events company uses simultaneous translation software to enable speakers to talk in Welsh or English with translated subtitles.

But its co-founder, Tom Burke, said their product could be improved if more Welsh language data was legally available.

"One of the issues we have is how accurate it is. If you compare with German or Spanish, Welsh is a small data-set," said Mr Burke.

"We'll often find there are inaccuracies in the translation or transcription and the way to improve that is for us to get access to the wealth of data that is actually available for the Welsh language."

Accessing larger Welsh language data sets could mean being able to use smart speakers in Welsh

Language AI technology relies on large language models, which are trained on huge amounts of text such as webpages, books and articles to predict which words and phrases go together.
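As a rough illustration of that idea, the Python sketch below counts which word tends to follow which in a handful of Welsh phrases and uses those counts to guess the next word. It is a toy bigram counter, far simpler than the large language models behind tools such as ChatGPT, and the corpus and function names (`train_bigrams`, `predict_next`) are invented for this example.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count, for each word, which words follow it in the training text."""
    follows = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        # Pair every word with the word that comes directly after it.
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows


def predict_next(follows, word):
    """Return the most frequently seen next word, or None if the word is unseen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical toy corpus, invented for illustration only.
corpus = [
    "bore da a chroeso",
    "bore da i bawb",
    "nos da i bawb",
]
model = train_bigrams(corpus)

print(predict_next(model, "bore"))   # the word seen most often after "bore"
print(predict_next(model, "cymru"))  # a word that never appears in the corpus
```

With only three phrases, the sketch has nothing to offer for any word it has not seen, which is the small-scale version of the problem Mr Burke describes: the less Welsh text a model is trained on, the more gaps and wrong guesses it has.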

Welsh language data could also include radio and television programmes.

"If we can get hold of that data, use it to train models, then Welsh language models become more accurate," Mr Burke added.

"That gives us a head start on that technology and enables us to look at other smaller use languages across the world where we can use the lessons we've learnt here in Wales to push the technologies in those markets as well.

"In the long run that will enable new companies to form, enable new innovation and Wales could become a hub for language technologies."

Welsh language chatbot

Researchers at Bangor University's Canolfan Bedwyr launched Macsen, a Welsh language chatbot prototype, eight years ago.

They now run it with ChatGPT, which was developed by OpenAI in the US.

As well as the economic potential, the head of the Language Technologies Unit at Canolfan Bedwyr, Gruffudd Prys, said Welsh language material should be made available in order to make the technology more "suitable for the needs of the Welsh language and of Wales in general".

He said: "One of the things we can do to improve the quality of artificial intelligence is to enable the data that's out there to be available under permissive licences so that the models reflect the reality of Wales and that they're not overly American or international models."

Tom Burke says some languages have larger data sets, which means their translations and transcriptions are more accurate

Mr Burke said access to the data needed to be granted soon.

"We've already lost 12 months of innovation time and what will happen is eventually we'll just fall behind the curve and the point we can start to utilise it, the rest of the world will already have it," he said.

"We've got this great position, we've got this bilingual country.

"We've got a fantastic university like Bangor working on this technology. We need to do it now so that companies can start using it and get out there."

'Important priority' for Wales

The Welsh government minister with responsibility for the Welsh language, Jeremy Miles, said using AI to develop the Welsh language was "very important".

"It's been an important priority in our Welsh in Technology strategy, which we're about to renew for the next period," Mr Miles said.

"We've spent £2m on this and it remains a really important priority for our next strategy so we'll be able to take all these questions into account then.

"It's really important with technological developments that we make them available in Welsh as well as other languages."
