AI's global village opens wider to more voices

Developers look to break from yoke of English language, cater to all groups of people

By Oasis Hu in Hong Kong | China Daily | Updated: 2024-12-06 07:14

Artificial intelligence engineer Jacky Chan Ho-kit has conflicting feelings about his industry.

While he looks forward to a future where AI reaches its pinnacle — possessing humanlike cognitive capabilities — he is deeply concerned that it will only understand English.

"Given the language status quo, this is highly likely to be a reality rather than just alarmism," he said.

Chan is the chief technology officer at Votee, a Hong Kong-based AI company. He is also a language enthusiast who in his free time follows language bloggers on social media, absorbing their linguistic insights. Through his research, he has learned that many languages are disappearing.

Although around 7,000 languages are still in use worldwide, according to UNESCO's World Atlas of Languages, only 10 have more than 200 million speakers. UNESCO has said that a language disappears every two weeks, or roughly 25 a year.

In the online realm, the disparity in language usage rates is even more pronounced.

Over the last decade, English has dominated online content. As of Nov 26, it accounted for 49.4 percent of websites, more than eight times the share of Spanish, the second most common online language at 6 percent, according to a report by W3Techs, a company that conducts global web surveys.

Meanwhile, the share of websites using Chinese, the world's second most spoken language with more than 1.1 billion speakers, has plummeted from 4.3 percent in 2013 to 1.2 percent in 2024.

In AI, prominent large language models, or LLMs, such as OpenAI's GPT-4, Google's Gemini and Anthropic's Claude, all use English as their main language.

Mainstream AI language models, particularly those originating in the West, are made for English-speaking audiences, with translations for other languages serving as only a support function, said Cao Jiannong, chair professor in the Department of Computing at Hong Kong Polytechnic University.

Artificial intelligence is a field devoted to developing technologies that can replicate or even surpass human intelligence. Until that vision is realized, large AI companies will continue to prioritize making AI more intelligent rather than expanding their services to cover more languages, Cao added.

Chan, Votee's CTO, agreed that the endgame of AI is humanlike intelligence but questioned the consequences if that intelligence can only speak English.

"Wouldn't it be even more unfair to non-English speakers? Wouldn't global cultural diversity be greatly eroded? Wouldn't the gap between the world's rich and poor be wider?" Chan said.

Since last year, Votee, which previously concentrated on automated data collection and analysis, has shifted its focus to developing AI services for lesser-used languages.

This year, it unveiled a Cantonese LLM and is actively pursuing clients in Southeast Asia, Africa and the Chinese mainland. Future initiatives include launching LLMs and other AI services for Javanese in Indonesia, Okinawan in southern Japan, and Chinese dialects such as Shanghainese and Hakka.

"In an increasingly polarized world, we aim to utilize technology to bridge this gap," Chan said.

Data scarcity

Data is the cornerstone of AI training, and a significant hurdle to improving AI's linguistic abilities is the scarcity of data in many languages, Chan said.

Of the roughly 7,000 languages spoken worldwide, nearly 99 percent are considered low-resource languages, meaning that little data is available for computational processing and analysis.

Because mainstream AI tools rely predominantly on English corpora, or collections of written text, they perform poorly when handling other languages, said Ting Paksun, CEO of Votee.

In other languages, these tools often produce inaccurate or biased content, cultural misunderstandings, business errors and even legal violations, making them unsuitable for both casual and formal use, Ting said.

On the other hand, AI tools can streamline operations, boost productivity and directly benefit local economies.

At an investment summit in mid-November in Hong Kong, Daniel Pinto, president of JPMorgan Chase, said that AI contributed approximately $1.3 billion to the group's finances last year, through cost reductions or revenue increases, with projections indicating a rise to $2 billion this year.

Chan warned that regions unable to use AI tools because of language limitations are likely to see lower productivity in the future.

To avoid lagging behind European and United States tech giants, governments and major tech firms in some regions have initiated the development of LLMs customized to their linguistic needs, Cao from the Hong Kong Polytechnic University said.

The UAE, for instance, introduced Jais, billed as the highest-quality Arabic LLM, in 2023. This year, South Korea's LG Group unveiled Exaone 3, the country's first open-source Korean AI model.

Smaller, nimbler

Many smaller companies around the world are also building small language models, Cao said.

Asiabots Ltd, a Hong Kong-based artificial intelligence company established in 2017, is one such company.

Chris Shum Chiu-fai, co-founder and CEO of Asiabots, said the company initially prioritized AI capabilities in Cantonese because it is based in Hong Kong. Over time, however, a growing number of clients have approached it for AI solutions in other languages.

Its clients are government bodies and private enterprises worldwide, including in Southeast Asia and Europe. Rather than large language models, these clients prefer small language models tailored to specific scenarios, such as AI-driven customer service, speech recognition and text-to-speech tools.

Asiabots' clients include the Hong Kong Special Administrative Region government, which asked the company to develop AI tools for translation between Cantonese and Middle Eastern languages. The request followed this year's Policy Address, which called for attracting more Muslim tourists and encouraged the city's taxi services to offer information in Arabic for visitors from the Middle East.

In July, a tourism company in Kunigami, Okinawa, Japan, engaged Asiabots to develop an AI tool capable of translating between multiple languages, including less common ones such as Vietnamese.

"Japan is preparing to host the World Expo next year. With the anticipated increase in global tourism, many Japanese companies are seeking AI tools, leading to a surge in requests from Japan recently," Shum said.

Specialized needs

Many mainstream AI tools excel at translating between widely spoken languages such as English and Chinese. However, when faced with less common languages, these tools may falter in recognizing speech and converting it into text, resulting in numerous errors.

The primary issue lies in inadequate data for the specific language, Shum said.

In some countries with limited technological infrastructure, such as the Philippines and Mongolia, online information is predominantly available in English rather than in the native language.

Some languages, such as Minnan, a dialect spoken in southern China, have a variety of pronunciations but no standardized written characters.

Other languages are fragmented into numerous dialects. In Indonesia, for example, there are more than 300 dialects, which increase the complexity and diversity of the language.

These challenges can be overcome as long as clients have the financial resources to collect the necessary data, Shum said.

Asiabots accumulates data from extensive research and non-infringing open-source repositories, he said. Clients also provide data to the company or fund it to conduct on-site data collection.

After collecting the data, Asiabots collaborates with local universities and recruits native language speakers to refine and localize AI solutions, aligning them with regional cultures and legal frameworks to overcome cultural barriers.

In the seven years since its founding, Asiabots has expanded its AI's linguistic repertoire to 22 languages, including Indonesian, Filipino, Portuguese and Hindi, as well as less common dialects.

After establishing language capabilities, the company tailors AI software and hardware to meet specific customer requirements.

For instance, for the Okinawa tourist spot, Asiabots developed an AI translator capable of translating among five languages: Japanese, Chinese, English, Korean and Vietnamese. Any of these can be swapped for others from the company's 22 language libraries when required, Shum said.

Endangered languages

While commercial demand ensures the survival of languages with a large offline population, those with few speakers, limited commercial interest, and insufficient technological research are at risk of becoming endangered both online and offline, Chan warned.

UNESCO has a classification system for endangered languages. Languages spoken across all age groups and in all contexts are considered safe, while those that children no longer learn as their mother tongue are considered endangered. Languages spoken only by grandparents are in extreme peril, and those with no remaining speakers are extinct.

By this definition, even dialects spoken by large populations, such as Minnan and Hakka, both used mainly in southern China, face a fight for survival as fewer young people learn them.

Shum said that failing to preserve an endangered language would be cause for deep regret.

"There are various research directions in AI and we opted to delve into language study from the start, because behind each language lies a unique mode of thought and a profound reservoir of human wisdom," Shum said.

For instance, the Minnan term describing tears as "falling water" reflects a beautiful perspective. Losing such ways of thinking and expression is a loss of culture, and possibly even civilization, Shum said.

Chan said that language is a crucial vessel of intangible cultural heritage, showcasing the history, customs, habits and social relationships of a region, while forming a part of people's individual and collective identity.

"Protecting the cultural value of a language is much more urgent than its commercial worth, yet it often receives inadequate attention," he said.

Even if a language's original speakers disappear, preserving its voice and text in a language model means people can still access its nuances and written form and learn it whenever they want, Chan said.

Money talks

With hundreds of indigenous languages in Africa at risk of extinction, Votee has worked with clients on the continent to assist in language preservation efforts. However, significant challenges stem from Africa's political instability, limited technological proficiency and insufficient technology infrastructure.

In recent years, many clients have asked Asiabots to develop language models for the preservation of endangered languages.

However, all of these projects stalled for lack of funding for data collection, which can involve sending researchers into remote mountainous regions to record speakers and then processing and digitizing the recordings, at a potential cost of millions of dollars.

Francis Fong Po-kiu, honorary president of the Hong Kong Information Technology Federation, said that the governments of smaller language communities should recognize the cultural value inherent in these languages.

Chan proposed that global tech firms, language-focused NGOs, linguists and language enthusiasts collaborate to form communities for mutual support and to encourage the contribution of open-source language data.

When developing its Cantonese LLM, Votee worked with Cantonese linguists and enthusiasts to build a Cantonese-centered community. It then open-sourced all of the LLM's data and models.

"Cantonese belongs to everyone, not just a select few — it already lacks resources, so why create additional boundaries?" Chan said.

In July this year, SenseTime, an AI software company in Hong Kong, launched a Thai-language LLM.

Lu Lewei, director of the SenseTime Research Institute, said the company paid attention to less widely spoken languages because equipping AI with multilingual capabilities also helps improve the AI itself.

More importantly, AI was designed to assist humanity, and its future should prioritize broader accessibility and use, and not neglect some groups, Lu said.

"I believe this is the original intent, also the ultimate goal of humanity's pursuit of technological advancement," Lu said.

Copyright 1995 - . All rights reserved. The content (including but not limited to text, photo, multimedia information, etc.) published on this site belongs to China Daily Information Co (CDIC). Without written authorization from CDIC, such content shall not be republished or used in any form.