WeChat's Dialect Collection Initiative Raises Controversy and Privacy Concerns

Combined image of WeChat and DeepSeek logos (AI-generated image)

[People News] China's social media platform WeChat has recently launched a "Dialect Collection" cashback program, inviting users to record voice samples of various dialects (i.e., voiceprints) in exchange for cash rewards. Following the launch of the initiative, some participants reported earning hundreds of yuan. As participation grew, discussion turned to personal privacy. Chinese netizens have traditionally regarded dialects as a relatively private form of communication, and the collection has prompted concerns about the security and identifiability of the voice data.

As reported by Jimu News on April 10, WeChat has been inviting select users to participate in the "Dialect Collection" task. Participants are prompted to read everyday phrases, and upon completing their voice recordings, they can receive cash rewards. Some users have shared screenshots of their earnings on social media, claiming a daily income of around 40 yuan.

Huang Yiming, an engineer specialising in voice recognition research in Zhejiang, stated in an interview with Radio Free Asia that China has hundreds of dialects, if not more, along with numerous variants of each. He cited the common saying, "Ten miles differ in pronunciation, a hundred miles differ in customs." When broken down to the level of counties, towns, and villages, the number becomes even harder to quantify: "Dialect voice data has always been relatively scarce, with significant variations. For instance, the Wenzhou dialect is complex, and many people from other regions cannot understand it, making annotation quite difficult. By recording users' voices to supplement this data, they aim to improve the model's recognition capabilities in complex voice environments. The primary goal of collecting such data is to develop voice models."

The paid collection of dialect voice samples has sparked a heated debate.

Users record themselves reading designated text on the WeChat platform, and once a submission passes review, the reward is credited to their WeChat Wallet within 30 days. Users reportedly earn approximately 1 yuan for every 3 sentences recorded and 5 yuan for every 20 sentences, with actual daily recording volume typically ranging from 100 to 200 sentences.

Reports indicate that among the over 130 languages and various dialects in China, 68 have fewer than 10,000 speakers, 48 have fewer than 5,000 speakers, and 25 have fewer than 1,000 speakers.

Regarding the WeChat platform's decision to spend money on collecting dialect voice data, Huang Yiming stated that the platform needs to enhance the accuracy of voice recognition for internet users: "As for its purpose, I believe everyone understands. When using voice chat on WeChat, there are tools for recognition, but without a model for that dialect, it cannot be resolved, or it becomes quite challenging."

As the initiative expanded, discussions began to shift from the technology itself to the potential applications of this voice data. For a long time, dialects have been viewed as a relatively obscure form of expression in informal communications, and the difficulty in recognition has somewhat limited the possibility of automated processing. WeChat has indicated that this project aims to "enhance the voice recognition experience."

Netizens express concerns about the potential "misuse" of dialect recognition.

"Even my hometown dialect is no longer safe," commented one user, which garnered significant responses on social media. Below this comment, several users noted that one reason for using dialects in the past was to minimise the chances of being recognised by the system.

Mr Qi, a netizen from Tengzhou, Shandong, informed reporters that the local area is home to a variety of dialects: "Tengzhou is a small place, and people speak differently depending on whether they are in the east, west, south, or north. If you try to use voice input, it cannot be recognised on WeChat. I believe they are currently collecting voice data, which will certainly be very helpful for voice recognition monitoring."

Scholar Yu Wentian, who specialises in personal privacy issues, argues that the technology itself is not problematic; the crucial factor is its intended use. He stated to reporters: "If the technology is used to convert dialects into text, that is meaningful and should be supported. However, if it is employed to review dialect content and interfere with critical speech, that would not be beneficial for most netizens."

In recent years, the Chinese Communist Party has gradually implemented voiceprint recognition technology in sectors such as finance and telecommunications for identity verification and risk management. A voiceprint is a type of biometric characteristic; in simple terms, it is the "fingerprint" of a human voice, which can be used to identify the speaker. Some platforms in China already possess capabilities for voice transcription and content review. Experts in voice technology research indicate that as artificial intelligence model training progresses, the significance of voice data in various applications is on the rise.

So far, WeChat has not provided additional details regarding data management. The initiative is still in the invitation phase and has not yet been fully opened to the public. (Adapted from Radio Free Asia)