Bringing Javanese and Sundanese into the Digital Voice Ecosystem

Millions of people speak Javanese and Sundanese across Indonesia, yet these languages remain underrepresented in global voice technologies. As speech-based systems continue to grow, ensuring that diverse languages are included has become an increasingly important challenge.

To contribute to this effort, SOI Asia recently concluded a three-month collaboration supporting the inclusion of Javanese and Sundanese on Mozilla’s Common Voice platform — an open initiative that collects voice data to help make speech technologies more inclusive and accessible.

The team of coordinators, fellows and mentors (from the top left): Achmad Husni Thamrin (SOI Asia), Marcos Sadao Maekawa (APNIC Foundation), Kirana Ajeng Pratiwi Nurdin (ITB), Heidi Schan Andriana (ITB), Abdur Rohman Muhammad (UB), Gilang Ramadhan (UB), Ratno Wahyu Widyanto (UB), Eueung Mulyana (ITB), Achmad ‘Abazh’ Basuki (UB).

The APNIC Foundation brought this initiative forward and SOI Asia coordinated it, engaging students and faculty members from partner universities in a shared effort that combined language, technology, and community contribution.

The main outcomes of the collaboration were the localization of the Common Voice website and the preparation of more than 300 prompts for spontaneous speech recordings, enabling the platform to accept contributions in both Javanese and Sundanese. As a result, both languages were successfully opened for contributions on Common Voice in early February.

Beyond the technical outcomes, the experience also carried a strong personal dimension. During the wrap-up session, several participants reflected on how the process allowed them to reconnect with their linguistic roots — revisiting everyday expressions, involving family members in discussions, and rediscovering the cultural depth of their own languages.

Spontaneous Speech interface in Sundanese.

For SOI Asia, this collaboration also highlights the important role that technology can play in supporting language preservation. By contributing to an open dataset like Common Voice, participants are helping ensure that underrepresented languages are not left behind as speech technologies continue to evolve.

At the same time, the initiative reflects SOI Asia’s broader approach of creating opportunities for universities and communities to engage with real-world digital ecosystems through collaboration. In this case, the contribution extends beyond technical work, bringing together cultural knowledge, local context, and collective effort.

While the formal collaboration has concluded, the platform remains open, and continued contributions will be essential to further grow these language datasets. SOI Asia also looks forward to sharing this experience with the wider community in upcoming activities.
