Mar 23, 2023

Language should never be a barrier

Sep 1, 2023



This blog was originally posted on Medium.

In India, a country with 22 official languages and 19,500 languages or dialects spoken as a mother tongue, language can be either a liberator or an impediment.

Those conversant in any of the nation’s more widely spoken languages have untold opportunities open to them. Those who aren’t face an uphill struggle in various facets of their lives, from education to commerce and even law and justice. Imagine trying to decipher school lessons that are in either Hindi or English, or to navigate a legal system whose judgements are in official languages, when you comprehend none of them.

For many Indians who aren’t literate, the written word may as well be a giant metal gate, keeping them from information that is critical to their upliftment.

Well before the ChatGPT-inspired AI hysteria swept our collective consciousness, entrepreneurs from all over the country were experimenting with AI to tackle linguistic barriers. Their work, either in isolation or collaboration, has resulted in a variety of innovative, high-quality language AI solutions. Each of these holds the potential to supercharge India’s story by democratising access to information for some of the country’s most underserved communities.

Leading the charge to build AI technology capable of providing all manner of information and services to people in the comfort of their preferred language is the Indian government.

Just last year, the Ministry of Electronics and Information Technology, or MeitY, set up the Bhashini project. The programme involves an ambitious effort to collect data in each of India’s 22 official languages in order to create rich datasets on which AI models can be trained. In addition, the Unified Language Contribution API (ULCA) was developed to standardise the formats for language datasets, models, and benchmarks.

In that short time, the Bhashini project has already resulted in the creation of over 380 open source AI models in speech recognition, machine translation, text-to-speech, and optical character recognition across India’s 22 official languages.

The National Payments Corporation of India — the organisation behind projects such as the Unified Payments Interface (UPI) and IMPS — is working with Bhashini to incorporate Indian languages across its products.

Aiding the Bhashini project has been the Nilekani Centre at AI4Bharat. Housed at IIT Madras, it has made major open source contributions in datasets, models, and applications. Some of the key datasets open sourced by AI4Bharat include IndicCorp (Monolingual Data), Samanantar (Parallel Corpora), Dhwani (Audio), IndicSUPERB (Speech Benchmark) and Aksharantar (Transliteration). AI4Bharat is also the major contributor of models to the Bhashini project, including IndicTrans (Translation), IndicConformer (ASR), IndicXlit (Transliteration) and IndicTTS (TTS), among others.

The journey of building reference applications that leverage these models has also begun. Anuvaad is an AI-based open source document translation tool that allows documents to be translated into Indic languages at scale. Today, separate instances of Anuvaad are deployed across the Supreme Court of India, the Supreme Court of Bangladesh, and the NCERT DIKSHA project. Chitralekha is a video transcription tool to convert videos from one language to another. It supports input from YouTube, and can translate English subtitles to Indian languages.

Sunbird, an EkStep-created open source set of building blocks that allows for the creation of customised e-learning solutions, also uses AI. One of the most ambitious of these efforts has been Sunbird Saral, which uses AI to digitise physical assessment submissions. This allows for the collection of extensive and valuable learning analytics, which could be transformational for India’s education sector. Already, Saral is being deployed by many Indian states and has helped teachers capture over 300 million student assessment sheets with a single click on their phone, dramatically reducing the effort of digitisation and freeing up teaching time.

Simultaneously, other organisations are starting to leverage language AI to help improve literacy among children at scale, including by using voice as a medium for large-scale assessment of language abilities.

To make use of and rapidly improve these models, it is important to create a scalable deployment architecture in which models are available to third-party applications through a simple API call. Dhruva is an open source framework that allows optimised deployment of language AI models at scale.
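To make the idea of "a simple API call" concrete, here is a minimal Python sketch of a single entry point that routes a request to the right model by task and language. It is an illustration only and does not reflect Dhruva's actual design or API; the names `ModelRegistry` and `infer`, and the stand-in translation model, are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Assumption: a "model" is reduced to a function from input text to output text.
Model = Callable[[str], str]


@dataclass
class ModelRegistry:
    """Hypothetical registry keyed by (task, language); not Dhruva's real interface."""

    models: Dict[Tuple[str, str], Model]

    def infer(self, task: str, language: str, payload: str) -> str:
        # One entry point for every caller: look up the model, then run it.
        key = (task, language)
        if key not in self.models:
            raise KeyError(f"no model registered for task={task!r}, language={language!r}")
        return self.models[key](payload)


# Stand-in "translation" model that only tags the text; a real model would go here.
registry = ModelRegistry(models={("translation", "hi"): lambda text: f"[hi] {text}"})
print(registry.infer("translation", "hi", "hello"))
```

The point of the single `infer` entry point is that a third-party application never needs to know which model sits behind a given task and language, so models can be swapped or improved without changing the caller.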

Take the Jugalbandi voice bot, for instance. A free and open source platform, it combines the power of AI with the Indian language translation models that have been spawned by the Bhashini mission in conjunction with Large Language Models such as GPT3 to power conversational solutions.

This could be deployed to help Indians converse with any source of information, be it government schemes, legal orders and notices, or statutes, in their native languages. This has the potential to empower tens of millions of Indians who were previously dependent on local intermediaries to access and understand such information. Similar systems are being developed in other domains as well, such as a bot to aid India’s farmers.
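The combination of translation models and a Large Language Model described for Jugalbandi can be pictured as a three-step pipeline: translate the question into English, ask the LLM, and translate the answer back. The Python sketch below is a simplified assumption about that flow, not Jugalbandi's actual code. The function `answer_in_user_language` and the three stub components are hypothetical, and the speech steps of a voice bot are left out.

```python
from typing import Callable


def answer_in_user_language(
    query: str,
    to_english: Callable[[str], str],
    ask_llm: Callable[[str], str],
    from_english: Callable[[str], str],
) -> str:
    """Hypothetical pipeline: user language -> English -> LLM -> user language."""
    english_query = to_english(query)         # e.g. an Indian-language translation model
    english_answer = ask_llm(english_query)   # e.g. a Large Language Model
    return from_english(english_answer)       # translate the answer back for the user


# Stubs standing in for real models; they only relabel the text.
def stub_to_english(text: str) -> str:
    return text.replace("[hi]", "[en]")


def stub_llm(prompt: str) -> str:
    return prompt + " -> answer"


def stub_from_english(text: str) -> str:
    return text.replace("[en]", "[hi]")


reply = answer_in_user_language(
    "[hi] scheme details?", stub_to_english, stub_llm, stub_from_english
)
print(reply)
```

Passing the three components in as functions keeps the sketch independent of any particular translation model or LLM.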

JIVA, or Judges Intelligent Virtual Assistant, is an innovative system that combs through the Constitution and other legal documents to provide responses to legal queries. Conversant both in Indian languages and English, it streamlines the process of legal research, introducing a new level of efficiency to India’s overburdened judicial system.

Language has the ability to be a bridge to social empowerment and inclusion, which a country as vast and diverse as India needs. The development of these tools will have second- and third-order effects across categories such as education, healthcare, law and justice, agriculture, and many others that have not even been imagined yet.

Have you come across any homegrown language AI efforts you think deserve their moment in the spotlight? Here is where you can be part of the movement and recommend an organisation we can curate to build this community together.

Contributed by Vivek Raghavan, Chief AI Evangelist at EkStep Foundation

Join the Community

People+ai is a non-profit, housed within the EkStep Foundation. Our work is designed around the belief that technology, especially ai, will cause paradigm shifts that can help India & its people reach their potential.
