The 120-languages opportunity for building Indian AI solutions
This blog was originally posted on Medium.
In a survey of emergency department clinicians in India, published in 2021, 64% of respondents reported that information was either lost or changed when they had to explain English medical knowledge in a different language. In 2018, civil society and political groups in the country were pressing banks to add services in languages besides English and Hindi, as customers who converse in other Indian languages struggled when they had to block a stolen card or understand the terms and conditions of their loan.
For Indians, who use at least 22 major languages, 13 distinct scripts, and 720 dialects, the language barrier is real. Several government initiatives and startups in the country have been working to crack this problem for the benefit of a significant population that cannot access services in the English language. This is also being recognised as one of the biggest opportunities that must be prioritised while innovating with emerging technologies.
Most recent and disruptive among these technologies is generative AI. In fact, the communication gaps driven by language diversity have highlighted the potential for building an Indian GPT. While ChatGPT already contains a wealth of information that would be useful to Indians, it cannot always contextualise information for India because Indian language patterns and cultural behaviours are so diverse.
Take the example of a cursory check performed by Mitesh Khapra, associate professor of Computer Science and Engineering at IIT Madras. He first asked the OpenAI chatbot what a ‘roka ceremony’ is and what would be appropriate attire to wear to one. ChatGPT was able to offer useful suggestions. So far so good.
“The problem of languages is now becoming much more tractable to solve. Traditionally, what AI products used to do was take one language at a time and build automatic speech recognition, text processing and so on. Increasingly, you have this ability to build a single model that is able to handle all different languages.” — Manish Gupta, Director at Google Research India
Next, he typed to ask: Do you know the ‘kande pohe’ ritual in Maharashtrian marriages? In the state of Maharashtra, asking someone if they had a kande pohe marriage is code for finding out whether theirs was an arranged marriage. This cultural and linguistic nuance — based on the fact that the dish is often served in the first meeting arranged by families of a prospective bride and groom — had eluded ChatGPT.
Khapra has been studying the scope for LLM solutions for India. Speaking at the People + AI Campsite, he notes the questions we need to ask as this technology develops are about equitable access. “First, is it true that the LLMs we have today work well — or work much better — for the English speaking populations of the world? And if that is the case, as a country how should we react to that?”
In other words, how should we make LLMs work for Indian languages? And, should we build an Indian GPT?
The big opportunity
India is home to close to 400 languages. Roughly 120 languages have at least 10,000 speakers. For generative AI to be effective across India, training data needs to be available in these languages. Here’s a sense of the current gap. There are 6.6 million Wikipedia entries in English. For Hindi, that number is 260,000, and for Odia, it is about 10,000. To put that in perspective, there are 528 million Hindi speakers and Odia has more than 30 million speakers.
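To make the size of that gap concrete, here is a small back-of-the-envelope calculation using only the rounded figures quoted above. The variable and function names are illustrative, and the output is only as precise as those rounded inputs.

```python
# Rounded figures quoted in the article.
wikipedia_entries = {"English": 6_600_000, "Hindi": 260_000, "Odia": 10_000}
hindi_speakers = 528_000_000


def english_ratio(language: str) -> float:
    """Number of English Wikipedia entries for each entry in `language`."""
    return wikipedia_entries["English"] / wikipedia_entries[language]


# Hindi entries available per one million Hindi speakers.
hindi_entries_per_million_speakers = wikipedia_entries["Hindi"] / (
    hindi_speakers / 1_000_000
)

for language in ("Hindi", "Odia"):
    print(
        f"English has about {english_ratio(language):.0f}x "
        f"as many entries as {language}"
    )
print(
    f"Hindi: about {hindi_entries_per_million_speakers:.0f} "
    "entries per million speakers"
)
```

On these numbers, English has roughly 25 times as many entries as Hindi and 660 times as many as Odia.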
One of the ways to plug the holes in knowledge transfer is by integrating language solutions with existing LLMs. This will involve improving speech recognition systems and machine translation systems for Indian languages. In the international scenario, AI improvements have already boosted language solutions. For instance, Google used artificial intelligence to transform its popular Google Translate service.
Several efforts in this direction are under way in India too. One is Project Vaani, for which AI and Robotics Technology Park (ARTPARK) and the Indian Institute of Science (IISc) partnered with Google India to collect speech data from a million Indians. Additionally, Google’s Indian research unit launched MuRIL, or Multilingual Representations for Indian Languages, a free and open-source machine learning tool which allows for training and transfer of knowledge from one Indian language to another.
Then there is Bhashini, India’s AI-led language translation system, part of the National Language Translation Mission. It has made open source datasets available for the benefit of enterprises, startups and individuals.
It is clear that speech and voice will be important pieces for solving the language puzzle in India, where many people are not comfortable reading and writing. Voice is, after all, considered the most natural form of communication.
Voice based transactional AI
Beyond LLMs, innovations are emerging in another AI technology, voice-based transactional AI, that makes information accessible in Indian languages and empowers people who do not speak English.
Abhigyan Raman, a member of the AI4Bharat initiative at IIT Madras, is part of a team working on self-supervised speech recognition models for Indian languages. In a demonstration, he shows how the same sample sentence spoken in Hindi is swiftly converted to Gujarati audio. Such translations among Indian languages — Tamil to Marathi, or Punjabi to Telugu, etc. — would facilitate communication and exchange of ideas.
Where earlier forms of ASR often failed to capture heavy accents and colloquial speech, Raman says his transactional voice AI is designed to understand and follow commands in naturally spoken Indian languages.
“Instead of the application bossing us around, we get to boss it around using speech,” says Raman. To explain the model, he says speech is the input and speech is the output; everything in between is AI. “What happens in the middle is a combination of voice recognition and translation technology that helps convert speech into text and text back into speech.”
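The speech-in, speech-out flow Raman describes can be pictured as three stages chained together: recognition, translation, and synthesis. The sketch below is only an illustration of that shape and is not the AI4Bharat implementation. The class name, the function signatures, and the toy stand-in stages are all assumptions made for this example; a real system would plug trained models into each slot.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SpeechToSpeechPipeline:
    """Hypothetical three-stage pipeline: speech -> text -> text -> speech."""

    recognize: Callable[[bytes, str], str]      # (audio, source_lang) -> text
    translate: Callable[[str, str, str], str]   # (text, source, target) -> text
    synthesize: Callable[[str, str], bytes]     # (text, target_lang) -> audio

    def run(self, audio: bytes, source: str, target: str) -> bytes:
        text = self.recognize(audio, source)
        translated = self.translate(text, source, target)
        return self.synthesize(translated, target)


# Toy stand-ins so the sketch runs end to end. They only move placeholder
# strings around; none of them performs real recognition or translation.
def toy_recognize(audio: bytes, source: str) -> str:
    return audio.decode("utf-8")


TOY_PHRASEBOOK = {("hi", "gu", "<hindi sentence>"): "<gujarati sentence>"}


def toy_translate(text: str, source: str, target: str) -> str:
    return TOY_PHRASEBOOK[(source, target, text)]


def toy_synthesize(text: str, target: str) -> bytes:
    return text.encode("utf-8")


pipeline = SpeechToSpeechPipeline(toy_recognize, toy_translate, toy_synthesize)
output_audio = pipeline.run(b"<hindi sentence>", source="hi", target="gu")
```

Keeping each stage behind a plain function boundary means any one of them can be replaced, say with a better recogniser for a particular accent, without touching the other two.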
Raman and team have found a way to achieve ASR in-domain on apps, which works by simply decoding text-based transcripts of speech. This dramatically cuts the amount of data and time usually required for training a model. Not only can it recognise speech accurately, but it also tags named entities within the speech. Tagging named entities accurately will enable API integration between this model and businesses. Various applications are currently being explored including integration with UPI for payments and IRCTC for bookings.
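To see why tagged entities matter for integration, consider how a tagged transcript might be turned into a structured request that a business API could act on. The sketch below is purely illustrative: the entity labels, the `Entity` type, and the request fields are invented for this example and do not reflect the team's model output or any real UPI or IRCTC interface.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Entity:
    """A span tagged in the transcript (labels here are hypothetical)."""

    label: str
    text: str


def build_payment_request(entities: List[Entity]) -> Dict[str, object]:
    """Map tagged entities onto the fields a payment call would need."""
    fields = {entity.label: entity.text for entity in entities}
    missing = {"AMOUNT", "PAYEE"} - fields.keys()
    if missing:
        # Without both entities there is nothing safe to send downstream.
        raise ValueError(f"missing entities: {sorted(missing)}")
    return {
        "action": "pay",
        "payee": fields["PAYEE"],
        "amount": int(fields["AMOUNT"]),
    }


# Entities as a recogniser might tag them in "send 500 rupees to Ramesh".
tagged = [Entity("AMOUNT", "500"), Entity("PAYEE", "Ramesh")]
request = build_payment_request(tagged)
```

If the recogniser mislabels or misses an entity, the request cannot be built, which is one way to read the emphasis on tagging accuracy.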
If this can be enabled in all languages, speaking styles, and devices, it will have a large-scale impact touching much of the country.
Unlocking Indian data
For these solutions to work, as one attendee at the People + AI Campsite pointed out, you need not just researchers but a network of data creators, data gatherers, data annotators, and data cleaners. “Because data is not oil anymore, it is water.”
At that gathering of AI specialists, technologists, educators and lawyers, many suggestions were put forward for ways to bring together data. Whatever is available with the government should be digitised and made available, says Khapra. Or, with a resourcefulness typical of India, data from films, radio, newsreels, etc. can be transcribed and used. Further, mobile phone users can be incentivised to contribute to data gathering efforts.
“GPT models are great as reasoning AI but we need to augment that with language AI. We have a responsibility of unlocking our (Indic) language data for open source or closed source models to do well on those languages.” — Pratyush Kumar, co-founder of AI4Bharat.
“There could be an incentive set up where I can ask someone to talk about the kande pohe ritual, for example,” Khapra says. Locally grounded data of this kind will also allow you to consider the availability of drugs when applying AI in healthcare, or the question of motivation when applying AI in the wellness space.
Such exercises will not only break barriers in Indian communication, they will also help build indigenous solutions that have a nuanced understanding of the Indian context. “The invention of the iPhone was a pivotal moment when touch developed as an interface,” Raman observes. “What we are seeing today is another pivotal moment when language is developing as an interface.”
Join the Community
People+ai is a non-profit, housed within the EkStep Foundation. Our work is designed around the belief that technology, especially AI, will cause paradigm shifts that can help India and its people reach their potential.