Reimagining the Indian education system with AI
This article was originally posted on Medium.
An overhaul of India’s education system has long been in the making. Attempts have been made sporadically by using technology to change parts of teaching and learning, but nothing has worked to bring in an ecosystem level change.
Early experiments with generative AI and large language models show signs of success, and the opportunity to use them constructively in education is huge.
Policy makers and educators need to ensure that AI does not take away from children’s ability to reason and think.
AI in education will have to be complementary to current teaching and learning methods and will require a community approach to find the solutions that work for all stakeholders.
The room was packed. Sitting next to each other, elbows knocking, the participants may have been jostling for room to lean back, but there was space to express their ideas. It took 20 minutes just to go around the room so that the introductions were in order. Everyone had gathered to discuss how AI could bridge the gaps in India’s current education system.
The first People+AI campsite, put together by EkStep on April 1, was the reason for the education-focused group in the packed room. There were students, educators, doers, dreamers, and builders. They all had come together to find answers because in the age of ChatGPT, the village to help raise our children will need to look significantly different.
“Generative AI has democratised everything, and it has also disrupted everything,” says Vivek Raghavan, Chief AI Evangelist at EkStep Foundation. What it means is, he adds, “how we learn will change, how people teach will change, what a school is will change.”
The tryst between AI and education has been in the works for a while, essentially because the story of Indian education is complex. It is neither simple nor linear. Among India’s 1.4 billion people, nearly 521 million are school-going students, taught by 9.5 million teachers. The dropout rates in the year 2021–22, the latest data set available, have increased among young children. The reasons, over the years, have ranged from not having enough resources, to helping the family make a living or with household work, to simply being “not interested in studies”.
Even if a student finds subjects they want to study, the language of instruction creates another challenge. There are 35 scripts, and as many as 160 spoken languages in the country. The three-language formula, first introduced in 1968, said that Indian students would learn English, Hindi, and one regional language. The idea, initially aimed at making students learn more languages, quickly became a political issue. There is also a distinction between languages and scripts, which means a distinct language can be spoken as the mother tongue in certain regions and yet have no script of its own.
A mismatch in the student-to-teacher ratio and the sheer number of languages and dialects spoken and understood in the various regions of the country make access to quality education a complex puzzle.
Higher education throws up further wrinkles. The total number of students who enrolled for higher education in 2020–21 was 41.4 million. The number of colleges: 43,796, and standalone institutions: 11,296. The demand and supply mismatch is perhaps the most obvious inference from these statistics. There is also the issue of private higher education establishments being unaffordable for a vast majority of Indians. Overall, learning in India is largely theoretical, which makes graduates unemployable.
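To put a rough number on that mismatch, the figures quoted above can be divided out. This is only a back-of-the-envelope sketch: it assumes every enrolled student sits in one of the two institution types named in the paragraph, and it says nothing about how unevenly students are actually spread. The variable names are illustrative.

```python
# Figures quoted in the paragraph above (2020-21).
students = 41_400_000
colleges = 43_796
standalone_institutions = 11_296

# Assumption: only these two institution types are counted.
institutions = colleges + standalone_institutions
students_per_institution = students / institutions

print(f"{institutions:,} institutions")
print(f"about {students_per_institution:.0f} students per institution")
```

An average of roughly 750 students per institution hides wide variation, but it gives a sense of the scale each institution would need to operate at.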
“We want AI to be a tool of improving human productivity, so that in a society where there’s a shortage of teachers, more high quality education reaches millions of children… And everybody uses technology in their own language to get every imaginable service they want, so that nobody is left behind.” Nandan Nilekani
Despite these challenges, the Indian education market is enormous. It is expected to be worth $225 billion by the financial year 2025. The education technology (edtech) market is expected to be worth $10.4 billion by 2025.
Over the past few years, there have been attempts to make education more accessible, uniform and affordable for Indian students. It began with language tech development becoming a viable option for the industry. Rural users, who are mostly native or local language speakers, have led the growth of India’s internet user base. Over the past five years, Indian and global tech firms have taken a crack at building language systems. Language models, however, have now shown real use cases, and the hugely popular ChatGPT will be the muse around which large solutions are built.
Evolution in progress
Back in the room, the group set up a baseline for all that they wanted to achieve, not just in their time together but also in creating larger solutions:
How would education be reimagined in this world where intelligence is at your fingertips?
Education is not about automation. Children should be allowed to think and be creative. How do we make room for nurturing their creativity?
How do we bring educators and industry together?
We need boundaries about how much time our children should spend on screens. WhatsApp learning didn’t succeed beyond the lockdowns for this reason.
This baseline would be the seed that would eventually grow into a forest of ideas and use cases.
Let’s leave the room again for a second. To answer some of the questions the baseline raises, it would help to look at some of the work that has already been done towards building the digital public infrastructure for education. A key industry player in building language tech in India is Google.
In 2020, Google Research in India built Multilingual Representations for Indian Languages (MuRIL), a machine learning-based model to help people build local language technologies. It supported 16 Indian languages, the highest at the time among publicly available ML-based language tech systems. The team behind MuRIL is now working on Morni, a project that will support 100 Indian languages instead of just 16.
“Our journey started with MuRIL, and now we’re building Vaani together with the Indian Institute of Science, where we are collecting speech data from every part of India, each of the 773 districts,” said Manish Gupta, Director at Google Research India. “We’ll put all of that data back into open source so people across the board can make use of this data,” he added.
Then there is the work being done by the government. The Ministry of Electronics and Information Technology has a project called Bhashini. It is a platform trying to build a solution which will enable information access for all Indians in all Indian languages, with a focus on voice-based access.
The evolution of the Indian education system will come through the community, which will involve stakeholders across the board. This is why the work being done by organisations such as Khan Academy, and the work on large language models in India, is so important.
Harnessing the power of community
The first building block for AI-supported education is learning infrastructures. Educational non-profit Khan Academy is working with OpenAI (the maker of ChatGPT) to test some ways of making generative AI work for students and educators in the US.
The project, called Khanmigo, involves working in US school districts to understand which activities move the needle in the classroom, and to ensure that AI intervention encourages children to think for themselves and find lateral solutions to real-world problems. The model is conversational in the same way as ChatGPT, with one difference. If a student wants to calculate the area of a circle, for instance, the AI is trained not to provide the answer right away. It prompts the student to type an answer, offering a hint if the child is unable to come up with one.
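The hint-first behaviour described here can be illustrated with a small sketch. It is not how Khanmigo is built: the real system is a large language model, while this is a rule-based stand-in, and the class name, messages and tolerance are all hypothetical. It only shows the interaction pattern of asking for an attempt, checking it, and offering a hint rather than the answer.

```python
import math


class HintFirstTutor:
    """Hypothetical tutor that never reveals the answer, only hints."""

    def __init__(self, expected, hints, tolerance=0.01):
        self.expected = expected    # correct numeric answer, kept private
        self.hints = list(hints)    # hints offered one at a time
        self.tolerance = tolerance  # relative tolerance for a correct attempt
        self.hints_given = 0

    def respond(self, attempt=None):
        # No attempt yet: prompt the student instead of answering.
        if attempt is None:
            return "What do you think the answer is? Type your attempt."
        if math.isclose(attempt, self.expected, rel_tol=self.tolerance):
            return "Correct, well done!"
        # Wrong attempt: hand out the next unused hint, if any remain.
        if self.hints_given < len(self.hints):
            hint = self.hints[self.hints_given]
            self.hints_given += 1
            return f"Not quite. Hint: {hint}"
        return "Not quite. Re-read the hints and try once more."


# Area of a circle with radius 3.
tutor = HintFirstTutor(
    expected=math.pi * 3 ** 2,
    hints=[
        "Area depends on the radius squared, not the diameter.",
        "Multiply pi by the radius times itself.",
    ],
)
print(tutor.respond())       # asks for an attempt
print(tutor.respond(18.85))  # the circumference, so a hint follows
print(tutor.respond(28.27))  # close enough to pi * 9
```

The design choice worth noting is that the expected answer never appears in any reply, which mirrors the guardrail that the AI should not simply hand students the solution.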
“When OpenAI approached us to experiment with large language models, we knew that this was the perfect opportunity for us to bring the personal tutoring aspect to a student. The question was, how do you do this?” said Niharika Gaur, the Head of Philanthropy at Khan Academy, back in our packed hall.
Khan Academy worked with four guardrails in mind:
An infrastructure like this needs to be a safe space for teachers and students.
It needs to be fun and engaging.
It should ensure that AI does not give students the answer.
It should be ethical and responsible.
In India, Khan Academy works with state government-run schools and contextualises its short video lessons, practice exercises and lessons in bite-sized formats across different school subjects in different Indian languages. Taking its learnings from the ongoing experiment, Khan can build a contextual engine of sorts, in the local language, so students can interact and learn. For Indian school teachers, who are always pressed for time, this could mean automating parts of the learning process, so they can focus more effectively on other learning outcomes.
“How we react and how we work with AI is where we have to come together and figure out how to make these things work for all people.” Vivek Raghavan
This is an example of a learning infrastructure.
Another use, this time of AI models trained for speech recognition in Indian languages, was demonstrated by Jagadish Babu, the Chief Operating Officer at EkStep Foundation. A successful implementation of this model under AI4Bharat has been done in the state of Tamil Nadu, where children in 6,000 schools are learning language fluency.
He pulled up the chat screen and asked the model to explain photosynthesis. The model is trained on open-source NCERT books, and the chatbot pulls the relevant portion from the book, even providing a link to the source material. Nothing most of the room hadn’t seen before. And then, Jagadish asked for the same answer in Kannada. The answer was not just correct, but also complete. For those working on language models for some time, this was the “aaha!” moment, because answering questions in the right context in complete sentences, in a non-English language, is remarkable. This model is trained in 10 Indian languages and will be trained further in others. It also works for teachers who can ask the model to create questions from a particular section in a specific textbook.
“Fine-tuning these large language models is a difficult job. These textbooks are ingested by looking at what the text layer is and creating embeddings of them as a vector index. And when you ask a question here, you try to find appropriate pieces from that textbook and take them as part of the prompt to answer the question. That ensures that this process is extensible. Tomorrow, I can add new textbooks, new topics, new classes,” explained Pratyush Kumar, the co-founder of AI4Bharat.
This is an example of a language infrastructure.
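The retrieval flow Pratyush Kumar describes (embed textbook passages, index them, find the pieces relevant to a question, and place them in the prompt) can be sketched in a few lines. This is a toy illustration under stated assumptions, not AI4Bharat's implementation: the "embedding" is a plain bag-of-words count standing in for a real embedding model, the index is a Python list rather than a vector database, and every name and textbook title is hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TextbookIndex:
    """Hypothetical in-memory index of (source, passage, vector) entries."""

    def __init__(self):
        self.entries = []

    def add_textbook(self, source, passages):
        # New books are appended; nothing already indexed is retrained.
        for passage in passages:
            self.entries.append((source, passage, embed(passage)))

    def search(self, question, k=2):
        q = embed(question)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[2]), reverse=True)
        return [(source, passage) for source, passage, _ in ranked[:k]]


def build_prompt(index, question, k=2):
    """Put the retrieved passages, with their sources, ahead of the question."""
    hits = index.search(question, k)
    context = "\n".join(f"[{source}] {passage}" for source, passage in hits)
    return (
        "Answer using only these textbook passages:\n"
        f"{context}\n\nQuestion: {question}"
    )


index = TextbookIndex()
index.add_textbook("Science textbook (hypothetical)", [
    "Photosynthesis is the process by which green plants make food "
    "using sunlight, water and carbon dioxide.",
    "The stomach releases digestive juices that break down food.",
])
print(build_prompt(index, "Explain photosynthesis in plants", k=1))
```

Because adding a book only appends entries to the index, new textbooks, topics or classes can be added later without touching the language model itself, which is the extensibility the quote points to. Keeping the source label next to each passage is also what would let a chatbot link back to the material it drew from.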
Taking off
As the room dispersed, it was clear the meeting was not a one-and-done workshop where people networked, ate and walked away. It was a time to start the process of building deeper relationships and answers.
The ambition is now to reimagine education. But to do so, builders and doers will have to take a two-phased approach.
The first: figure out the minimum infrastructure required to support a behavioural change among students and educators. Currently, the country and this community are firmly in phase one, where we have a vague idea, but the proof will be in how this AI pudding is made.
Effective implementation will involve building learning and language infrastructures at a population scale. This will form the base infrastructure layer, which will enable newer applications and tools to be built on top. It was also recommended to think of building these base layers along the lines of Aadhaar, where the policy and regulatory framework was developed alongside the identity infrastructure. Similarly, the guardrails for data collection without invading minors’ privacy should be embedded into the system, instead of being plugged in as an afterthought. If done right, we could have an education history, similar to a person’s medical history, which can then lend itself to many other kinds of use cases.
The second phase will be about capital requirements: providing device-based access to all students, or finding ways around it, and training teachers to make AI tools complementary to their teaching. The cost of that training will have to be borne by governments and public infrastructure. Eventually, the idea is to build complementary capabilities for existing skill sets, instead of removing them entirely.
With the building blocks of large-scale infrastructures already in place, it is only a matter of time before the paths of everyone in the ecosystem converge.