Measuring AI’s effectiveness for every voice, everywhere

More than 8 billion people speak more than 7,100 languages around the world, yet AI cannot converse fluently or take action in 98% of those languages, leaving out billions of people. We aim to fix that.

  • Glocal Evaluation of Models・全球本地化模型评估・स्थानीय वैश्विक मॉडल मूल्यांकन・Evaluación Global Local de Modelos・Évaluation Globale Locale des Modèles・تقييم عالمي محلي للنماذج・গ্লোবাল লোকাল মডেল মূল্যায়ন・Глокальная Оценка Моделей・Avaliação Global Local de Modelos・عالمی مقامی ماڈلز کا تجزیہ・Evaluasi Global Lokal Model・Globale Lokale Bewertung von Modellen・グローカルモデル評価・Tathmini ya Kidunia ya Mifano・ग्लोकल मॉडेल मूल्यमापन・గ్లోబల్ లోకల్ మోడల్ విలువాంకనం・Küresel Yerel Modeller Değerlendirmesi・글로컬 모델 평가・Évaluation Locale Globale des Modèles・Đánh Giá Mô Hình Toàn Cầu Địa Phương・Valutazione Globale Locale dei Modelli・Kima Tsarin Duniya na Gida・การประเมินระดับโลกท้องถิ่นของโมเดล・ગ્લોબલ લોકલ મોડેલ્સનું મૂલ્યાંકન・Lokalna Ocena Globalna Modeli・ارزیابی جهانی محلی مدل‌ها・Penilaian Global Tempatan Model・د سیمه ییز نړیوال ماډل ارزونه・Munduko Tokiko Ereduen Ebaluazioa・평가 글로벌 로컬 모델・


The Vision for GEM

We want to build a transparent, scalable, and inclusive benchmarking framework that measures and improves the performance of AI across diverse domains and multiple modalities, in every language.

Capturing this diversity of language is a hard problem. The lack of high-quality, diverse training data leads to poor performance from AI models: state-of-the-art models, which are trained primarily on Internet content from the world's richest economies, do not work particularly well for writers and speakers of under-resourced languages.

The GEM effort is an attempt to create transparency and healthy competition within the AI community and among its potential users, to highlight and improve the state of AI for every language and speaker. We also hope to prod private, public, and government technology makers to build ever-better AI models for global language understanding through the proven power of open, transparent competition. We take inspiration from other AI leaderboards, such as Hugging Face's Open LLM Leaderboard. As private, public, and open-source technology makers publish ever-improving AI models each month, we hope to enable organisations to quickly benchmark new models and evaluate them on their own data, so they can make informed price, speed, and performance decisions as well.

Join the Community

Who is this For?

Model Builders

People and organisations building global AI models want to know how good those models are. With GEM's extensive evaluations, model builders can quantify their models' performance and get a clear path to improving it.

Model Users

People and organisations who want to use AI models can use GEM to find out which models best fit their particular use case and their consumers' language needs, especially organisations working with low-resource-language populations who need an AI bot that understands those users linguistically.

Research Groups

Research groups can use GEM to uncover significant gaps that remain unresolved in current models. Our open evaluation methods can also point the way towards better evaluation practices in general.

The GEM Leaderboard

To transparently track which models perform best in which languages, we aggregate various language-focused benchmarks into a single leaderboard, starting with Pariksha.

Featured: Pariksha by Microsoft Research India & Karya (An Indic Language LLM Leaderboard)

(Others Coming Soon)

Partners

Evaluation Approach

Evaluating AI models is a hard problem, especially for Indic and African languages. Any evaluation framework will be imperfect to start with, so we have designed GEM as an iterative framework built on the following principles:

Curated Series of Evaluations

Together with experts, such as those at the GEM partners, we create relevant and challenging prompts, ensuring that assessments go beyond accuracy to include fairness and diversity.

Transparent & Open Methodology

We detail each curator's evaluation criteria and process, allowing for clear understanding and replicability. Rankings may not be perfect, but they should be transparent: we aim to rank models by their current performance and to make outcomes open for anyone to inspect.

Extensible Curation by Design

To cater to the different needs of different languages, modalities, and evaluation frameworks, the GEM Leaderboard is designed to be managed by multiple curators.

Designed as an Iterative Process

No evaluation is perfect. Acknowledging that benchmarks are imperfect, we incorporate regular updates and feedback loops to continuously refine the evaluation process.

Run It with Your Own Data

The system is designed for customisation, allowing organisations to conduct the evaluations on their own data, or to run custom evaluations tailored to their specific needs, as sketched below.
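For illustration, here is a minimal sketch of what running an evaluation on your own data might look like. The JSONL prompt format, the model callable, and the exact-match scoring rule are all hypothetical assumptions for this example, not GEM's actual interface.

# Hypothetical sketch of a custom evaluation run on your own prompts.
# The file format, model interface, and scoring rule are illustrative
# assumptions, not GEM's actual API.
import json
from collections import defaultdict
from typing import Callable

def run_custom_eval(prompts_path: str, model: Callable[[str], str]) -> dict:
    """Score a model on your own prompts, aggregated per language.

    Expects a JSONL file where each line looks like:
    {"language": "hi", "prompt": "...", "reference": "..."}
    """
    totals, correct = defaultdict(int), defaultdict(int)
    with open(prompts_path, encoding="utf-8") as f:
        for line in f:
            example = json.loads(line)
            lang = example["language"]
            response = model(example["prompt"])
            totals[lang] += 1
            # Placeholder scoring rule: exact match against the reference.
            # A real evaluation would use task-appropriate metrics or human review.
            if response.strip() == example["reference"].strip():
                correct[lang] += 1
    return {lang: correct[lang] / totals[lang] for lang in totals}

if __name__ == "__main__":
    # Stand-in model for demonstration; swap in a call to your own model here.
    dummy_model = lambda prompt: "..."
    print(run_custom_eval("my_prompts.jsonl", dummy_model))

In practice, an organisation would swap in its own model client and a scoring method suited to its use case; the per-language aggregation is what lets results feed back into a leaderboard-style comparison.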

Frequently Asked Questions

What makes our leaderboard different?

How often is the leaderboard updated?

Who will run the evaluations?

Get Involved

Join the Community

Have questions or want to contribute?

Reach out to us at hello@peopleplus.ai and join our mission to elevate Global AI Models.

MODEL BUILDERS

Apply to Evaluate your Models

Test your Indic LLMs against our benchmarks to gauge and improve their performance.

CURATORS

Contact Us to Become a Curator

Contribute your knowledge as an individual or organisation to help refine evaluations and prompts, improving their relevance and comprehensiveness.

ORGANISATIONS

Run Custom Evaluations or Attend an Event

Sign up for the first Global AI Models workshop happening in the first week of May.

Join the Community

People+ai is an EkStep Foundation initiative. Our work is designed around the belief that technology, especially AI, will cause paradigm shifts that can help India and its people reach their potential.