The Vision for GEM
We want to build a transparent, scalable, and inclusive benchmarking framework, aiming to measure and improve the performance of AI across diverse domains and multiple modalities, in every language.
Capturing the world's linguistic diversity is a hard problem. The lack of high-quality, diverse training data leads to poor AI model performance. As a result, state-of-the-art models, which are trained primarily on Internet content from the world’s richest economies, don’t work particularly well for writers and speakers of under-resourced languages.
The GEM effort aims to create transparency and competition within the AI community and among its potential users, in order to highlight and improve the state of AI for every language and speaker. We also hope to prod private, government and public technology makers to build ever-better AI models for global language understanding through the proven power of open, transparent competition. We take inspiration from other AI leaderboards such as Hugging Face’s Open LLM Leaderboard. As private, public and open source technology makers publish ever-improving AI models each month, we hope to enable organisations to quickly benchmark new models and evaluate them on their own data, so they can make informed decisions about price, speed and performance.
Model Builders
People and organisations building global AI models want to know how good their models are. GEM’s extensive evaluation lets model builders quantify their models’ performance and gives them a clear path to improve it.
Model Users
People and organisations who want to use AI models can use GEM to find the models that best fit their use case and their users’ language needs. This is especially valuable for organisations serving low-resource language communities, which need an AI bot that can understand those users in their own language.
Research Groups
Research groups can use GEM to uncover significant gaps that remain unresolved in current models. Our open evaluation methods can also help the community find better ways to run evaluations in general.
Frequently Asked Questions