The following page may contain information related to upcoming products, features, and functionality. Please note that this information is presented for informational purposes only and should not be relied upon for purchasing or planning decisions. As with all projects, the items mentioned on this page are subject to change or delay, and the development, release, and timing of any products, features, or functionality remain at the sole discretion of GitLab Inc.
Enabling Excellence in AI Model Evaluation: Realizing the Potential of Generative AI Models
Our vision is to establish a comprehensive framework that enables users to evaluate and unlock the full potential of Generative AI (GenAI) models, including those from Google, Anthropic, the OSS community, and beyond. By providing a unified platform with advanced evaluation tools, actionable insights, and a focus on ethical considerations, we aim to foster trust, transparency, and continual improvement in AI technologies.
This group consists of the following categories:
The AI Model Validation group began in 2023 by laying a foundation for evaluating large language models (LLMs) used in AI-powered GitLab features. Initially focused on testing Code Suggestions, the scope quickly expanded to include multi-turn conversations (Chat) and specialized evaluations, such as those for vulnerability data.
In 2024, we prioritized building a robust evaluation system to support GitLab Duo. The Central Evaluation Framework (CEF) enables large-scale evaluations, informing decisions about AI model selection. Learn more about how we govern AI model decisions and take a deep dive into the CEF.
Our strategy addresses three key questions: Why, When, and How Much.
The Model Validation team ensures reliable AI models by optimizing datasets, unifying evaluation tools, and enhancing infrastructure. We refine processes, tools, and frameworks to maintain trust, performance, and scalability in GitLab's AI features.
Last Reviewed: 2024-12-11
Last Updated: 2024-12-11