The following page may contain information related to upcoming products, features, and functionality. Please note that this information is presented for informational purposes only and should not be relied upon for purchasing or planning decisions. As with all projects, the items mentioned on this page are subject to change or delay, and the development, release, and timing of any products, features, or functionality remain at the sole discretion of GitLab Inc.
Enabling Excellence in AI Model Evaluation: Realizing the Potential of Generative AI Models
Our vision is to establish a comprehensive framework that enables users to evaluate and unlock the full potential of Generative AI (GenAI) models, including those from Google, Anthropic, the OSS community, and beyond. By providing a unified platform with advanced evaluation tools, actionable insights, and a focus on ethical considerations, we aim to foster trust, transparency, and continual improvement in AI technologies.
This group consists of the following categories:
The AI Model Validation group began in 2023 by laying a foundation for evaluating large language models (LLMs) used in AI-powered GitLab features. Initially focused on testing Code Suggestions, the scope quickly expanded to include multi-turn conversations (Chat) and specialized evaluations, such as those for vulnerability data.
In 2024, we prioritized building a robust evaluation system to support GitLab Duo. The Central Evaluation Framework (CEF) enables large-scale evaluations, informing decisions about AI model selection. Learn more about how we govern AI model decisions and take a deep dive into the CEF.
Our strategy addresses three key questions: Why, When, and How Much.
The Model Validation team ensures reliable AI models by optimizing datasets, unifying evaluation tools, and enhancing infrastructure. We refine processes, tools, and frameworks to maintain trust, performance, and scalability in GitLab's AI features.
Last Reviewed: 2024-12-11
Last Updated: 2024-12-11