For GitLab Duo, real-time AI-powered capabilities like Code Suggestions need low-latency response times for a frictionless developer experience. Users don’t want to interrupt their flow and wait for a code suggestion to show up. To ensure GitLab Duo can provide the right suggestion at the right time and meet the high performance standards expected of critical AI infrastructure, we recently launched our first multi-region service to deliver AI features.
In this article, we will cover the benefits of multi-region services, how we built an internal platform codenamed ‘Runway’ for provisioning and deploying multi-region services using GitLab features, and the lessons learned migrating to multi-region in production.
Background on the project
Runway is GitLab’s internal platform as a service (PaaS) for provisioning, deploying, and operating containerized services. Runway's purpose is to enable GitLab service owners to self-serve infrastructure needs with production readiness out of the box, so application developers can focus on providing value to customers. As part of our corporate value of dogfooding, the first iteration was built in 2023 by the Infrastructure department on top of core GitLab capabilities, such as continuous integration/continuous delivery (CI/CD), environments, and deployments.
By establishing automated GitOps best practices, Runway services use infrastructure as code (IaC), merge requests (MRs), and CI/CD by default.
GitLab Duo is primarily powered by the AI Gateway, a satellite service written in Python that runs outside GitLab’s modular Ruby monolith. In cloud computing, a region is a geographical location containing data centers operated by a cloud provider.
Defining a multi-region strategy
Deploying in a single region is a good starting point for most services, but can come with downsides when you are trying to reach a global audience. Users who are geographically far from where your service is deployed may experience different levels of service and responsiveness than those who are closer. This can lead to a poor user experience, even if your service is well built in all other respects.
For AI Gateway, it was important to meet global customers wherever they are located, whether on GitLab.com or self-managed instances using Cloud Connector. When a developer is deciding to accept or reject a code suggestion, milliseconds matter and can define the user experience.
Goals
Multi-region deployments add infrastructure complexity, but for use cases where latency is a core component of the user experience, the benefits often outweigh the downsides. First, multi-region deployments offer increased responsiveness: by serving requests from the locations closest to end users, latency can be significantly reduced. Second, multi-region deployments provide greater availability. With fault tolerance, services can fail over during a regional outage, so there is a much lower chance of a service failing completely and users should not be interrupted even during partial failures.
Based on our goals for performance and availability, we used this opportunity to create a scalable multi-region strategy in Runway, built on GitLab features.
Architecture
GitLab.com’s SaaS infrastructure is hosted on Google Cloud Platform (GCP). As a result, Runway’s first supported platform runtime is Cloud Run. The initial workloads deployed on Runway are stateless satellite services (e.g., AI Gateway), so Cloud Run is a good fit and provides a clear migration path to more complex and flexible platform runtimes, such as Kubernetes.
Building Runway on top of GCP Cloud Run using GitLab has allowed us to iterate and tease out the right level of abstractions for service owners as part of a platform play in the Infrastructure department.
To serve traffic from multiple regions in Cloud Run, the multi-region deployment strategy must support global load balancing as well as the provisioning and configuration of regional resources. Here’s a simplified diagram of the proposed architecture in GCP:
By replicating Cloud Run services across multiple regions and configuring the existing global load balancing with serverless network endpoint group (NEG) backends (sketched in Terraform below), we’re able to serve traffic from multiple regions. For the remainder of the article, we’ll focus less on the specifics of Cloud Run and more on how we’re building with GitLab.
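Here is a rough Terraform sketch of that pattern; the resource names and regions are illustrative, and this is not Runway’s actual configuration:
# Illustrative Terraform sketch; names and regions are placeholders.
resource "google_compute_region_network_endpoint_group" "runway" {
  for_each              = toset(["us-east1", "us-west1", "europe-west1"])
  name                  = "example-service-${each.key}"
  region                = each.key
  network_endpoint_type = "SERVERLESS"

  cloud_run {
    service = "example-service"
  }
}

resource "google_compute_backend_service" "runway" {
  name                  = "example-service-backend"
  load_balancing_scheme = "EXTERNAL_MANAGED"

  # One backend per regional serverless NEG; the backend service is then
  # attached to the existing global load balancer's URL map (not shown).
  dynamic "backend" {
    for_each = google_compute_region_network_endpoint_group.runway
    content {
      group = backend.value.id
    }
  }
}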
Building a multi-region platform with GitLab
Now that you have context about Runway, let's walk through how to build a multi-region platform using GitLab features.
Provision
When building an internal platform, the first challenge is provisioning infrastructure for a service. In Runway, Provisioner is the component that is responsible for maintaining a service inventory and managing IaC for GCP resources using Terraform.
To provision a service, an application developer opens an MR to add a service project to the inventory using Git, and Provisioner creates the required resources, such as service accounts and identity and access management policies. When building this functionality with GitLab, Runway leverages OpenID Connect (OIDC) with GCP Workload Identity Federation for managing IaC.
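In practice, that flow looks roughly like the following CI job; the project number, pool, provider, and service account are placeholders, not Runway’s actual setup:
# Illustrative only; identifiers below are placeholders.
provision:
  stage: provision
  id_tokens:
    GCP_ID_TOKEN:
      aud: "https://iam.googleapis.com/projects/123456789/locations/global/workloadIdentityPools/runway-pool/providers/gitlab"
  script:
    # Write the GitLab-issued ID token to a file for credential exchange.
    - echo "$GCP_ID_TOKEN" > .ci_job_jwt
    - >
      gcloud iam workload-identity-pools create-cred-config
      projects/123456789/locations/global/workloadIdentityPools/runway-pool/providers/gitlab
      --service-account="provisioner@example-project.iam.gserviceaccount.com"
      --credential-source-file=.ci_job_jwt
      --output-file=gcp_credentials.json
    - export GOOGLE_APPLICATION_CREDENTIALS="$PWD/gcp_credentials.json"
    - terraform init && terraform apply -auto-approve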
Additionally, Provisioner will create a deployment project for each service project. The purpose of creating separate projects for deployments is to enforce the principle of least privilege by authenticating as a GCP service account with restricted permissions. Runway leverages the Projects API for creating projects with the GitLab Terraform provider.
Finally, Provisioner defines variables in the deployment project for the service account, so that deployment CI jobs can authenticate to GCP. Runway leverages CI/CD variables and the job token allowlist to handle authentication and authorization.
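As a hedged sketch, creating a deployment project and seeding a CI/CD variable with the GitLab Terraform provider might look like this; the namespace ID, names, and values are placeholders, not Runway’s real configuration:
# Illustrative only; namespace ID, names, and values are placeholders.
resource "gitlab_project" "deployment" {
  name         = "example-service-deployment"
  namespace_id = 12345
  description  = "Runway deployment project for example-service"
}

resource "gitlab_project_variable" "gcp_service_account" {
  project   = gitlab_project.deployment.id
  key       = "RUNWAY_GCP_SERVICE_ACCOUNT"
  value     = "deployer@example-project.iam.gserviceaccount.com"
  protected = true
}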
Here’s a simplified example of provisioning a multi-region service in the service inventory:
{
  "inventory": [
    {
      "name": "example-service",
      "project_id": 46267196,
      "regions": [
        "europe-west1",
        "us-east1",
        "us-west1"
      ]
    }
  ]
}
Once provisioned, a deployment project and necessary infrastructure will be created for a service.
Configure
After a service is provisioned, the next challenge is configuring it. In Runway, Reconciler is the component responsible for configuring and deploying services by aligning actual state with desired state, using Go and Terraform.
Here’s a simplified example of an application developer configuring GitLab CI/CD in their service project:
# .gitlab-ci.yml
stages:
  - validate
  - runway_staging
  - runway_production
include:
  - project: 'gitlab-com/gl-infra/platform/runway/runwayctl'
    file: 'ci-tasks/service-project/runway.yml'
    inputs:
      runway_service_id: example-service
      image: "$CI_REGISTRY_IMAGE/${CI_PROJECT_NAME}:${CI_COMMIT_SHORT_SHA}"
      runway_version: v3.22.0
# omitted for brevity
Runway provides sane configuration defaults based on our experience delivering stable and reliable features to customers. Additionally, service owners can configure infrastructure using a service manifest file hosted in the service project. The service manifest uses JSON Schema for validation (a trimmed-down schema sketch follows the manifest example below). When building this functionality with GitLab, Runway leverages GitLab Pages for schema documentation.
To deliver this part of the platform, Runway leverages CI/CD templates, Releases, and Container Registry for integrating with service projects.
Here’s a simplified example of a service manifest:
# .runway/runway-production.yml
apiVersion: runway/v1
kind: RunwayService
spec:
  container_port: 8181
  regions:
    - us-east1
    - us-west1
    - europe-west1
# omitted for brevity
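For illustration, a hedged, trimmed-down JSON Schema for the manifest above might look like the following; this is a sketch, not Runway’s actual schema:
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "required": ["apiVersion", "kind", "spec"],
  "properties": {
    "apiVersion": { "const": "runway/v1" },
    "kind": { "const": "RunwayService" },
    "spec": {
      "type": "object",
      "properties": {
        "container_port": { "type": "integer" },
        "regions": {
          "type": "array",
          "items": { "type": "string" },
          "minItems": 1,
          "uniqueItems": true
        }
      }
    }
  }
}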
For multi-region services, Runway injects a RUNWAY_REGION environment variable into the container instance runtime, so application developers have the context to make any downstream dependencies regionally aware, such as the Vertex AI API.
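For example, a service written in Python (like the AI Gateway) might pick a regional Vertex AI endpoint based on that variable. This is a minimal sketch; the fallback region is an assumption for local development, not part of Runway’s contract:
import os

# RUNWAY_REGION is injected by Runway into the container instance.
# The default here is only a local-development fallback (assumption).
region = os.environ.get("RUNWAY_REGION", "us-east1")

# Route downstream calls to the matching regional Vertex AI endpoint,
# e.g. us-east1-aiplatform.googleapis.com.
vertex_ai_endpoint = f"{region}-aiplatform.googleapis.com"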
Once configured, a service project will be integrated with a deployment project.
Deploy
After a service project is configured, the next challenge is deploying the service. In Runway, Reconciler handles this by triggering a deployment job in the deployment project when an MR is merged to the main branch. When building this functionality with GitLab, Runway leverages trigger pipelines and multi-project pipelines to trigger jobs in the deployment project from the service project.
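A simplified sketch of that handoff in the service project’s pipeline could look like this; the deployment project path is hypothetical:
# Illustrative; the deployment project path is hypothetical.
deploy:
  stage: runway_production
  rules:
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
  trigger:
    project: gitlab-com/gl-infra/platform/runway/deployments/example-service
    branch: main
    strategy: depend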
Once a pipeline is running in the deployment project, the service is deployed to an environment. By default, Runway provisions staging and production environments for all services. At this point, Reconciler applies any Terraform resource changes to the infrastructure. When building this functionality with GitLab, Runway leverages environments/deployments and GitLab-managed Terraform state for each service.
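In the deployment project, a job along these lines ties the deployment to a GitLab environment and uses the GitLab-managed Terraform state backend. This is a sketch, not Runway’s actual job, and the environment URL is a placeholder:
# Illustrative deployment job; assumes a `backend "http" {}` block
# in the Terraform configuration.
production:
  stage: deploy
  environment:
    name: production
    url: https://example-service.runway.gitlab.net
  script:
    - >
      terraform init
      -backend-config="address=${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/terraform/state/production"
      -backend-config="username=gitlab-ci-token"
      -backend-config="password=${CI_JOB_TOKEN}"
    - terraform apply -auto-approve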
Runway provides default application metrics for services. Additionally, custom metrics can be used by enabling a sidecar container with OpenTelemetry Collector configured to scrape Prometheus and remote write to Mimir. By providing observability out of the box, Runway is able to bake monitoring into CI/CD pipelines.
Example scenarios include gradual rollouts for blue/green deployments, preventing promotions to production when staging is broken, or automatically rolling back to the previous revision when elevated error rates occur in production.
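For the custom-metrics path, a minimal OpenTelemetry Collector configuration for the sidecar might look like the following; the metrics port and Mimir endpoint are placeholders:
# Illustrative OpenTelemetry Collector config; port and endpoint are placeholders.
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: example-service
          static_configs:
            - targets: ["localhost:8181"]
processors:
  batch: {}
exporters:
  prometheusremotewrite:
    endpoint: https://mimir.example.internal/api/v1/push
service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [batch]
      exporters: [prometheusremotewrite]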
Once deployed, environments will serve the latest revision of a service. At this point, you should have a good understanding of some of the challenges that will be encountered, and how to solve them with GitLab features.
Migrating to multi-region in production
After extending Runway components to support multi-region in Cloud Run, the final challenge was migrating AI Gateway from its single-region production deployment with zero downtime. Today, teams using Runway to deploy their services can self-serve regions, making a multi-region deployment just as simple as a single-region one.
We were able to iterate on building multi-region functionality without impacting existing infrastructure by using semantic versioning for Runway. Next, we’ll share some learnings from the migration that may inform how to operate services for an internal multi-region platform.
Dry run deployments
In Runway, Reconciler applies Terraform changes in CI/CD. The trade-off is that plans cannot be verified in advance, which risks inadvertently destroying or misconfiguring production infrastructure. To solve this problem, Runway performs a “dry run” deployment for MRs.
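One common way to approximate this with GitLab CI/CD is a plan-only job that runs for merge requests. This is a sketch under that assumption, not Runway’s actual implementation:
# Illustrative plan-only job; it never applies changes.
dry_run:
  stage: validate
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
  script:
    - terraform init -input=false
    - terraform plan -input=false -lock=false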
For migrating AI Gateway, dry run deployments increased confidence and helped mitigate risk of downtime during rollout. When building an internal platform with GitLab, we recommend supporting dry run deployments from the start.
Regional observability
In Runway, existing observability aggregated metrics under the assumption of a single-region deployment. To solve this problem, Runway observability was retrofitted to include a new region label on Prometheus metrics.
Once metrics were retrofitted, we were able to introduce service level indicators (SLIs) for both regional Cloud Run services and global load balancing. Here’s an example dashboard screenshot for a general Runway service:
Note: Data is not actual production data and is only for illustration purposes.
Additionally, we were able to update our service level objectives (SLOs) to support regions. As a result, service owners can be alerted when a specific region experiences an elevated error rate or an increase in response times.
Note: Data is not actual production data and is only for illustration purposes.
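As a hedged sketch of what such a regional alert could look like, the rule below uses the region label; the metric names and threshold are hypothetical, not Runway’s actual SLI recording rules:
# Illustrative Prometheus alerting rule; metric names are hypothetical.
groups:
  - name: runway-regional-slos
    rules:
      - alert: RunwayRegionalErrorRateHigh
        expr: |
          sum by (region) (rate(http_requests_total{code=~"5.."}[5m]))
            /
          sum by (region) (rate(http_requests_total[5m]))
            > 0.01
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Elevated error rate in region {{ $labels.region }}"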
For migrating AI Gateway, regional observability increased confidence and helped provide more visibility into new infrastructure. When building an internal platform with GitLab, we recommend supporting regional observability from the start.
Self-service regions
The Infrastructure department successfully performed the initial migration of multi-region support for AI Gateway in production with zero downtime. Given the risk associated with rolling out a large infrastructure migration, it was important to ensure the service continued working as expected.
Shortly afterwards, service owners began self-serving additional regions to meet customer growth. At the time of writing, GitLab Duo is available in six regions around the globe and counting. Service owners configure the regions they need, and Runway provides guardrails along the way in a scalable solution.
Additionally, three other internal services have already started using multi-region functionality on Runway. Application developers have self-served this functionality entirely, which validates that we’ve provided a good platform experience for service owners. For a platform play, a scalable solution like Runway is a good outcome, since the Infrastructure department is no longer a blocker.
What’s next for Runway
Based on how quickly we could iterate to provide results for customers, the SaaS Platforms department has continued to invest in Runway. We’ve grown the Runway team with additional contributors, started evolving the platform runtime (e.g. Google Kubernetes Engine), and continue dogfooding with tighter integration in the product.
If you’re interested in learning more, feel free to check out https://gitlab.com/gitlab-com/gl-infra/platform/runway.