The following page may contain information related to upcoming products, features and functionality. It is important to note that the information presented is for informational purposes only, so please do not rely on the information for purchasing or planning purposes. Just like with all projects, the items mentioned on the page are subject to change or delay, and the development, release, and timing of any products, features or functionality remain at the sole discretion of GitLab Inc.
Gitaly Cluster was released to support highly available Git repository storage. It provides redundant storage benefits such as voted writes, read distribution, and data redundancy. For full documentation, see the details on Configuring Gitaly Cluster.
To fully appreciate the use case for Gitaly Cluster, we must first clarify the role of highly available repository storage. From the Gitaly perspective, highly available (HA) storage means that a fault-tolerant interface for repository data exists, such that the loss of a single storage node will not compromise the ability to read and write Git data. Gitaly Cluster fulfills this role by providing an interface to define multiple Gitaly storage nodes and to set a replication factor for stored repositories (how many nodes each repository should be stored on). In the event of a storage node loss, read and write operations continue as before, and when the cluster is returned to full capacity, the data is re-replicated to the returning node.
What HA repository storage does not provide is improved performance. Though in some cases read performance improves (through read distribution), the general concept is that you are trading storage cost and potential performance impacts for fault tolerance.
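The replication behavior described above can be illustrated with a small toy model. This is a sketch under stated assumptions, not Gitaly Cluster's actual implementation: the `ToyCluster` class, its method names, the least-loaded placement rule, and the integer version counter are all hypothetical, and the model ignores voted writes entirely. It shows only that a repository stored on more than one node stays writable when a single node is lost, and that the returning node is brought back up to date.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Toy model of replicated repository storage (hypothetical, not Gitaly's design)."""

    nodes: list[str]
    replication_factor: int
    # repo -> {node: version of the repository data held by that node}
    replicas: dict[str, dict[str, int]] = field(default_factory=dict)
    offline: set[str] = field(default_factory=set)

    def create_repository(self, repo: str) -> None:
        # Assumed placement rule: prefer the nodes that hold the fewest repositories.
        load = {n: sum(n in r for r in self.replicas.values()) for n in self.nodes}
        ordered = sorted(self.nodes, key=lambda n: (load[n], n))
        self.replicas[repo] = {n: 0 for n in ordered[: self.replication_factor]}

    def healthy(self, repo: str) -> list[str]:
        # Replicas of this repository that are currently reachable.
        return [n for n in self.replicas[repo] if n not in self.offline]

    def write(self, repo: str) -> int:
        # A write succeeds as long as at least one replica is reachable.
        targets = self.healthy(repo)
        if not targets:
            raise RuntimeError(f"no healthy replica for {repo}")
        new_version = max(self.replicas[repo].values()) + 1
        for node in targets:
            self.replicas[repo][node] = new_version
        return new_version

    def fail_node(self, node: str) -> None:
        self.offline.add(node)

    def restore_node(self, node: str) -> None:
        # Re-replicate: bring the returning node up to the newest reachable version.
        self.offline.discard(node)
        for versions in self.replicas.values():
            if node in versions:
                versions[node] = max(
                    v for n, v in versions.items() if n not in self.offline
                )


cluster = ToyCluster(nodes=["gitaly-1", "gitaly-2", "gitaly-3"], replication_factor=2)
cluster.create_repository("group/project")

cluster.fail_node("gitaly-1")
version_during_outage = cluster.write("group/project")  # still succeeds on gitaly-2
stale_version = cluster.replicas["group/project"]["gitaly-1"]

cluster.restore_node("gitaly-1")  # gitaly-1 catches up
```

With a replication factor of 1 the same outage would make the write fail, which is the trade-off the text describes: extra storage cost in exchange for fault tolerance.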
The one-year plan for the Gitaly Group is founded on three main principles discussed below. As Gitaly is one of the foundational elements of GitLab, these principles were chosen to ensure that Gitaly is ready to meet future business needs.
In the unlikely event of a repository data issue that necessitates data restoration, we want to ensure that sufficient tools are in place for our customers to be up and running again as efficiently and painlessly as possible. To this end, the Gitaly team is actively involved in the GitLab Disaster Recovery Working Group, where we play a critical role in defining opportunities to improve overall repository data recovery time objectives (RTO) and recovery point objectives (RPO). We also meet regularly with customers to better understand the needs of our self-managed customers.
Our one-year focus areas are:
As our customers deploy larger installations of GitLab, they start to hit more pain points around the scalability of Gitaly and its repository management. This is especially true in light of our intent to make cloud-native deployments of Gitaly a reality, where nodes may be spun up and down more regularly.
Our goals here are:
We recognize that administrators of self-managed instances want to more easily configure and manage repository storage. In order to assist in both of these user journeys, we plan to do the following:
Over the next quarter, the Gitaly team is focused on the following areas. While this is not an exhaustive list, it does give some insight into our major focus areas.
As our customers continue to grow their repositories (both in count and in size), it is critical that the underlying Gitaly services can scale appropriately to meet these demands. As a team, we have decided that the solution to several of the existing scalability issues surrounding Gitaly Cluster is a paradigm shift in how we replicate Git data. As such, we have begun initial efforts to shift to a decentralized architecture for Gitaly Cluster.
The highlights of this forward-thinking approach are as follows:
While this is not an area we own as a team, we believe that it is crucial for us to support the teams working in these areas to ensure that Git backups and data management allow for rapid and accurate restoration of Git data. The team is heavily involved in leading the GitLab Disaster Recovery Working Group. As leaders in this working group, we're collaborating with a broad cross-functional team to help ensure our service's long-term success. As part of this effort, we're architecting a solution for implementing write-ahead logging on GitLab.com.
We are in the beginning stages of investigating the known limitations around running Gitaly well in Kubernetes. This is going to be an ongoing theme across the next year, but we wanted to begin by ensuring that we understand, first and foremost, what issues exist today. The team is taking a data-driven approach involving load testing to ensure we have clearly identified functional gaps.
In order to best represent our Transparency Value, it is just as important to clarify what the Gitaly team cannot prioritize currently. This does not mean that we do not recognize the need for some of these features, simply that we have a finite team.
Better Support for Administrative User Journeys
We want to ensure that in the future, we support user journeys such as adding, removing, and replacing nodes cleanly, and provide a basic administrative dashboard to monitor node health.
BIC (Best In Class) is an indicator of forecasted near-term market performance based on a combination of factors, including analyst views, market news, and feedback from the sales and product teams. It is critical that we understand where GitLab appears in the BIC landscape.
The version control systems market is expected to be valued at close to US$550mn in 2021 and is estimated to reach US$971.8mn by 2027, according to Future Market Insights, which is broadly consistent with revenue estimates of GitHub ($250mn ARR) and Perforce ($130mn ARR). The opportunity for GitLab to grow with the market, and to grow its share of the version control market, is significant.
Git is the market-leading version control system, as demonstrated by the 2018 Stack Overflow Developer Survey, in which over 88% of respondents reported using Git. Although there are alternatives to Git, Git remains dominant in open source software, usage by developers continues to grow, it is installed by default on macOS and Linux, and the project itself continues to adapt to meet the needs of larger projects and enterprise customers who are adopting Git, like the Microsoft Windows project.
According to a 2016 Bitrise survey of mobile app developers, 95% of apps were hosted by a SaaS provider, and 62% of those SaaS-hosted apps were hosted on GitHub. These numbers provide an incomplete view of the industry, but broadly represent the large opportunity for growth in SaaS hosting on GitLab.com, and in self-hosted deployments, where GitLab is already very successful.
Support large repositories
As applications mature, their code bases continue to grow. As a result, average repository sizes are on the rise, and version control systems must be able to handle these large repositories in a performant manner. Additionally, many development tasks require version control of large files, which should also be handled seamlessly.
Ensure data safety
Application code has a very high value to organizations. It is unacceptable to have a solution that does not make it easy to ensure the integrity of your data, or that does not provide an easy means of backing up and restoring your data should something go wrong. Ideally, these solutions should use efficient and cost-effective storage to optimize your business infrastructure.
Important competitors are GitHub.com and Perforce, which, in relation to Gitaly, compete with GitLab in terms of raw Git performance and support for enormous repositories, respectively.
Customers and prospects evaluating GitLab (GitLab.com and self-hosted) benchmark GitLab's performance against GitHub.com, including Git performance. For easily benchmarked operations like cloning, fetching, and pushing, the Git performance of GitLab.com is similar to that of GitHub.com.
Perforce competes with GitLab primarily on its ability to support enormous repositories, whether they are large because of binary files or because they are monolithic repositories with extremely large numbers of files and long histories. This competitive advantage comes naturally from its centralized design, which means only the files immediately needed by the user are downloaded. Given sufficient support in Git for partial clone, and sufficient performance in GitLab for enormous repositories, existing customers are waiting to migrate to GitLab.