DevOps-GitLab

Backup and restore | GitLab

Toolbox pod
Backup utility
- Backups
  - Sequence of execution
  - Command line arguments
  - GitLab backup bucket
  - Backing up to Google Cloud Storage
- Restore

Backup and restore

This document explains the technical implementation of the backup and restore into/from CNG.

Toolbox pod

The toolbox chart deploys a pod into the cluster. This pod will act as an entry point for interaction with other containers in the cluster.

Using this pod, users can run commands with kubectl exec -it <pod name> -- <arbitrary command>
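As a minimal sketch of that pattern, the wrapper below forwards an arbitrary command to the Toolbox pod. The function name toolbox_exec is hypothetical and not part of the chart; only the kubectl exec invocation comes from the text above.

```shell
# Hypothetical helper (not shipped with the chart): run a command
# inside the Toolbox pod.
# Arguments: <pod name> <command> [args...]
toolbox_exec() {
  pod=$1
  shift
  kubectl exec -it "$pod" -- "$@"
}

# Usage (the pod name is a placeholder):
#   toolbox_exec <pod name> backup-utility --help
```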

The Toolbox runs a container from the Toolbox image.

The image contains some custom scripts that are to be called as commands by the user. Those scripts are for running Rake tasks, backup, restore, and some helper scripts for interacting with object storage.

Backup utility

The backup utility is one of the scripts in the Toolbox container. As the name suggests, it creates backups, and it also handles restoring from an existing backup.

Backups

The backup utility script, when run without any arguments, creates a backup tar and uploads it to object storage.

Sequence of execution

Backups are made using the following steps, in order:

1. Back up the database (if not skipped) using the GitLab backup Rake task.
2. Back up the repositories (if not skipped) using the GitLab backup Rake task.
3. For each of the object storage backends:
   1. If the object storage backend is marked for skipping, skip this storage backend.
   2. Tar the existing data in the corresponding object storage bucket, naming it <bucket-name>.tar.
   3. Move the tar to the backup location on disk.
4. Write a backup_information.yml file which contains some metadata identifying the version of GitLab, the time of the backup, and the skipped items.
5. Create a tar file containing the individual tar files along with backup_information.yml.
6. Upload the resulting tar file to the gitlab-backups bucket in object storage.

Command line arguments

--skip <component>

You can skip parts of the backup process by passing --skip <component> once for every component that you want to skip. Skippable components are the database (db), repositories (repositories), and any of the object storages (registry, uploads, artifacts, lfs, packages, external_diffs, terraform_state, or ci_secure_files).

-t <timestamp-override-value>

This gives you partial control over the name of the backup: when you specify this flag, the created backup is named <timestamp-override-value>_gitlab_backup.tar. The default value is the current UNIX timestamp, postfixed with the current date formatted as YYYY_mm_dd.

--backend <backend>

Configures the object storage backend to use for backups. Can be either s3 or gcs. The default is s3.

--storage-class <storage-class-name>

Specifies the storage class in which the backup is stored, allowing you to save on backup storage costs. If unspecified, the default of the storage backend is used.

This storage class name is passed through as-is to the storage class argument of your specified backend.
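The sketch below shows how these flags might be combined on one command line. The helper skip_args is hypothetical and exists only to illustrate that --skip is repeated once per component. The timestamp value nightly is an arbitrary example.

```shell
# Hypothetical helper: turn a list of component names into repeated
# --skip flags, one flag per component.
skip_args() {
  for component in "$@"; do
    printf '%s %s ' --skip "$component"
  done
}

# Skip two object storage components, override the timestamp, and use
# the gcs backend. Per the text above, the archive would be named
# nightly_gitlab_backup.tar.
echo "backup-utility $(skip_args registry uploads)-t nightly --backend gcs"
# prints: backup-utility --skip registry --skip uploads -t nightly --backend gcs
```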

GitLab backup bucket

The default name of the bucket that will be used to store backups is gitlab-backups . This is configurable
using the BACKUP_BUCKET_NAME environment variable.

Backing up to Google Cloud Storage

By default, the backup utility uses s3cmd to upload and download artifacts from object storage. While this can work with Google Cloud Storage (GCS),
it requires using the Interoperability API which makes undesirable compromises to authentication and authorization. When using Google Cloud Storage
for backups you can configure the backup utility script to use the Cloud Storage native CLI, gsutil , to do the upload and download
of your artifacts by setting the BACKUP_BACKEND environment variable to gcs .
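The two environment variables and their documented defaults can be summarized with a small sketch. The function effective_backup_settings is hypothetical and is not how the backup utility itself is implemented. It only mirrors the defaults described here: gitlab-backups for the bucket, s3 for the backend, with s3cmd and gsutil as the respective tools.

```shell
# Hypothetical illustration of the documented defaults. It reads the
# same environment variables the text names and reports which CLI
# would be used for uploads and downloads.
effective_backup_settings() {
  bucket=${BACKUP_BUCKET_NAME:-gitlab-backups}
  backend=${BACKUP_BACKEND:-s3}
  case "$backend" in
    s3)  tool=s3cmd ;;
    gcs) tool=gsutil ;;
    *)   echo "unsupported backend: $backend" >&2; return 1 ;;
  esac
  echo "bucket=$bucket backend=$backend tool=$tool"
}

# Usage:
#   BACKUP_BACKEND=gcs; export BACKUP_BACKEND
#   effective_backup_settings
```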

Restore

When given the --restore argument, the backup utility attempts to restore from an existing backup to the running instance. This
backup can be from either an Omnibus GitLab or a CNG Helm chart installation, provided that the instance that was
backed up and the running instance run the same version of GitLab. The restore expects either a file in the backup bucket, specified with -t <backup-name>, or a remote URL, specified with -f <url>.

When given a -t parameter, it looks in the backup bucket in object storage for a backup tar with that name. When
given a -f parameter, it expects the given URL to be a valid URI of a backup tar in a location accessible from the container.
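To make the two input modes concrete, here is a small sketch that picks between -t and -f. The helper restore_args and its URL check (looking for "://") are assumptions made for illustration. The backup utility itself requires you to choose the flag explicitly.

```shell
# Hypothetical helper: build the restore arguments for backup-utility.
# A value containing "://" is treated as a URL and passed with -f.
# Anything else is treated as a backup name in the backup bucket and
# passed with -t.
restore_args() {
  case "$1" in
    *://*) echo "--restore -f $1" ;;
    *)     echo "--restore -t $1" ;;
  esac
}

# Usage (pod name, backup name, and URL are placeholders):
#   kubectl exec <Toolbox pod name> -it -- backup-utility $(restore_args <backup-name>)
#   kubectl exec <Toolbox pod name> -it -- backup-utility $(restore_args <url>)
```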

After fetching the backup tar the sequence of execution is:

1. For repositories and the database, run the GitLab backup Rake task.
2. For each of the object storage backends:
   - Tar the existing data in the corresponding object storage bucket, naming it <backup-name>.tar.
   - Upload it to the tmp bucket in object storage.
   - Clean up the corresponding bucket.
   - Restore the backup content into the corresponding bucket.

If the restore fails, the user will need to revert to the previous backup using the data in the tmp directory of the backup bucket. This is currently a manual process.


Design Decisions | GitLab

Attempt to catch problematic configurations
Breaking changes via deprecation
Preference of Secrets in initContainer over Environment
Sub-charts are deployed from global chart
Template partials for gitlab/* should be global whenever possible
Forked charts
- Redis
- Redis HA
- MinIO
- registry
- NGINX Ingress
Kubernetes version used throughout chart
Image variants shipped with CNG

Design Decisions

This documentation collects reasoning and decisions made
regarding the design of the Helm charts in this repository.

Attempt to catch problematic configurations

Due to the complexity of these charts and their level of flexibility, there are some
overlaps where it is possible to produce a configuration that would lead to an
unpredictable, or entirely non-functional deployment. In an effort
to prevent known problematic settings combinations, we have implemented template logic
designed to detect and warn the user that their configuration will not work.

This replicates the behavior of deprecations, but is specific to ensuring functional configuration.

Introduced in !757 checkConfig: add methods to test for known errors

Breaking changes via deprecation

During the development of these charts, we occasionally make improvements that require
alterations to the properties of existing deployments. Two examples were the centralization
of configuring the use of MinIO, and the migration of external object storage configuration
from properties to secrets (in observance of our preference).

As a means of preventing a user from accidentally deploying an updated version of these
charts which includes a breaking change against a configuration that would not function, we
have chosen to implement deprecation notifications. These are designed to detect when
properties have been relocated, altered, replaced, or removed entirely, then inform
the user of what changes need to be made to the configuration. This may include informing
the user to see documentation on how to replace a property with a secret. These notifications
will cause the Helm install or upgrade commands to stop with a parse error, and output a complete list of items that need to be addressed. We have taken care to ensure a user will not be placed into a loop of error, fix, repeat.

All deprecations must be addressed in order for a successful deployment to occur. We believe
the user would prefer to be informed of a breaking change over experiencing unexpected
behavior or complete failure that requires debugging.

Introduced in !396 Deprecations: implement buffered list of deprecations

Preference of Secrets in initContainer over Environment

Much of the container ecosystem has, or expects, the capability to be configured
through environment variables. This configuration practice
stems from the concept of The Twelve-Factor App. This
greatly simplifies configuration across multiple deployment environments, but there
remains a security concern with passing connection secrets such as passwords and
private keys via the container’s environment.

Most container ecosystems provide a simple method to inspect the state of a running
container, which usually includes the environment. Using Docker
as an example, any process capable of communicating with the daemon can query the
state of all running containers. This means that if you have a privileged container
such as dind , that container can then inspect the environment of any container
on a given node, and expose all secrets contained within.
As a part of the complete DevOps lifecycle, dind is regularly
used for building containers that will be pushed to a registry and subsequently
deployed.

This concern is why we’ve decided to prefer the population of sensitive information
via initContainers.
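The exposure described above can be sketched with the Docker CLI. The function name container_env is hypothetical. The docker inspect command and its --format Go template are standard Docker features, and the container name is a placeholder.

```shell
# Hypothetical helper: print the environment variables of a container
# as recorded by the Docker daemon, one per line. Any process that can
# talk to the daemon can run this against any container on the node.
container_env() {
  docker inspect --format '{{range .Config.Env}}{{println .}}{{end}}' "$1"
}

# Usage:
#   container_env <container name or ID>
```

Secrets that are populated into files by an initContainer do not appear in this output, which is the motivation for the preference stated above.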

Related issues:

#90
#114

Sub-charts are deployed from global chart

All sub-charts of this repository are designed to be deployed via the global chart.
Each component can still be deployed individually, but makes use of a common set of
properties facilitated by the global chart.

This decision simplifies both the use and maintenance of the repository as a whole.

Related issue:

#352

Template partials for `gitlab/*` should be global whenever possible

All template partials of the gitlab/* sub-charts should be a part of the global or
GitLab sub-chart templates/_helpers.tpl whenever possible. Templates from
forked charts will remain a part of those charts. This reduces
the maintenance impact of these forks.

The benefits of this are straight-forward:

- Increased DRY behavior, leading to easier maintenance. There should be no reason
  to have duplicates of the same function across multiple sub-charts when a single
  entry will suffice.
- Reduction of template naming conflicts. All partials throughout a chart are compiled together,
  and thus we can treat them like the global behavior they are.

Related issue:

#352

Forked charts

The following charts have been forked or re-created in this repository following
our guidelines for forking.

Redis

With the 3.0 release of the GitLab Helm chart, we no longer fork the upstream Redis chart,
and instead include it as a dependency.

Redis HA

Redis-HA was a chart we included in our releases prior to 3.0. It has now been removed
and replaced with the upstream Redis chart,
which has added optional HA support.

MinIO

Our MinIO chart was altered from the upstream MinIO.

- Make use of pre-existing Kubernetes secrets instead of creating new ones from properties.
- Remove providing the sensitive keys via Environment.
- Automate the creation of multiple buckets via defaultBuckets in place of
  defaultBucket.* properties.

registry

Our registry chart was altered from the upstream docker-registry .

- Enable the use of in-chart MinIO services automatically.
- Automatically hook authentication to the GitLab services.

NGINX Ingress

Our NGINX Ingress chart was altered from the upstream NGINX Ingress.

- Add feature to allow for the TCP ConfigMap to be external to the chart.
- Add feature to allow Ingress class to be templated based on release name.

Kubernetes version used throughout chart

To maximize support for different Kubernetes versions, use a kubectl that’s
one minor version lower than the current stable release of Kubernetes.
This should allow support for at least three, and quite possibly more
Kubernetes minor versions. For further discussion on kubectl versions, see
issue 1509.

Related Issues:

charts/gitlab#1509
charts/gitlab#1583

Related Merge Requests:

charts/gitlab!1053
build/CNG!329
gitlab-build-images!251

Image variants shipped with CNG

Date: 2022-02-10

The CNG project ships images based on both Debian and UBI. The decision to maintain configuration
for both distributions was based upon the following:

Why we ship Debian-based images:
- Track record, precedent
- Familiarity with distribution
- Community vs “enterprise”
- Lack of perceived vendor lock-in
Why we ship UBI-based images:
- Required in some customer environments
- Required for RHEL certification and inclusion into the OpenShift Marketplace / RedHat Catalog

Further discussion on this topic can be found in issue #3095.


Goals | GitLab

Scheduler
Helm charts

Goals

We have a few core goals with this initiative:

- Easy to scale horizontally
- Easy to deploy, upgrade, maintain
- Wide support of cloud service providers
- Initial support for Kubernetes and Helm, with flexibility to support other
  schedulers in the future

Scheduler

We will launch with support for Kubernetes, which is mature and widely supported
across the industry. As part of our design however, we will try to avoid decisions
which will preclude the support of other schedulers. This is especially true for
downstream Kubernetes projects like OpenShift and Tectonic. In the future other
schedulers may also be supported like Docker Swarm and Mesosphere.

We aim to support the scaling and self-healing capabilities of Kubernetes:

Readiness and Health checks to ensure pods are functioning, and if not to recycle them
Tracks to support canary and rolling deployments
Auto-scaling

We will try to leverage standard Kubernetes features:

ConfigMaps for managing configuration. These will then get mapped or passed to
Docker containers
Secrets for sensitive data

Since we might also be using Consul, this may be utilized instead for consistency with other installation methods.

Helm charts

A Helm chart will be created to manage the deployment of each GitLab specific container/service. We will then also include bundled charts to make the overall deployment easier. This is particularly
important for this effort, as there will be significantly more complexity in
the Docker and Kubernetes layers than the all-in-one Omnibus based solutions.
Helm can help to manage this complexity, and provide an easy top level interface
to manage settings via the values.yaml file.

We plan to offer a three tiered set of Helm charts:

[Figure: Helm chart structure]


Architecture of Cloud native GitLab Helm charts | GitLab

Architecture of Cloud native GitLab Helm charts

Documentation Organization:

Goals
Architecture
Design Decisions
Resource Usage


Resource usage | GitLab

Resource Requests
- GitLab Shell
- Webservice
- Sidekiq
- KAS

Resource usage

Resource Requests

All of our containers include predefined resource request values. By default we
have not put resource limits into place. If your nodes do not have excess memory
capacity, one option is to apply memory limits, though adding more memory (or nodes)
would be preferable. (You want to avoid running out of memory on any of your
Kubernetes nodes, as the Linux kernel’s out of memory manager may end essential Kube processes.)

In order to come up with our default request values, we run the application, and
come up with a way to generate various levels of load for each service. We monitor the
service, and make a call on what we think is the best default value.

We will measure:

- Idle Load - No default should be below these values, but an idle process
  isn’t useful, so typically we will not set a default based on this value.
- Minimal Load - The values required to do the most basic useful amount of work.
  Typically, for CPU, this will be used as the default, but memory requests come with
  the risk of the Kernel reaping processes, so we will avoid using this as a memory default.
- Average Loads - What is considered average is highly dependent on the installation.
  For our defaults, we will attempt to take a few measurements at a few of what we
  consider reasonable loads (we will list the loads used). If the service has a pod
  autoscaler, we will typically try to set the scaling target value, and also the
  default memory requests, based on these.
- Stressful Task - Measure the usage of the most stressful task the service
  should perform (not necessarily under load). When applying resource limits, try to
  set the limit above this and the average load values.
- Heavy Load - Try to come up with a stress test for the service, then measure
  the resource usage required to do it. We currently don’t use these values for any
  defaults, but users will likely want to set resource limits somewhere between the
  average loads/stress task and this value.

GitLab Shell

Load was tested using a bash loop calling nohup git clone <project> <random-path-name> in order to have some concurrency.
In future tests we will try to include sustained concurrent load, to better match the types of tests we have done for the other services.
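A loop of the kind described might look like the sketch below. This is not the original test script. The function name clone_load, the temporary working directory, and the numbered target paths (used here in place of random path names) are assumptions.

```shell
# Hypothetical load generator: start <count> concurrent clones of one
# project, each into its own path under a fresh temporary directory.
# Arguments: <project URL> <count>
clone_load() {
  project=$1
  count=$2
  workdir=$(mktemp -d)
  i=0
  while [ "$i" -lt "$count" ]; do
    # nohup plus & lets every clone run concurrently in the background.
    nohup git clone "$project" "$workdir/clone-$i" >/dev/null 2>&1 &
    i=$((i + 1))
  done
  wait            # block until all background clones have finished
  echo "$workdir" # print where the clones were written
}

# Usage:
#   clone_load <project> 20
```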

Idle values
- 0 tasks, 2 pods
  - cpu: 0
  - memory: 5M
Minimal Load
- 1 task (one empty clone), 2 pods
  - cpu: 0
  - memory: 5M
Average Loads
- 5 concurrent clones, 2 pods
  - cpu: 100m
  - memory: 5M
- 20 concurrent clones, 2 pods
  - cpu: 80m
  - memory: 6M
Stressful Task
- SSH clone the Linux kernel (17MB/s)
  - cpu: 280m
  - memory: 17M
- SSH push the Linux kernel (2MB/s)
  - cpu: 140m
  - memory: 13M
  - Upload connection speed was likely a factor during our tests
Heavy Load
- 100 concurrent clones, 4 pods
  - cpu: 110m
  - memory: 7M
Default Requests
- cpu: 0 (from minimal load)
- memory: 6M (from average load)
- target CPU average: 100m (from average loads)
Recommended Limits
- cpu: > 300m (greater than stress task)
- memory: > 20M (greater than stress task)

Check the troubleshooting documentation
for details on what might happen if gitlab.gitlab-shell.resources.limits.memory is set too low.

Webservice

Webservice resources were analyzed during testing with the
10k reference architecture.
Notes can be found in the Webservice resources documentation.

Sidekiq

Sidekiq resources were analyzed during testing with the
10k reference architecture.
Notes can be found in the Sidekiq resources documentation.

KAS

Until we learn more about our users’ needs, we expect that our users will use KAS in the following way.

Idle values
- 0 agents connected, 2 pods
  - cpu: 10m
  - memory: 55M
Minimal Load:
- 1 agent connected, 2 pods
  - cpu: 10m
  - memory: 55M
Average Load : 1 agent is connected to the cluster.
- 5 agents connected, 2 pods
  - cpu: 10m
  - memory: 65M
Stressful Task :
- 20 agents connected, 2 pods
  - cpu: 30m
  - memory: 95M
Heavy Load :
- 50 agents connected, 2 pods
  - cpu: 40m
  - memory: 150M
Extra Heavy Load :
- 200 agents connected, 2 pods
  - cpu: 50m
  - memory: 315M

The KAS resource defaults set by this chart are more than enough to handle even the 50 agents scenario.
If you are planning to reach what we consider an Extra Heavy Load, then you should consider tweaking the
defaults to scale up.

Defaults : 2 pods, each with
- cpu: 100m
- memory: 100M

For more information on how these numbers were calculated, see the
issue discussion.


Backing up a GitLab installation | GitLab

Create the backup
Cron based backup
Backup utility extra arguments
Backup the secrets
Additional Information

Backing up a GitLab installation

GitLab backups are taken by running the backup-utility command in the Toolbox pod provided in the chart. Backups can also be automated by enabling the Cron based backup functionality of this chart.

Before running the backup for the first time, you should ensure the
Toolbox is properly configured
for access to object storage.

Follow these steps to back up a GitLab Helm chart based installation.

Create the backup

Ensure the toolbox pod is running, by executing the following command
```
kubectl get pods -lrelease=RELEASE_NAME,app=toolbox
```

Run the backup utility

    kubectl exec <Toolbox pod name> -it -- backup-utility

Visit the gitlab-backups bucket in the object storage service and ensure a tarball has been added. It will be named in <timestamp>_<version>_gitlab_backup.tar format.
This tarball is required for restoration.
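The lookup and the backup run can be chained in a short script. This is a sketch under assumptions: the function name run_backup is hypothetical, and it simply picks the first pod returned by the label selector shown above.

```shell
# Hypothetical helper: find the first Toolbox pod of a release and run
# backup-utility in it. Extra arguments are passed through unchanged.
# Arguments: <release name> [backup-utility args...]
run_backup() {
  release=$1
  shift
  pod=$(kubectl get pods -l "release=${release},app=toolbox" \
        -o jsonpath='{.items[0].metadata.name}')
  if [ -z "$pod" ]; then
    echo "no Toolbox pod found for release ${release}" >&2
    return 1
  fi
  kubectl exec "$pod" -it -- backup-utility "$@"
}

# Usage:
#   run_backup RELEASE_NAME
#   run_backup RELEASE_NAME --skip registry
```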

Cron based backup

The Kubernetes CronJob created by the Helm chart sets the cluster-autoscaler.kubernetes.io/safe-to-evict: "false" annotation on the jobTemplate. Some Kubernetes environments, such as
GKE Autopilot, don’t allow this annotation to be set and will not create
Job Pods for the backup.

Cron based backups can be enabled in this chart to happen at regular intervals as defined by the Kubernetes schedule.

You need to set the following parameters:

- gitlab.toolbox.backups.cron.enabled: Set to true to enable cron based backups.
- gitlab.toolbox.backups.cron.schedule: Set as per the Kubernetes schedule docs.
- gitlab.toolbox.backups.cron.extraArgs: Optionally set extra arguments for backup-utility (like --skip db).
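One way to set these parameters is with helm upgrade, sketched below. The function name enable_cron_backups, the chart reference gitlab/gitlab (which assumes the GitLab chart repository was added under the name gitlab), and the use of --reuse-values are assumptions. Adjust them to match how your release was installed.

```shell
# Hypothetical helper: enable cron based backups on an existing release.
# Arguments: <release name> <cron schedule>
enable_cron_backups() {
  release=$1
  schedule=$2
  helm upgrade "$release" gitlab/gitlab \
    --reuse-values \
    --set gitlab.toolbox.backups.cron.enabled=true \
    --set "gitlab.toolbox.backups.cron.schedule=$schedule"
}

# Usage (runs the backup daily at 01:00):
#   enable_cron_backups RELEASE_NAME "0 1 * * *"
```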

Backup utility extra arguments

The backup utility can take some extra arguments. See what those are with:

    kubectl exec <Toolbox pod name> -it -- backup-utility --help

Backup the secrets

You also need to save a copy of the rails secrets as these are not included in the backup as a security precaution. We recommend keeping your full backup that includes the database separate from the copy of the secrets.

Find the object name for the rails secrets
```
kubectl get secrets | grep rails-secret
```

Save a copy of the rails secrets

    kubectl get secrets <rails-secret-name> -o jsonpath="{.data['secrets\.yml']}" | base64 --decode > gitlab-secrets.yaml

Store gitlab-secrets.yaml in a secure location. You need it to restore your backups.

Additional Information

GitLab chart Backup/Restore Introduction
Restoring a GitLab installation

