Design Decisions | GitLab






  • Attempt to catch problematic configurations
  • Breaking changes via deprecation
  • Preference of Secrets in initContainer over Environment
  • Sub-charts are deployed from global chart
  • Template partials for gitlab/* should be global whenever possible

  • Forked charts

    • Redis
    • Redis HA
    • MinIO
    • registry
    • NGINX Ingress
  • Kubernetes version used throughout chart
  • Image variants shipped with CNG

Design Decisions

This documentation collects reasoning and decisions made
regarding the design of the Helm charts in this repository.

Attempt to catch problematic configurations

Due to the complexity of these charts and their level of flexibility, there are some
overlaps where it is possible to produce a configuration that would lead to an
unpredictable, or entirely non-functional deployment. In an effort
to prevent known problematic settings combinations, we have implemented template logic
designed to detect and warn the user that their configuration will not work.

This replicates the behavior of deprecations, but is specific to ensuring functional configuration.

Introduced in !757 checkConfig: add methods to test for known errors

Breaking changes via deprecation

During the development of these charts, we occasionally make improvements that require
alterations to the properties of existing deployments. Two examples were the centralization
of configuring the use of MinIO, and the migration of external object storage configuration
from properties to secrets (in observance of our preference).

As a means of preventing a user from accidentally deploying an updated version of these
charts which includes a breaking change against a configuration that would not function, we
have chosen to implement deprecation notifications. These are designed to detect when
properties have been relocated, altered, replaced, or removed entirely, and then inform
the user of what changes need to be made to the configuration. This may include directing
the user to documentation on how to replace a property with a secret. These notifications
cause the Helm install or upgrade commands to stop with a parse error and output a complete list of items that need to be addressed. We have taken care to ensure a user will not be placed into a loop of error, fix, repeat.

All deprecations must be addressed in order for a successful deployment to occur. We believe
the user would prefer to be informed of a breaking change over experiencing unexpected
behavior or complete failure that requires debugging.

Introduced in !396 Deprecations: implement buffered list of deprecations
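
The buffering idea can be illustrated outside of Helm. The following is a minimal shell sketch, not the chart's actual template code: the property names (example.relocated, example.password), the messages, and both helper functions are hypothetical. It only shows the pattern of collecting every message first and then failing once with the complete list.

```shell
#!/bin/sh
# Illustrative only: property names and messages are hypothetical,
# not the chart's real deprecations.

# Print one message per deprecated property found among the arguments.
collect_deprecations() {
  for prop in "$@"; do
    case "$prop" in
      example.relocated)
        printf '%s\n' "example.relocated has moved to global.example" ;;
      example.password)
        printf '%s\n' "example.password must be replaced with a secret" ;;
    esac
  done
}

# Fail once, with the complete list, instead of stopping at the first hit.
check_config() {
  messages=$(collect_deprecations "$@")
  if [ -n "$messages" ]; then
    printf 'DEPRECATIONS:\n%s\n' "$messages" >&2
    return 1
  fi
  return 0
}

# Usage: check_config example.relocated example.password other.setting
```

Because all messages are gathered before the failure is reported, a user sees every required change in a single run rather than one per attempt.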

Preference of Secrets in initContainer over Environment

Much of the container ecosystem has, or expects, the capability to be configured
through environment variables. This configuration practice
stems from the concept of The Twelve-Factor App. This
greatly simplifies configuration across multiple deployment environments, but there
remains a security concern with passing connection secrets such as passwords and
private keys via the container’s environment.

Most container ecosystems provide a simple method to inspect the state of a running
container, which usually includes the environment. Using Docker
as an example, any process capable of communicating with the daemon can query the
state of all running containers. This means that if you have a privileged container
such as dind, that container can then inspect the environment of any container
on a given node, and expose all secrets contained within.
As a part of the complete DevOps lifecycle, dind is regularly
used for building containers that will be pushed to a registry and subsequently
deployed.

This concern is why we’ve decided to prefer the population of sensitive information
via initContainers.
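
As a rough local analogy (not the chart's implementation), the shell sketch below contrasts the two approaches. The variable names, the secret value, and the file path are hypothetical. It shows that an exported variable is visible in the environment of every child process, while a value read from a file, such as one an initContainer might write to a shared volume, never enters the environment.

```shell
#!/bin/sh
# Illustrative only: variable names and file paths are hypothetical.

# Count how often a variable name appears in a child process environment.
count_in_child_env() {
  # $1: variable name to look for
  env | grep -c "^$1=" || true
}

# Read a secret from a file without exporting it.
read_secret_file() {
  # $1: path to the secret file
  cat "$1"
}

# Usage:
#   ( export EXAMPLE_PASSWORD=x; count_in_child_env EXAMPLE_PASSWORD )
#   example_secret_value=$(read_secret_file /path/to/secret)
#   count_in_child_env example_secret_value
```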

Related issues:


  • #90
  • #114

Sub-charts are deployed from global chart

All sub-charts of this repository are designed to be deployed via the global chart.
Each component can still be deployed individually, but makes use of a common set of
properties facilitated by the global chart.

This decision simplifies both the use and maintenance of the repository as a whole.

Related issue:


  • #352

Template partials for gitlab/* should be global whenever possible

All template partials of the gitlab/* sub-charts should be a part of the global or
GitLab sub-chart templates/_helpers.tpl whenever possible. Templates from
forked charts will remain a part of those charts. This reduces
the maintenance impact of these forks.

The benefits of this are straightforward:


  • Increased DRY behavior, leading to easier maintenance. There should be no reason
    to have duplicates of the same function across multiple sub-charts when a single
    entry will suffice.
  • Reduction of template naming conflicts. All partials throughout a chart are compiled together,
    and thus we can treat them like the global behavior they are.

Related issue:


  • #352

Forked charts

The following charts have been forked or re-created in this repository following
our guidelines for forking.

Redis

With the 3.0 release of the GitLab Helm chart, we no longer fork the upstream Redis chart,
and instead include it as a dependency.

Redis HA

Redis-HA was a chart we included in our releases prior to 3.0. It has now been removed
and replaced with the upstream Redis chart,
which has added optional HA support.

MinIO

Our MinIO chart was altered from the upstream MinIO.


  • Make use of pre-existing Kubernetes secrets instead of creating new ones from properties.
  • Remove providing the sensitive keys via Environment.
  • Automate the creation of multiple buckets via defaultBuckets in place of
    defaultBucket.* properties.

registry

Our registry chart was altered from the upstream docker-registry.


  • Enable the use of in-chart MinIO services automatically.
  • Automatically hook authentication to the GitLab services.

NGINX Ingress

Our NGINX Ingress chart was altered from the upstream NGINX Ingress.


  • Add feature to allow for the TCP ConfigMap to be external to the chart
  • Add feature to allow Ingress class to be templated based on release name

Kubernetes version used throughout chart

To maximize support for different Kubernetes versions, use a kubectl that’s
one minor version lower than the current stable release of Kubernetes.
This should allow support for at least three, and quite possibly more
Kubernetes minor versions. For further discussion on kubectl versions, see
issue 1509.
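
The "one minor version lower" rule is simple arithmetic on the version string. A minimal sketch, assuming versions of the form MAJOR.MINOR or MAJOR.MINOR.PATCH with an optional leading "v"; the function name and the versions in the usage line are hypothetical:

```shell
#!/bin/sh
# Print the minor release just below the given Kubernetes version.
previous_minor() {
  version=${1#v}          # drop an optional leading "v"
  major=${version%%.*}    # text before the first dot
  rest=${version#*.}      # text after the first dot
  minor=${rest%%.*}       # minor number, ignoring any patch level
  if [ "$minor" -le 0 ]; then
    echo "no earlier minor version for $1" >&2
    return 1
  fi
  echo "$major.$((minor - 1))"
}

# Usage: previous_minor v1.24.3
```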

Related Issues:


  • charts/gitlab#1509
  • charts/gitlab#1583

Related Merge Requests:


  • charts/gitlab!1053
  • build/CNG!329
  • gitlab-build-images!251

Image variants shipped with CNG

Date: 2022-02-10

The CNG project ships images based on both Debian and UBI. The decision to maintain configuration
for both distributions was based upon the following:


  • Why we ship Debian-based images:

    • Track record, precedent
    • Familiarity with distribution
    • Community vs “enterprise”
    • Lack of perceived vendor lock-in
  • Why we ship UBI-based images:

    • Required in some customer environments
    • Required for RHEL certification and inclusion into the OpenShift Marketplace / Red Hat Catalog

Further discussion on this topic can be found in issue #3095.

Goals | GitLab






  • Scheduler
  • Helm charts

Goals

We have a few core goals with this initiative:


  1. Easy to scale horizontally
  2. Easy to deploy, upgrade, maintain
  3. Wide support of cloud service providers
  4. Initial support for Kubernetes and Helm, with flexibility to support other
    schedulers in the future

Scheduler

We will launch with support for Kubernetes, which is mature and widely supported
across the industry. As part of our design, however, we will try to avoid decisions
that would preclude the support of other schedulers. This is especially true for
downstream Kubernetes projects like OpenShift and Tectonic. In the future, other
schedulers such as Docker Swarm and Mesosphere may also be supported.

We aim to support the scaling and self-healing capabilities of Kubernetes:


  • Readiness and Health checks to ensure pods are functioning, and if not to recycle them
  • Tracks to support canary and rolling deployments
  • Auto-scaling

We will try to leverage standard Kubernetes features:


  • ConfigMaps for managing configuration. These will then get mapped or passed to
    Docker containers
  • Secrets for sensitive data

Since we might also be using Consul, it may be used instead for consistency with other installation methods.

Helm charts

A Helm chart will be created to manage the deployment of each GitLab specific container/service. We will then also include bundled charts to make the overall deployment easier. This is particularly
important for this effort, as there will be significantly more complexity in
the Docker and Kubernetes layers than the all-in-one Omnibus based solutions.
Helm can help to manage this complexity, and provide an easy top level interface
to manage settings via the values.yaml file.

We plan to offer a three-tiered set of Helm charts:

Helm chart Structure


Architecture of Cloud native GitLab Helm charts | GitLab





Architecture of Cloud native GitLab Helm charts

Documentation Organization:


  • Goals
  • Architecture
  • Design Decisions
  • Resource Usage

Resource usage | GitLab







  • Resource Requests

    • GitLab Shell
    • Webservice
    • Sidekiq
    • KAS

Resource usage

Resource Requests

All of our containers include predefined resource request values. By default, we
have not put resource limits into place. If your nodes do not have excess memory
capacity, one option is to apply memory limits, though adding more memory (or nodes)
would be preferable. (You want to avoid running out of memory on any of your
Kubernetes nodes, as the Linux kernel’s out-of-memory manager may end essential Kubernetes processes.)

In order to come up with our default request values, we run the application, and
come up with a way to generate various levels of load for each service. We monitor the
service, and make a call on what we think is the best default value.

We will measure:



  • Idle Load - No default should be below these values, but an idle process
    isn’t useful, so typically we will not set a default based on this value.


  • Minimal Load - The values required to do the most basic useful amount of work.
    Typically, for CPU, this will be used as the default, but memory requests come with
    the risk of the Kernel reaping processes, so we will avoid using this as a memory default.


  • Average Loads - What is considered average is highly dependent on the installation.
    For our defaults, we will attempt to take a few measurements at a few of what we
    consider reasonable loads (we will list the loads used). If the service has a pod
    autoscaler, we will typically try to set the scaling target value based on these,
    as well as the default memory requests.


  • Stressful Task - Measure the usage of the most stressful task the service
    should perform (not necessarily under load). When applying resource limits, try to
    set the limit above this and the average load values.


  • Heavy Load - Try to come up with a stress test for the service, then measure
    the resource usage required to do it. We currently don’t use these values for any
    defaults, but users will likely want to set resource limits somewhere between the
    average loads/stressful task and this value.

GitLab Shell

Load was tested using a bash loop calling nohup git clone <project> <random-path-name> in order to have some concurrency.
In future tests we will try to include sustained concurrent load, to better match the types of tests we have done for the other services.



  • Idle values

    • 0 tasks, 2 pods

      • cpu: 0
      • memory: 5M

  • Minimal Load

    • 1 task (one empty clone), 2 pods

      • cpu: 0
      • memory: 5M

  • Average Loads

    • 5 concurrent clones, 2 pods

      • cpu: 100m
      • memory: 5M
    • 20 concurrent clones, 2 pods

      • cpu: 80m
      • memory: 6M

  • Stressful Task

    • SSH clone the Linux kernel (17MB/s)

      • cpu: 280m
      • memory: 17M
    • SSH push the Linux kernel (2MB/s)

      • cpu: 140m
      • memory: 13M
      • Upload connection speed was likely a factor during our tests

  • Heavy Load

    • 100 concurrent clones, 4 pods

      • cpu: 110m
      • memory: 7M

  • Default Requests

    • cpu: 0 (from minimal load)
    • memory: 6M (from average load)
    • target CPU average: 100m (from average loads)

  • Recommended Limits

    • cpu: > 300m (greater than stress task)
    • memory: > 20M (greater than stress task)

Check the troubleshooting documentation
for details on what might happen if gitlab.gitlab-shell.resources.limits.memory is set too low.
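
As a sketch of applying the recommended limits above, the snippet below sets limits for GitLab Shell on an existing release. The release name gitlab and chart gitlab/gitlab follow the other examples in this documentation; the values 350m and 32M are arbitrary picks above the stressful-task figures, and the limits.cpu key is assumed to mirror the limits.memory key named above. The run helper is hypothetical and prints the command by default, so nothing is changed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Print the command by default; set DRY_RUN=0 to actually execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi
}

set_shell_limits() {
  # $1: CPU limit, $2: memory limit (both above the stressful-task figures)
  run helm upgrade gitlab gitlab/gitlab --reuse-values \
    --set "gitlab.gitlab-shell.resources.limits.cpu=$1" \
    --set "gitlab.gitlab-shell.resources.limits.memory=$2"
}

# Example values are assumptions, not tested recommendations.
set_shell_limits 350m 32M
```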

Webservice

Webservice resources were analyzed during testing with the
10k reference architecture.
Notes can be found in the Webservice resources documentation.

Sidekiq

Sidekiq resources were analyzed during testing with the
10k reference architecture.
Notes can be found in the Sidekiq resources documentation.

KAS

Until we learn more about our users’ needs, we expect that our users will use KAS in the following way.



  • Idle values

    • 0 agents connected, 2 pods

      • cpu: 10m
      • memory: 55M

  • Minimal Load:

    • 1 agent connected, 2 pods

      • cpu: 10m
      • memory: 55M

  • Average Load : 1 agent is connected to the cluster.

    • 5 agents connected, 2 pods

      • cpu: 10m
      • memory: 65M

  • Stressful Task:

    • 20 agents connected, 2 pods

      • cpu: 30m
      • memory: 95M

  • Heavy Load:

    • 50 agents connected, 2 pods

      • cpu: 40m
      • memory: 150M

  • Extra Heavy Load:

    • 200 agents connected, 2 pods

      • cpu: 50m
      • memory: 315M

The KAS resource defaults set by this chart are more than enough to handle even the 50 agents scenario.
If you are planning to reach what we consider an Extra Heavy Load, then you should consider tweaking the
defaults to scale up.



  • Defaults: 2 pods, each with

    • cpu: 100m
    • memory: 100M

For more information on how these numbers were calculated, see the
issue discussion.


Backing up a GitLab installation | GitLab






  • Create the backup
  • Cron based backup
  • Backup utility extra arguments
  • Backup the secrets
  • Additional Information

Backing up a GitLab installation

GitLab backups are taken by running the backup-utility command in the Toolbox pod provided in the chart. Backups can also be automated by enabling the Cron based backup functionality of this chart.

Before running the backup for the first time, you should ensure the
Toolbox is properly configured
for access to object storage.

Follow these steps to back up a GitLab Helm chart based installation.

Create the backup



  1. Ensure the toolbox pod is running by executing the following command:


    kubectl get pods -lrelease=RELEASE_NAME,app=toolbox

  2. Run the backup utility:


    kubectl exec <Toolbox pod name> -it -- backup-utility

  3. Visit the gitlab-backups bucket in the object storage service and ensure a tarball has been added. It will be named in <timestamp>_<version>_gitlab_backup.tar format.


  4. This tarball is required for restoration.
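
Steps 2 and 3 can be wrapped in small helpers. This is a sketch under assumptions: the pod name gitlab-toolbox-example, the run helper, and both function names are hypothetical, and the name check only tests the overall <timestamp>_<version>_gitlab_backup.tar shape rather than validating each part. Commands are printed by default and only executed with DRY_RUN=0.

```shell
#!/bin/sh
# Print the command by default; set DRY_RUN=0 to actually execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Step 2: run the backup utility in the given Toolbox pod.
run_backup() {
  # $1: Toolbox pod name, as reported by the kubectl get pods command
  run kubectl exec "$1" -it -- backup-utility
}

# Step 3: check that an object name looks like
# <timestamp>_<version>_gitlab_backup.tar.
is_backup_tarball() {
  case "$1" in
    ?*_?*_gitlab_backup.tar) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical pod name, for illustration only.
run_backup gitlab-toolbox-example
```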

Cron based backup


Note: The Kubernetes CronJob created by the Helm chart
sets the cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
annotation on the jobTemplate. Some Kubernetes environments, such as
GKE Autopilot, don’t allow this annotation to be set and will not create
Job Pods for the backup.

Cron based backups can be enabled in this chart to happen at regular intervals as defined by the Kubernetes schedule.

You need to set the following parameters:



  • gitlab.toolbox.backups.cron.enabled: Set to true to enable cron based backups

  • gitlab.toolbox.backups.cron.schedule: Set as per the Kubernetes schedule docs

  • gitlab.toolbox.backups.cron.extraArgs: Optionally set extra arguments for backup-utility (like --skip db)
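
The three parameters can be combined into one Helm command. A minimal sketch, assuming an existing release named gitlab; the schedule "0 1 * * *" (daily at 01:00) is an example value and the run helper is hypothetical. By default the command is only printed, and the printed line drops the quoting around values that contain spaces, so it is for review rather than copy and paste.

```shell
#!/bin/sh
# Print the command by default; set DRY_RUN=0 to actually execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi
}

enable_cron_backups() {
  # $1: cron schedule, $2: extra arguments for backup-utility
  run helm upgrade gitlab gitlab/gitlab --reuse-values \
    --set gitlab.toolbox.backups.cron.enabled=true \
    --set "gitlab.toolbox.backups.cron.schedule=$1" \
    --set "gitlab.toolbox.backups.cron.extraArgs=$2"
}

# Example schedule and arguments; adjust to your needs.
enable_cron_backups "0 1 * * *" "--skip db"
```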

Backup utility extra arguments

The backup utility can take some extra arguments. See what those are with:

kubectl exec <Toolbox pod name> -it -- backup-utility --help

Backup the secrets

You also need to save a copy of the rails secrets as these are not included in the backup as a security precaution. We recommend keeping your full backup that includes the database separate from the copy of the secrets.



  1. Find the object name for the rails secrets:


    kubectl get secrets | grep rails-secret

  2. Save a copy of the rails secrets:


    kubectl get secrets <rails-secret-name> -o jsonpath="{.data['secrets\.yml']}" | base64 --decode > gitlab-secrets.yaml

  3. Store gitlab-secrets.yaml in a secure location. You need it to restore your backups.
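
Since the saved file contains sensitive data, one option is to create it with restrictive permissions from the start. The helper below is a hypothetical sketch of that idea; it decodes base64 from standard input into a file that only the current user can read.

```shell
#!/bin/sh
# Decode base64 from stdin into a file readable only by the current user.
decode_to_private_file() {
  # $1: destination path
  # The subshell keeps the umask change from leaking into the caller.
  ( umask 077 && base64 --decode > "$1" )
}

# Usage (needs cluster access; <rails-secret-name> as found in step 1):
#   kubectl get secrets <rails-secret-name> -o jsonpath="{.data['secrets\.yml']}" \
#     | decode_to_private_file gitlab-secrets.yaml
```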

Additional Information


  • GitLab chart Backup/Restore Introduction
  • Restoring a GitLab installation

Backup and restore a GitLab instance | GitLab






  • Prerequisites
  • Backup and Restoring procedures

  • Object storage

    • Backups to S3
    • Backups to Google Cloud Storage (GCS)

  • Troubleshooting

    • Pod eviction issues
    • “Bucket not found” errors
    • “AccessDeniedException: 403” errors in GCP

Backup and restore a GitLab instance

The GitLab Helm chart provides a utility pod from the Toolbox sub-chart that acts as an interface for backing up and restoring GitLab instances. It is equipped with a backup-utility executable which interacts with other necessary pods for this task.
Technical details for how the utility works can be found in the architecture documentation.

Prerequisites



  • Backup and Restore procedures described here have only been tested with S3 compatible APIs. Support for other object storage services, like Google Cloud Storage, will be tested in future revisions.


  • During restoration, the backup tarball needs to be extracted to disk. This means the Toolbox pod should have a disk of sufficient size available.


  • This chart relies on the use of object storage for artifacts, uploads, packages, registry and lfs objects, and does not currently migrate these for you during restore. If you are restoring a backup taken from another instance, you must migrate your existing instance to using object storage before taking the backup. See issue 646.

Backup and Restoring procedures


  • Backing up a GitLab installation
  • Restoring a GitLab installation

Object storage

We provide a MinIO instance out of the box when using this chart unless an external object storage is specified. The Toolbox connects to the included MinIO by default, unless specific settings are given. The Toolbox can also be configured to back up to Amazon S3 or Google Cloud Storage (GCS).

Backups to S3

The Toolbox uses s3cmd to connect to object storage. To configure connectivity to external object storage, set gitlab.toolbox.backups.objectStorage.config.secret to the name of a Kubernetes secret containing a .s3cfg file. If the key within that secret that holds the contents of the .s3cfg file differs from the default of config, also set gitlab.toolbox.backups.objectStorage.config.key.

It should look like this:

helm install gitlab gitlab/gitlab \
--set gitlab.toolbox.backups.objectStorage.config.secret=my-s3cfg \
--set gitlab.toolbox.backups.objectStorage.config.key=config

See the s3cmd documentation for details on the .s3cfg file.

In addition, two bucket locations need to be configured, one for storing the backups, and one temporary bucket that is used
when restoring a backup.

--set global.appConfig.backups.bucket=gitlab-backup-storage
--set global.appConfig.backups.tmpBucket=gitlab-tmp-storage
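
Putting the S3 settings together, the sketch below creates the secret and installs the chart with both bucket locations. The secret name my-s3cfg and the bucket names come from the examples above; creating the secret with --from-file=config=<path> mirrors the GCS example in this documentation and is an assumption for S3, as are the run helper and the local path to the .s3cfg file. Commands are printed by default and only executed with DRY_RUN=0.

```shell
#!/bin/sh
# Print the command by default; set DRY_RUN=0 to actually execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi
}

configure_s3_backups() {
  # $1: path to a local .s3cfg file
  # Store the file under the default key "config" in a Kubernetes secret.
  run kubectl create secret generic my-s3cfg --from-file=config="$1"
  # Point the Toolbox at the secret and set both bucket locations.
  run helm install gitlab gitlab/gitlab \
    --set gitlab.toolbox.backups.objectStorage.config.secret=my-s3cfg \
    --set gitlab.toolbox.backups.objectStorage.config.key=config \
    --set global.appConfig.backups.bucket=gitlab-backup-storage \
    --set global.appConfig.backups.tmpBucket=gitlab-tmp-storage
}

# The path is a placeholder for wherever your .s3cfg file lives.
configure_s3_backups "${HOME}/.s3cfg"
```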

Backups to Google Cloud Storage (GCS)

To back up to GCS, you must set gitlab.toolbox.backups.objectStorage.backend to gcs. This ensures that the Toolbox uses the gsutil CLI when storing and retrieving
objects. Additionally, you must set gitlab.toolbox.backups.objectStorage.config.gcpProject to the project ID of the GCP project that contains your storage buckets.
You must create a Kubernetes secret with the contents of an active service account JSON key, where the service account has the storage.admin role for the buckets
you will use for backup. Below is an example of using gcloud and kubectl to create the secret.

export PROJECT_ID=$(gcloud config get-value project)
gcloud iam service-accounts create gitlab-gcs --display-name "Gitlab Cloud Storage"
gcloud projects add-iam-policy-binding --role roles/storage.admin ${PROJECT_ID} --member=serviceAccount:gitlab-gcs@${PROJECT_ID}.iam.gserviceaccount.com
gcloud iam service-accounts keys create --iam-account gitlab-gcs@${PROJECT_ID}.iam.gserviceaccount.com storage.config
kubectl create secret generic storage-config --from-file=config=storage.config

Configure your Helm chart as follows to use the service account key to authenticate to GCS for backups:

helm install gitlab gitlab/gitlab \
--set gitlab.toolbox.backups.objectStorage.config.secret=storage-config \
--set gitlab.toolbox.backups.objectStorage.config.key=config \
--set gitlab.toolbox.backups.objectStorage.config.gcpProject=my-gcp-project-id \
--set gitlab.toolbox.backups.objectStorage.backend=gcs

In addition, two bucket locations need to be configured, one for storing the backups, and one temporary bucket that is used
when restoring a backup.

--set global.appConfig.backups.bucket=gitlab-backup-storage
--set global.appConfig.backups.tmpBucket=gitlab-tmp-storage

Troubleshooting

Pod eviction issues

As the backups are assembled locally outside of the object storage target, temporary disk space is needed. The required space might exceed the size of the actual backup archive.
The default configuration will use the Toolbox pod’s file system to store the temporary data. If you find the pod being evicted due to low resources, you should attach a persistent volume to the pod to hold the temporary data.
On GKE, add the following settings to your Helm command:

--set gitlab.toolbox.persistence.enabled=true

If your backups are being run as part of the included backup cron job, then you will want to enable persistence for the cron job as well:

--set gitlab.toolbox.backups.cron.persistence.enabled=true

For other providers, you may need to create a persistent volume. See our Storage documentation for possible examples on how to do this.

“Bucket not found” errors

If you see Bucket not found errors during backups, check that the
credentials are configured for your bucket.

The command depends on the cloud service provider:



  • For AWS S3, the credentials are stored on the toolbox pod in ~/.s3cfg. Run:


    s3cmd ls

  • For GCP GCS, run:


    gsutil ls

You should see a list of available buckets.

“AccessDeniedException: 403” errors in GCP

An error like [Error] AccessDeniedException: 403 <GCP Account> does not have storage.objects.list access to the Google Cloud Storage bucket.
usually happens during a backup or restore of a GitLab instance, because of missing permissions.

The backup and restore operations use all buckets in the environment, so
confirm that all buckets in your environment have been created, and that the GCP account can access (list, read, and write) all buckets:



  1. Find your toolbox pod:


    kubectl get pods -lrelease=RELEASE_NAME,app=toolbox

  2. Get all buckets in the pod’s environment. Replace <toolbox-pod-name> with your actual toolbox pod name, but leave "BUCKET_NAME" as it is:


    kubectl describe pod <toolbox-pod-name> | grep "BUCKET_NAME"

  3. Confirm that you have access to every bucket in the environment:


    # List
    gsutil ls gs://<bucket-to-validate>/

    # Read
    gsutil cp gs://<bucket-to-validate>/<object-to-get> <save-to-location>

    # Write
    gsutil cp -n <local-file> gs://<bucket-to-validate>/
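
When there are many buckets, the list check can be looped. A minimal sketch, assuming the bucket names have already been collected from the previous step; the function name and run helper are hypothetical, and the two bucket names are the example values used elsewhere in this documentation. Commands are printed by default and only executed with DRY_RUN=0.

```shell
#!/bin/sh
# Print the command by default; set DRY_RUN=0 to actually execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Try to list every bucket and report all failures, not just the first.
check_bucket_listing() {
  status=0
  for bucket in "$@"; do
    run gsutil ls "gs://$bucket/" || {
      echo "cannot list gs://$bucket/" >&2
      status=1
    }
  done
  return "$status"
}

# Example bucket names; replace with the buckets from your environment.
check_bucket_listing gitlab-backup-storage gitlab-tmp-storage
```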