Git GC
Overview
In this article, we will learn about the git gc command. Before getting into the details, here is a short overview of what it does.
The git gc command is one of the most important commands for maintaining a repository. The gc in git gc stands for garbage collection. Its main purpose is to clean up the clutter that accumulates in a repository, and running git gc is all it takes. The housekeeping tasks it performs include removing unreachable objects (including objects created by earlier git add commands that were never committed), packing refs, and pruning the reflog.
Pre-requisites
Before getting started with the topic, you must have a clear understanding of a few topics like:
Introduction to Git Gc
The term garbage collection comes from programming languages that allocate memory dynamically. Many languages, including interpreted ones such as Python and JavaScript, use a garbage collector to reclaim memory that a running program no longer needs. git gc applies the same idea to a repository: it finds data that is no longer needed and removes or compacts it.
Git repositories accumulate several kinds of garbage. One of them is unreachable, or orphaned, commits. Commits can become unreachable when history-rewriting commands such as git rebase or git reset are run. To protect history and prevent data loss, Git does not delete such commits right away: they stay reachable through the reflog, so they can still be checked out, cherry-picked, or inspected with git log until their reflog entries expire.
Besides cleaning up unreachable commits, git gc also compresses the Git objects stored on disk, which saves disk space and improves overall performance. When git gc packs objects, it combines many of them into a single packfile and uses delta compression, storing an object as the difference from a similar object where that saves space. Packs are somewhat like zip archives of Git objects, and they live in the .git/objects/pack directory of a repository.
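A quick way to see what git gc has to work with is to look inside .git/objects: loose objects sit in subdirectories named after the first two hex digits of their ID, and packfiles sit in .git/objects/pack. The Python sketch below counts both. It assumes the standard on-disk layout, and summarize_object_store is a hypothetical helper, not part of Git; for a real repository, git count-objects -v reports similar figures.

```python
from pathlib import Path

HEX = set("0123456789abcdef")


def summarize_object_store(git_dir):
    """Count loose objects, packfiles, and .keep-marked packs.

    Hypothetical helper: assumes loose objects live in objects/XX/
    (XX = first two hex digits of the object ID) and packfiles live
    in objects/pack/.
    """
    objects = Path(git_dir) / "objects"

    loose = 0
    for sub in objects.iterdir():
        # Fan-out directories have exactly two hex digits in their name,
        # so "info" and "pack" are skipped here.
        if sub.is_dir() and len(sub.name) == 2 and set(sub.name) <= HEX:
            # Count only hex-named files (skips temporary files).
            loose += sum(
                1 for f in sub.iterdir() if f.is_file() and set(f.name) <= HEX
            )

    pack_dir = objects / "pack"
    packs = sorted(pack_dir.glob("*.pack")) if pack_dir.is_dir() else []
    # A pack is marked "keep" when pack-XXXX.keep sits next to pack-XXXX.pack.
    kept = [p for p in packs if p.with_suffix(".keep").exists()]
    return {"loose": loose, "packs": len(packs), "kept": len(kept)}
```

For example, calling summarize_object_store(".git") from the top of a working tree returns a dictionary with the loose, packs, and kept counts.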
git gc also takes care of other housekeeping tasks in the repository: removing unreachable objects (including ones created by earlier git add commands), packing refs, pruning the reflog, pruning stale working trees, and cleaning up rerere metadata.
What Does Git Gc Do?
When you run git gc, it runs several other commands internally, including git pack-refs, git reflog expire, git repack, git prune, git worktree prune, and git rerere gc. Together they identify objects and metadata that fall outside the limits set by the git gc configuration, and then prune or compress them accordingly.
Git Gc Config
Let us now look at the configuration variables that control git gc.
Explanation:
The gc.reflogExpire variable is optional and defaults to 90 days. It sets how long entries in a branch's reflog are kept before git reflog expire removes them.
Explanation:
The gc.reflogExpireUnreachable variable is optional and defaults to 30 days. It sets how long reflog entries are kept once they are no longer reachable from the current tip of the branch.
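The two reflog limits can be read as one rule: an entry expires once it is older than 90 days, or older than 30 days if its commit is no longer reachable from the branch tip. The following Python sketch models that rule with the default values. The entries_to_expire function and its (timestamp, reachable) entry format are hypothetical simplifications, not Git's actual data structures.

```python
from datetime import datetime, timedelta

# Default limits for gc.reflogExpire and gc.reflogExpireUnreachable.
REFLOG_EXPIRE = timedelta(days=90)
REFLOG_EXPIRE_UNREACHABLE = timedelta(days=30)


def entries_to_expire(entries, now):
    """Return the reflog entries that default settings would drop.

    Each entry is a hypothetical (timestamp, reachable) pair, where
    `reachable` says whether the entry's commit is still reachable
    from the branch tip. Simplified model, not Git's implementation.
    """
    expired = []
    for when, reachable in entries:
        limit = REFLOG_EXPIRE if reachable else REFLOG_EXPIRE_UNREACHABLE
        if now - when > limit:
            expired.append((when, reachable))
    return expired
```

With these defaults, a 40-day-old entry survives if its commit is still reachable but is dropped if it is not.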
Explanation:
The gc.aggressiveWindow variable is optional and defaults to 250 (a number of objects, not days). It sets the window size used by delta compression when git gc runs with the --aggressive option: each object is compared with up to this many other objects while Git looks for a good delta. A larger window can find smaller deltas but takes longer, and because the resulting packs are reused, the effect of an aggressive run is long-lasting.
Explanation:
This optional variable defaults to 50 (a delta chain depth, not days). It sets the maximum depth of the delta chains that git repack builds when git gc runs with the --aggressive option.
Explanation:
The gc.pruneExpire variable is optional and defaults to 2.weeks.ago (two weeks). It sets how long unreachable objects are kept before they are removed. The value now removes them immediately, and never disables pruning.
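Expiry values such as 2.weeks.ago use Git's approximate date syntax. As a rough illustration, the sketch below converts a small subset of that syntax (now, never, and N.unit.ago for minutes, hours, days, and weeks) into a cutoff time. prune_cutoff is a hypothetical helper; Git's real date parser accepts many more forms, including months.

```python
from datetime import datetime, timedelta

# Hypothetical subset of Git's approximate date syntax.
_UNITS = {
    "minute": timedelta(minutes=1),
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
    "week": timedelta(weeks=1),
}


def prune_cutoff(value, now):
    """Turn an expiry value such as "2.weeks.ago" into a cutoff datetime.

    Returns None for "never" (nothing is pruned). Unreachable objects
    older than the returned cutoff are candidates for pruning.
    """
    value = value.strip().lower()
    if value == "never":
        return None
    if value == "now":
        return now
    parts = value.split(".")
    if len(parts) != 3 or parts[2] != "ago" or not parts[0].isdigit():
        raise ValueError(f"unsupported expiry value: {value!r}")
    unit = parts[1].rstrip("s")  # accept both "week" and "weeks"
    if unit not in _UNITS:
        raise ValueError(f"unsupported unit: {parts[1]!r}")
    return now - int(parts[0]) * _UNITS[unit]
```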
Explanation:
This optional variable defaults to 3.months.ago (three months). It sets how long a stale working tree, one whose directory no longer exists, is kept before git worktree prune removes its administrative data.
Explanation:
The gc.auto variable is optional and defaults to 6700. When a repository contains more than approximately this many loose objects, git gc --auto packs them. Many porcelain commands run git gc --auto from time to time to perform a lightweight garbage collection.
Setting gc.auto to 0 disables not only automatic packing based on the number of loose objects but also every other heuristic git gc --auto uses to decide whether there is work to do, such as gc.autoPackLimit.
Explanation:
The gc.autoPackLimit variable is optional and defaults to 50. When a repository contains more than this many packs that are not marked with a *.keep file, git gc --auto combines them into one larger pack. Setting it to 0 disables this check, and so does setting gc.auto to 0.
The gc.bigPackThreshold variable, described below, changes how this limit works when it is set.
Explanation:
This optional variable defaults to true. When it is true, git gc --auto returns immediately and continues running in the background, if the system supports it.
Explanation:
This optional variable defaults to true. When it is true, git gc rewrites the commit-graph file. When git gc --auto runs, the commit-graph is updated only if housekeeping is actually needed. See git-commit-graph[1] for details.
Explanation:
The gc.bigPackThreshold variable is optional and defaults to 0. If it is non-zero, git gc keeps every pack that meets this size limit instead of repacking it. This is very similar to --keep-largest-pack, except that all packs meeting the threshold are kept, not just the largest one. The usual unit suffixes k, m, and g are supported.
Note that this variable is ignored if the number of packs it would keep is greater than gc.autoPackLimit; in that case, all packs except the base pack are repacked. After that, the number of packs should drop below gc.autoPackLimit, and gc.bigPackThreshold is respected again.
If gc.bigPackThreshold is not set and there is not enough memory for git repack to run smoothly, the largest pack is also excluded from repacking (the same as running git gc --keep-largest-pack).
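The size handling and the gc.autoPackLimit escape hatch described above can be modeled in a few lines of Python. This sketch assumes that the k, m, and g suffixes are binary multiples (1k = 1024 bytes), as with other Git size settings, and that a pack whose size equals the threshold counts as meeting it. parse_size and packs_kept_by_threshold are hypothetical helpers, and the model ignores cruft packs and the memory check.

```python
# Binary multiples for Git-style size suffixes (case-insensitive).
_SUFFIXES = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(value):
    """Parse a size such as "512m" into bytes."""
    value = value.strip().lower()
    if value and value[-1] in _SUFFIXES:
        return int(value[:-1]) * _SUFFIXES[value[-1]]
    return int(value)


def packs_kept_by_threshold(pack_sizes, threshold, auto_pack_limit=50):
    """Return the names of packs kept because of gc.bigPackThreshold.

    pack_sizes maps a pack name to its size in bytes. A threshold of 0
    keeps nothing. If more packs would be kept than auto_pack_limit, the
    threshold is ignored and an empty list is returned (Git then keeps
    only the base pack, which this simplified model does not report).
    """
    limit = parse_size(threshold)
    if limit == 0:
        return []
    kept = sorted(name for name, size in pack_sizes.items() if size >= limit)
    if len(kept) > auto_pack_limit:
        return []
    return kept
```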
Explanation:
The gc.logExpiry variable is optional and defaults to 1.day (one day). If a gc.log file exists in the repository, git gc --auto prints its contents and exits with status 0 instead of running, unless the file is older than gc.logExpiry.
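The gc.log check boils down to an age comparison. The sketch below models it in Python, assuming the file's modification time is what gets compared against gc.logExpiry. auto_gc_should_bail is a hypothetical name, not a Git function, and the model ignores any other details of the real check.

```python
from datetime import datetime, timedelta


def auto_gc_should_bail(log_mtime, now, log_expiry=timedelta(days=1)):
    """Return True when git gc --auto would print gc.log and exit with 0.

    log_mtime is the modification time of .git/gc.log, or None if the
    file does not exist. A log older than log_expiry no longer blocks
    the run. Simplified, hypothetical model.
    """
    if log_mtime is None:
        return False
    return now - log_mtime <= log_expiry
```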
Explanation:
This optional variable controls whether unreachable objects are stored in a cruft pack (see git-repack[1]) instead of being kept as loose objects. Git 2.40 and later default to true; earlier releases default to false.
Explanation:
Running git pack-refs in a repository makes it impossible for Git versions earlier than 1.5.1.2 to clone it over dumb transports such as HTTP. This variable controls whether git gc runs git pack-refs. It can be set to a boolean, or to notbare to enable it in all non-bare repositories. The default is true.
Git Gc Best Practices
Git runs git gc --auto automatically after several commonly used commands, such as:
- git merge
- git pull
- git commit
- git rebase
How often you need to run git gc explicitly depends on how active the repository is. A repository with a single contributor needs it far less often than a busy repository that several people update regularly.
What is Git Gc Aggressive?
You can run git gc with the --aggressive option, as git gc --aggressive. This makes git gc optimize the repository more thoroughly, producing smaller packs, at the cost of taking much more time. Because the effects of --aggressive are long-lasting, it is usually worth running only after a significant number of changes have been made to the repository.
How is Git Prune Different from Git Gc?
git prune is one of the commands that git gc runs internally, so git gc can be thought of as its parent command. git prune removes Git objects that are unreachable and older than the expiry period configured for git gc (gc.pruneExpire, which defaults to 2.weeks.ago).
What is Git Gc Auto?
When you run git gc --auto, it first checks whether the repository needs any housekeeping. If it does not, the command exits without doing anything. Some Git commands run git gc --auto automatically after they finish, to clean up leftover loose and unreachable objects.
Before doing any work, git gc --auto reads the Git configuration to find its thresholds, such as the number of loose objects (gc.auto) and the number of packs (gc.autoPackLimit). These values can be set with git config. git gc --auto does its work only if the repository exceeds one of these thresholds.
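The decision git gc --auto makes can be sketched as a small Python function. It follows what we understand Git's implementation to do: the loose-object count is estimated by sampling the single directory .git/objects/17, which holds roughly 1/256 of all loose objects, and packs marked with .keep are not counted. auto_gc_needed and its parameters are hypothetical, and the real command also considers other factors such as an existing gc.log file.

```python
import math


def auto_gc_needed(loose_in_fanout_17, unkept_pack_count,
                   gc_auto=6700, auto_pack_limit=50):
    """Simplified model of the check git gc --auto makes before running.

    loose_in_fanout_17: loose objects found in .git/objects/17, used as
        a 1/256 sample of all loose objects.
    unkept_pack_count: local packs that have no .keep file.
    Returns a short reason string, or None when no work is needed.
    """
    if gc_auto <= 0:
        return None  # gc.auto = 0 disables every --auto heuristic
    if auto_pack_limit > 0 and unkept_pack_count > auto_pack_limit:
        return "too many packs"
    # Scale the threshold down to one fan-out directory (6700 -> 27).
    if loose_in_fanout_17 > math.ceil(gc_auto / 256):
        return "too many loose objects"
    return None
```

Sampling one directory keeps the check cheap, which matters because many everyday commands trigger it.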
Git Gc Options
Let us now discuss the options of the git gc command in detail.
Explanation:
Usually git gc runs quickly while giving good disk space usage and performance. The --aggressive option makes git gc optimize the repository more thoroughly, at the expense of taking much more time. The effects of this optimization are long-lasting.
Explanation:
The --auto option makes git gc first check whether housekeeping is required and exit without doing anything if it is not. The check compares the repository against thresholds such as gc.auto and gc.autoPackLimit, as described in the "What is Git Gc Auto?" section above.
Explanation:
Rather than keeping unreachable objects loose, it keeps them in a cruft pack.
Explanation:
The --prune=<date> option removes unreachable objects older than the given date. The default is 2.weeks.ago, which you can change with the gc.pruneExpire configuration variable. --prune=now removes unreachable objects immediately, regardless of their age, but this increases the risk of repository corruption if another process is writing to the repository at the same time.
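The effect of --prune=<date> on unreachable objects can be shown with a short Python filter. It assumes the date has already been converted into a cutoff time, with None standing for never. objects_to_prune and the map of object IDs to modification times are hypothetical, and the object IDs in the example are placeholders.

```python
from datetime import datetime
from typing import Dict, List, Optional


def objects_to_prune(unreachable: Dict[str, datetime],
                     cutoff: Optional[datetime]) -> List[str]:
    """Return IDs of unreachable objects older than the cutoff.

    cutoff=None models "never" (keep everything). With --prune=now the
    cutoff is the current time, so every object created before that
    moment qualifies, whatever its age.
    """
    if cutoff is None:
        return []
    return sorted(oid for oid, mtime in unreachable.items() if mtime < cutoff)
```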
Explanation:
If we use this option along with the git gc command, it will not remove the unreachable objects.
Explanation:
This option forces git gc to run even if another git gc instance may already be running on the same repository.
Explanation:
All the progress reports are suppressed with this option.
Explanation:
With --keep-largest-pack, all packs except the largest pack and any packs marked with a .keep file are combined into a single pack. gc.bigPackThreshold is ignored when this option is used.
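The selection that --keep-largest-pack makes can be sketched in Python as follows. packs_to_consolidate is a hypothetical helper that takes each pack's size and whether it has a .keep file. It ignores cruft packs and breaks ties between equally large packs by name, which may differ from Git's own tie-breaking.

```python
def packs_to_consolidate(packs):
    """Return the packs that would be combined into one new pack.

    packs maps a pack name to (size_in_bytes, has_keep_file). The
    largest pack and any .keep-marked packs are left alone; every
    other pack is listed for consolidation.
    """
    if not packs:
        return []
    # Largest by size; ties resolved by name (an assumption of this sketch).
    largest = max(packs, key=lambda name: (packs[name][0], name))
    return sorted(
        name for name, (size, keep) in packs.items()
        if name != largest and not keep
    )
```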
Conclusion
In this article, we learned about the git gc command. Let us recap the points we discussed throughout the article:
- The git gc command is one of the most important commands for maintaining a repository.
- The gc in git gc stands for garbage collection. Its main purpose is to clean up the clutter that accumulates in a repository, and running git gc is all it takes.
- git gc performs several housekeeping tasks, such as removing unreachable objects (including those created by earlier git add commands), packing refs, and pruning the reflog.
- When you run git gc, it internally runs commands such as git pack-refs, git reflog expire, git repack, git prune, git worktree prune, and git rerere gc.
- Some of the git gc configurations are: gc.reflogExpire, gc.reflogExpireUnreachable, gc.aggressiveWindow, gc.auto, etc.
- git gc --aggressive optimizes the repository more thoroughly than plain git gc. It takes much longer to run, but saves more disk space once it finishes.
- git prune is run internally by git gc, which can be thought of as its parent command.
- The main purpose of git prune is to remove Git objects that are unreachable and older than the expiry period set in the git gc configuration.
- git gc --auto checks the Git configuration before running to find its thresholds for the number of loose objects and packs, and does its work only if one is exceeded.
- Some of the git gc options are: --aggressive, --auto, --cruft, --prune=<date>, --no-prune, etc.