What is Garbage Collection in Python?
Garbage collection (GC) is a technique that programming languages use to reclaim memory. It is a built-in feature of languages such as Python, C#, and Java. In languages such as C and C++, the programmer has to allocate and de-allocate memory manually.
Garbage collection in Python manages heap memory automatically. In simpler terms, garbage collection is the automatic deletion of objects that are no longer used, which frees their memory. The reclaimed blocks of memory then become available for new objects.
Python's memory management is active from the moment a program starts executing. Every object carries a reference counter, and the object is reclaimed as soon as that counter reaches 0. The counter changes whenever the number of references to the object (names, aliases, or container slots that point to it) changes.
There are two common scenarios in which an object's reference counter increases:
- Whenever a new name or alias is assigned to an object, its reference count increases.
- When an object is inserted or placed into containers such as lists, tuples, or dictionaries, the reference count of the object increases.
On the other hand, an object's reference counter decreases when a reference to it is removed, for example when a name is deleted with del, rebound to another object, or goes out of scope. Whenever the reference count of an object reaches 0, Python reclaims it automatically.
When the reference counter reaches 0, nothing in the program refers to that object any more. Such unreferenced objects are known as orphaned instances. The garbage collector reclaims their memory so that it can be allocated to other objects in the future.
Before an object is destroyed, the Python interpreter invokes its class's destructor method, __del__(), if one is defined. The destructor runs when the instance is about to be destroyed, and it may be used to release non-memory resources held by the object.
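As a minimal sketch of the destructor behaviour described above, the following assumes CPython, where an object is finalized as soon as its last reference disappears. The class name TempResource and the events list are hypothetical, introduced only for illustration.

```python
class TempResource:
    """Hypothetical class that records when an instance is finalized."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def __del__(self):
        # Invoked when the instance is about to be destroyed.
        self.log.append(f"released {self.name}")


events = []
resource = TempResource("demo", events)
del resource      # removes the only reference, so the object is finalized here
print(events)
```

Other Python implementations may run __del__() later, so code that must release a resource at a precise moment usually should not rely on it alone.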
Let us take an example to visualize how the counter (reference counter) increases and decreases.
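One way to watch the counter move is to compare sys.getrefcount() readings before and after each step. This is a sketch assuming CPython; the variable names are illustrative, and only the differences between readings are meaningful because the absolute values vary between interpreter versions.

```python
import sys

data = [1, 2, 3]
base = sys.getrefcount(data)           # baseline reading

alias = data                           # a new name for the same object: +1
after_alias = sys.getrefcount(data)

holder = {"key": data}                 # stored inside a container: +1
after_container = sys.getrefcount(data)

del alias                              # removing the name: -1
holder.clear()                         # removing it from the container: -1
after_cleanup = sys.getrefcount(data)

# Differences relative to the baseline reading.
print(after_alias - base, after_container - base, after_cleanup - base)
```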
Why Do We Need It?
The main aim of garbage collection in Python is to reduce memory leaks. A garbage collector also makes sure of memory safety. In doing so, a garbage collector hides the underlying complexity of memory allocation, de-allocation, and raw pointers (memory locations).
A garbage collector can be considered similar to the memory manager of an operating system: it keeps track of allocated, de-allocated, and unused memory. Because Python has garbage collection, a developer need not worry about deleting an object to free its memory when it is no longer in use.
To understand the need for a garbage collector, let us look at the problems we can face in a language without garbage collection:
- Forgetting to free allocated memory: A variable or an object acquires some memory. If we forget to free that memory once we are done with the object, the result is a memory leak. Over time, leaks can make a program consume far more memory than it needs, which is a major concern for long-running programs. A garbage collector keeps track of such unused memory and reclaims it.
- Freeing memory too early: Another problem arises if we free memory while it is still in use. When the program later tries to access a value in memory that has already been freed, it can crash or corrupt data.
So, to free memory reliably and safely, Python relies on garbage collection, which works automatically while a program is executing.
Refer to the following sections to learn more about how garbage collection is implemented in Python.
How Does Python Implement Garbage Collection?
Python uses two strategies to implement garbage collection. The first one is Reference Counting, and the other one is the Generational Garbage Collection.
As discussed, whenever the reference count of an object reaches 0, the object is freed. Before Python 2.0, the interpreter relied on reference counting alone for memory management. A program cannot disable reference counting; it works in the background for every program.
We can use Python's sys module to check the reference count of an object. It provides a function called getrefcount(), which returns the object's current reference count. The value is generally one higher than expected, because the argument passed to getrefcount() is itself a temporary reference.
Example:
Output:
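A small sketch of sys.getrefcount() in use, assuming CPython. The Marker class is hypothetical and exists only to provide a fresh object. The example reports the change between two readings rather than absolute values, since those include the temporary reference made by the call and differ between interpreter versions.

```python
import sys


class Marker:
    """Hypothetical empty class, used only to get a fresh object."""


marker = Marker()
first = sys.getrefcount(marker)

others = [marker, marker, marker]   # one list holding three references
second = sys.getrefcount(marker)

print(second - first)               # each slot in the list counts separately
```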
Apart from the reference counter, Python uses another garbage collection strategy called Generational Garbage Collector.
Reference counting has a limitation. When an object refers to itself, directly (for example, a list appended to itself) or through other objects, a cyclical reference or reference cycle is created. Reference counting alone cannot destroy such an object, because the cycle keeps its counter from ever reaching 0. The Generational Garbage Collector was introduced to handle this case.
The Generational Garbage Collector is exposed through the gc module of the standard library and works as a tracing garbage collector. An object in a reference cycle never has a reference count of zero, because something in the cycle still refers to it. The generational garbage collector detects cycles that are unreachable from the rest of the program and frees the memory they occupy.
Refer to the following sections for more details regarding the Generational Garbage Collector with various examples.
Ways to Make an Object Eligible for Garbage Collection
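We can remove a name that refers to the object with del, or rebind the name to None (Python's equivalent of a NULL value). Either way the object loses a reference, and it becomes eligible for garbage collection once no references remain.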
Let us take an example to understand both contexts.
Example (using del):
Output:
In the above example, we first incremented the reference count of numbers by referencing it with y, so the reference count was first 2 and then 3. After that, del removed the name numbers, so trying to access numbers raised an error (a NameError). The list itself is still alive, because y continues to refer to it.
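The del scenario can be sketched as follows. This is an illustrative reconstruction under the assumption that numbers is a list and y is a second name for it; it is not the original listing.

```python
import sys

numbers = [10, 20, 30]
y = numbers                          # second reference to the same list
count_before = sys.getrefcount(y)

del numbers                          # removes the name only; the list lives on via y
count_after = sys.getrefcount(y)

try:
    numbers                          # the name no longer exists
except NameError as exc:
    error_message = str(exc)

print(count_before - count_after)    # the list lost exactly one reference
print(error_message)
print(y)
```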
Example (using None):
Output (first run):
Output (second run):
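As we can see, the first two reference counts are the same as in the previous example. After numbers is rebound to None, however, sys.getrefcount(numbers) no longer reports on the list: it reports the reference count of the None object, which is shared across the whole interpreter, so the value looks arbitrary and can change between runs. In this scenario the list is not freed, because y still refers to it.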
Refer to the example provided below for better visualization.
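A sketch of the None scenario, again assuming that numbers is a list and y is a second name for it. Reading the count through y shows what actually happens to the list when numbers is rebound.

```python
import sys

numbers = [10, 20, 30]
y = numbers
count_before = sys.getrefcount(y)

numbers = None                       # rebinds the name; the list loses one reference
count_after = sys.getrefcount(y)

print(count_before - count_after)    # the list is down by one reference
print(y)                             # still alive, because y refers to it
print(numbers is None)
```

Calling sys.getrefcount(numbers) at this point would report on None itself, which is why the value appears unrelated to the list.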
How Does Garbage Collection Work in Python?
In Python, reference counting and generational garbage collection work together.
As discussed, whenever the reference count of an object reaches 0, the object is freed. A program cannot disable reference counting; it works in the background for every program.
a. Generational Garbage Collection
Reference counting has a limitation: when an object refers to itself (for example, a list appended to itself), a cyclical reference or reference cycle is created. Reference counting alone cannot destroy such an object, because its counter can never reach 0. The generational garbage collector handles these cases: it runs, detects unreachable cycles, and frees the memory they occupy. It is exposed through the gc module of the standard library.
1. Automatic Garbage Collection of Cycles
Discovering reference cycles is computationally expensive, so the Python interpreter schedules this work rather than running it continuously. To schedule it, the interpreter tracks the number of object allocations and de-allocations and compares them against a threshold.
The garbage collector runs when the number of allocations minus the number of de-allocations exceeds the threshold. We can check the current thresholds with the gc module's get_threshold() function.
Example:
Output:
As we can see, the default value of threshold-0 is 700, so a collection starts whenever the number of allocations minus the number of de-allocations exceeds 700. Initially, only generation-0 is examined. If generation-0 has been examined more than threshold-1 (i.e. 10) times since generation-1 was last examined, then generation-1 is examined as well. The same rule applies to the next generation, using threshold-2.
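The thresholds can be read, and optionally tuned, roughly as follows. This is a sketch: the default values depend on the Python version, and the value 1000 passed to gc.set_threshold() is an arbitrary illustration, not a recommendation.

```python
import gc

# Read the three thresholds currently in effect.
threshold0, threshold1, threshold2 = gc.get_threshold()
print(threshold0, threshold1, threshold2)

# Raise threshold-0 so that generation-0 collections happen less often.
gc.set_threshold(1000, threshold1, threshold2)
tuned = gc.get_threshold()

# Restore the original settings so the rest of the program is unaffected.
gc.set_threshold(threshold0, threshold1, threshold2)
restored = gc.get_threshold()
```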
2. Manual Garbage Collection
Apart from the automatic collections triggered by the threshold, we can also invoke garbage collection manually. The collect() function of the gc module runs a collection and returns the number of unreachable objects it found.
Refer to the example specified below for better understanding.
Example:
Output:
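A minimal sketch of calling gc.collect() by hand. With no argument it runs a full collection, and it also accepts a generation number to limit the work. The counts it returns depend on what the program has allocated, so no particular output is assumed here.

```python
import gc

# Full collection across all generations.
unreachable = gc.collect()
print(unreachable)

# Restrict the collection to the youngest generation only.
young_only = gc.collect(0)
print(young_only)
```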
We can manually schedule the garbage collection in two ways:
- Time-based invoking: We can schedule the garbage collector to run at a specific interval of time.
- Event-based invoking: We can also schedule the garbage collector to run at a particular event, such as when the user logs out, when the database is changed, etc., depending upon the usage.
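The two scheduling styles above might be sketched like this. Both helpers, collect_if_due() and on_user_logout(), are hypothetical names invented for illustration; a real application would hook them into its own timer or event system.

```python
import gc
import time


def collect_if_due(last_run, interval_seconds, now=None):
    """Hypothetical time-based helper: collect if the interval has elapsed."""
    now = time.monotonic() if now is None else now
    if now - last_run >= interval_seconds:
        return now, gc.collect()     # new timestamp and number of objects found
    return last_run, None            # not due yet, nothing collected


def on_user_logout(session):
    """Hypothetical event-based helper: drop session data, then collect."""
    session.clear()
    return gc.collect()
```

Passing now explicitly makes the time-based helper easy to exercise without waiting for real time to pass.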
3. Forced Garbage Collection
There may be situations in which the user wants to run the garbage collector immediately, for memory management or to free up unused space. Python allows users to call the garbage collector explicitly through the gc module.
Let us take an example in which we create a reference cycle, delete the object manually, and then check the number of objects collected, as returned by gc.collect().
Output:
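A sketch of forcing a collection after creating reference cycles. It assumes that no automatic collection happens to run during the short loop, which is the usual case for so few allocations. The name cycle is illustrative.

```python
import gc

gc.collect()                     # clear any garbage left over from earlier code

for _ in range(5):
    cycle = []
    cycle.append(cycle)          # the list now holds a reference to itself

del cycle                        # the last list becomes unreachable as well
collected = gc.collect()         # force a full collection
print(collected)                 # number of unreachable objects found
```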
4. Disabling Garbage Collection
If we are sure that our program does not create any reference cycles, we can disable the generational garbage collector for that program. Reference counting continues to work as usual.
To disable the garbage collector, we call the disable() function of the gc module.
Example:
Output:
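A short sketch of switching the automatic collector off and on again. It assumes the collector should end up enabled afterwards; the variable names are illustrative.

```python
import gc

was_enabled = gc.isenabled()

gc.disable()                         # stops automatic cycle collection only
state_after_disable = gc.isenabled()

# Reference counting still works, and a manual collection is still allowed.
manual_result = gc.collect()

gc.enable()
state_after_enable = gc.isenabled()
print(was_enabled, state_after_disable, state_after_enable)
```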
b. Reference Counting
As discussed earlier, the Python interpreter uses reference counting to decide when an object should be deleted. Whenever an object's reference counter reaches 0, the object is reclaimed. The counter changes whenever the number of references to the object (names, aliases, or container slots that point to it) changes.
Whenever a new name or alias is assigned to an object, its reference count increases. Likewise, when an object is placed into a container such as a list, tuple, or dictionary, its reference count increases.
Refer to the image below for better clarity.
An object's reference counter decreases when a reference to it is removed. Whenever the reference count reaches 0, nothing refers to that object any more, so Python reclaims it automatically.
Reference Cycle
A reference cycle is a scenario in which an object that is no longer used cannot be freed by reference counting alone, because its reference counter never reaches 0. It arises when an object refers to itself, directly or through other objects.
Let us take an example to understand the context better.
Output:
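One way to observe a reference cycle is to watch the object through a weak reference, which does not keep it alive. This is a sketch assuming CPython; the Node class is hypothetical, and the automatic collector is disabled during the demonstration so that it cannot interfere.

```python
import gc
import weakref


class Node:
    """Hypothetical class used to build a two-object cycle."""

    def __init__(self):
        self.partner = None


gc.disable()                         # keep automatic collection out of the demo
try:
    a, b = Node(), Node()
    a.partner, b.partner = b, a      # a -> b -> a forms a reference cycle
    probe = weakref.ref(a)           # weak reference: does not keep a alive

    del a, b                         # no names left, but the counts stay above 0
    alive_after_del = probe() is not None

    gc.collect()                     # the cycle detector frees both objects
    alive_after_collect = probe() is not None
finally:
    gc.enable()

print(alive_after_del, alive_after_collect)
```

The object survives del because its partner still refers to it, and it disappears only after the cycle collector runs.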
Garbage Collector Interface
We can use the gc module of the standard library to interact with the garbage collector. It provides functions for actions such as disabling the garbage collector, getting the collection counts, and debugging memory leaks.
Let us look at some of the most important methods of the gc module:
- gc.disable(): It is used for disabling Python's garbage collector.
- gc.enable(): It is used for enabling Python's garbage collector.
- gc.isenabled(): It is used to detect if the garbage collector is enabled or not. It returns either True or False.
- gc.get_count(): It is used to get the current collection counts.
- gc.get_threshold(): It is used to get the current collection thresholds.
You can refer to the official documentation of the Python programming language to get the list of all the functions of the gc module.
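The read-only functions listed above can be combined into a quick status report. The helper gc_snapshot() is a hypothetical name used only for this sketch, and the numbers it prints vary from run to run.

```python
import gc


def gc_snapshot():
    """Hypothetical helper that gathers a few collector settings in one dict."""
    return {
        "enabled": gc.isenabled(),         # True or False
        "counts": gc.get_count(),          # one counter per generation
        "thresholds": gc.get_threshold(),  # one threshold per generation
    }


snapshot = gc_snapshot()
print(snapshot)
```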
Interacting with Python Garbage Collector
Let us code an example in which we will create reference cycles and then destroy that object of circular reference using the various methods of the gc module.
Example:
Output:
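A sketch of such a session, under the assumption that each object simply refers to itself. The Employee class and the make_cycles() helper are hypothetical. The exact number reported by gc.collect() depends on the Python version, so only a lower bound is expected.

```python
import gc


class Employee:
    """Hypothetical class whose instances refer to themselves."""

    def __init__(self, name):
        self.name = name
        self.me = self                    # self-reference creates a cycle


def make_cycles(count):
    for index in range(count):
        Employee(f"employee-{index}")     # unreachable right after creation


gc.collect()                              # start from a clean state
gc.disable()                              # no automatic collection during the demo
try:
    make_cycles(100)
    pending = gc.get_count()[0]           # rough count of allocations since the last collection
    freed = gc.collect()                  # at least one object per cycle
finally:
    gc.enable()

print(pending, freed)
```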
Advantages and Disadvantages
Let us learn some of the advantages and disadvantages associated with garbage collection in Python.
Advantages:
- The developer need not worry about the memory de-allocation of the unused objects as the garbage collector is running in the background.
- Garbage collection saves us from bugs like the dangling pointer problem.
- There is a lot of reduction in memory leakage due to the presence of garbage collection in Python.
- Garbage collection in Python provides memory safety as well.
Disadvantages: The main disadvantage of garbage collection is its performance cost. Garbage collection is extra work that the interpreter performs while the program runs, so it adds overhead and can impact performance.
Conclusion
- Garbage collection in Python manages heap memory automatically. In simpler terms, garbage collection is the automatic deletion of objects that are no longer used, which frees their memory.
- The Python interpreter schedules the garbage collection because discovering the reference cycle is computational work.
- Python uses two strategies to implement garbage collection. The first one is Reference Counting, and the other one is the Generational Garbage Collection.
- Apart from the automatic garbage collection running on hitting the threshold, we can also invoke the garbage collection in Python manually using the gc module.
- Python's memory management is active from the moment a program starts executing. Whenever an object's reference counter reaches 0, the object is reclaimed.
- Whenever a new name or alias is assigned to an object, its reference count increases. Or, when an object is inserted or placed into containers such as lists, tuples, or dictionaries, the reference count of the object increases.
- An object's reference counter decreases when a reference to it is removed. Whenever the reference count reaches 0, nothing refers to the object any more.
- The main aim of garbage collection in Python is to reduce memory leaks. A garbage collector also makes sure of memory safety.
- A reference cycle is a scenario in which an object that is no longer used cannot be freed by reference counting alone, because its reference counter cannot reach 0.
- Garbage collection in Python is extra work that the interpreter performs while the program runs, so it adds overhead and can impact performance.