Python: Memory Management

26.07.2024


If you want to write better code in Python, it's important to understand this topic well.

This post might have some inaccuracies. Please feel free to research and explore on your own!

Garbage Collection

In everyday life, garbage is something you can no longer use. Programming languages have the same idea: when a value or object can no longer be used, because nothing in the program refers to it anymore, it is garbage. Garbage collection is the process of finding these unreferenced objects and removing them from memory (deallocating them), and the part of the runtime that does this is called the Garbage Collector (GC).

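But why do we need a Garbage Collector? Without one, you would have to free every object yourself, and every object you forgot to free would stay in memory until the program ends. In a long-running program, memory would gradually fill up (a memory leak). The Garbage Collector takes care of this for you 🙃.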
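To see the collector at work, here is a minimal sketch that assumes CPython, which frees an object as soon as nothing refers to it (other implementations such as PyPy may free it later). It uses a weak reference from the standard weakref module to watch an object without keeping it alive; the Bottle class is a hypothetical stand-in for any object.

```python
import weakref


class Bottle:
    """Hypothetical example class; any user-defined class works the same way."""


bottle = Bottle()
probe = weakref.ref(bottle)  # a weak reference does not keep the object alive

print(probe() is bottle)  # True: the object is still reachable through `bottle`

bottle = None  # drop the only (strong) reference to the object
print(probe())  # None in CPython: the object was freed immediately
```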

Reference Count

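As humans, it's easy to understand garbage. You buy a bottle of water, drink it, and know that the bottle is no longer useful to you, so it's garbage. In Python (specifically CPython, the standard implementation), every object keeps a reference count: the number of references (variables, list elements, attributes, and so on) currently pointing to it. The count goes up when a new reference is created and down when one disappears; when it reaches zero, the object is freed. For example: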

# Evaluating 1 produces an int object; since it isn't assigned to a variable, your code can't refer to it again.
# (CPython caches small integers such as 1, so this particular object is never actually freed.)
>>> 1
1
# Variable x now refers to the int object 1 in memory
>>> x = 1
# print(x) looks up the object that x refers to and prints it
>>> print(x)
1
# del removes the name x and drops its reference; an object with no references left becomes garbage
>>> del x

You might think variables store values, but they actually just refer to objects. When you run print(x), x doesn't hand over a copy of the value; Python looks up the object that the name x refers to and passes that object to print. So from now on, think of variables as references.
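A short sketch of the difference between mutating an object and rebinding a name; the names a and b are arbitrary, and id() returns an object's identity (its memory address in CPython).

```python
a = [1, 2, 3]
b = a  # b is bound to the same list object; nothing is copied

print(a is b)          # True
print(id(a) == id(b))  # True: both names refer to one object

b.append(4)  # mutate the object through b
print(a)     # [1, 2, 3, 4]: a sees the change, since it is the same object

b = [10]       # rebinding b makes it refer to a new object; a is unaffected
print(a)       # [1, 2, 3, 4]
print(a is b)  # False
```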

The sys module provides system-related functions in Python. The function sys.getrefcount() returns how many references an object has. However, the result is generally one higher than you might expect, because passing the object as an argument to getrefcount() creates a temporary extra reference. So subtract 1 from the result to get the number of references your own code holds.

import sys

x = [4, 3, 2, 1]
sys.getrefcount(x)  # 2 (x + the temporary argument reference)

y = x
sys.getrefcount(y)  # 3

z = y
sys.getrefcount(z)  # 4

# If you modify z, x and y change too, because all three names refer to the same list.
z.append(0)
print(x, y, z, sep="\n")
# [4, 3, 2, 1, 0]
# [4, 3, 2, 1, 0]
# [4, 3, 2, 1, 0]

# If you rebind z to a different value (or delete it with `del z` instead), the list loses one reference:
z = 1
sys.getrefcount(y)  # 3
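Variables are not the only source of references. A minimal sketch, assuming CPython: list elements and dict values also count, which shows up as the difference between two getrefcount() calls (the temporary argument reference cancels out).

```python
import sys

data = [4, 3, 2, 1]
base = sys.getrefcount(data)  # the name `data` + the temporary argument

holder = [data]          # a list element is a reference too
mapping = {"key": data}  # and so is a dict value

after = sys.getrefcount(data)
print(after - base)  # 2

del holder, mapping  # the container objects are freed, releasing their references
final = sys.getrefcount(data)
print(final - base)  # 0
```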

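Did you know that Python can check whether two names refer to the same object in constant time, O(1)? That's what the is operator does: it compares object identities (in CPython, their memory addresses), not their values. The == operator is different: it compares values by calling the object's __eq__ method, which can take longer. Comparing two lists with ==, for example, checks them element by element, which is O(n) in the worst case.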
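A small sketch contrasting the two operators; the Point class is hypothetical and only exists to count how many times __eq__ runs.

```python
class Point:
    """Hypothetical class that counts calls to __eq__."""

    eq_calls = 0

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        Point.eq_calls += 1
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)


p = Point(1, 2)
q = Point(1, 2)

print(p is q)          # False: two distinct objects, checked by identity only
print(p == q)          # True: __eq__ compared their values
print(Point.eq_calls)  # 1: only == invoked __eq__
```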

Finalizer

A finalizer is a mechanism that lets you define cleanup actions to run when an object is destroyed. You can define one with the __del__ method or with the weakref.finalize function. Note that the del statement does not destroy an object directly: del x only removes the name x, which decrements the object's reference count by one. The finalizer runs only when the count reaches zero (or when the Garbage Collector reclaims the object), and Python doesn't guarantee that __del__ is called for objects that still exist when the interpreter exits. [python doc]
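A minimal sketch of both finalizer styles on the same object, assuming CPython's immediate, reference-count-based cleanup; the Resource class and the events list are hypothetical. It also shows that del on one name does nothing while another name still refers to the object. The relative order of the two callbacks is an implementation detail, so the example sorts the results instead of relying on it.

```python
import weakref

events = []


class Resource:
    """Hypothetical class whose __del__ records when it runs."""

    def __init__(self, name):
        self.name = name

    def __del__(self):
        # Called when the object is about to be destroyed
        events.append(f"__del__ {self.name}")


r = Resource("a")
alias = r  # a second reference to the same object

# weakref.finalize(obj, func, *args) calls func(*args) once obj is garbage collected
weakref.finalize(r, events.append, "finalize a")

del r
before = list(events)
print(before)  # []: `alias` still refers to the object, so nothing ran

del alias      # the last reference is gone; both finalizers run now (in CPython)
print(sorted(events))  # ['__del__ a', 'finalize a']
```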

class Node:
    def __init__(self, val):
        self.val = val
        self.next = None

x = Node(1)
y = Node(2)

# x refers to y and y refers back to x: a reference cycle
x.next = y
y.next = x

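Objects can reference each other, as shown in the code above; this is common in data structures like linked lists and graphs. This situation is known as a cyclic reference (or reference cycle). Reference counting alone cannot reclaim a cycle: even after every variable pointing into it is gone, each object in the cycle still holds a reference to the other, so their counts never drop to zero and the memory would leak. CPython handles this with an additional cyclic garbage collector, exposed through the gc module, which periodically looks for groups of objects that are reachable only from each other and frees them. You can also avoid strong cycles in the first place with weak references (the weakref module), which point to an object without increasing its reference count, or break a cycle manually by clearing the attribute that closes it (for example, y.next = None).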
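A minimal sketch using the Node class from above and the standard gc and weakref modules, assuming CPython. A weak reference lets us check whether the nodes are still alive without keeping them alive. Automatic collection is paused with gc.disable() only so the output is predictable.

```python
import gc
import weakref


class Node:
    def __init__(self, val):
        self.val = val
        self.next = None


gc.disable()  # pause automatic cyclic collection so the demo is deterministic

x = Node(1)
y = Node(2)
x.next = y
y.next = x  # x -> y -> x: a reference cycle

probe = weakref.ref(x)  # observes x without adding a strong reference

del x, y
# The nodes are now unreachable, but each still holds a reference to the other,
# so reference counting alone does not free them.
survived_refcounting = probe() is not None
print(survived_refcounting)  # True

gc.collect()            # run the cyclic garbage collector explicitly
print(probe() is None)  # True: the cycle was detected and freed

gc.enable()
```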

The simplest cyclic reference is a list that contains itself. It looks like infinite nesting: if you expand it in a debugger, every level reveals another list inside. In reality, the list holds exactly one element, a reference to itself, so any code that naively walks the nesting never reaches the end.

arr = []
arr.append(arr)
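A quick check of what the self-referencing list really contains. The depth() helper is hypothetical: it is a naive traversal that assumes there are no cycles, and on this list it recurses until Python raises RecursionError.

```python
arr = []
arr.append(arr)  # the list now contains a reference to itself

print(len(arr))             # 1
print(arr[0] is arr)        # True
print(arr[0][0][0] is arr)  # True: every "level" is the same object
print(arr)                  # [[...]]: repr detects the cycle and stops


def depth(obj):
    """Hypothetical naive nesting counter; it never terminates on a cycle."""
    if isinstance(obj, list) and obj:
        return 1 + depth(obj[0])
    return 0


try:
    depth(arr)
    hit_limit = False
except RecursionError:
    # Each call pushed a new frame until the recursion limit was reached
    hit_limit = True

print(hit_limit)  # True
```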

Conclusion

Stack

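You might be curious why Python's error messages (tracebacks) look the way they do. Python keeps track of function calls on a call stack, a data structure whose golden rule is LIFO (Last In, First Out). Each time a function is called, a new frame holding that call's local variables is pushed onto the stack; when the function returns, its frame is popped and the references held by its local variables are released. If an error occurs, Python prints the sequence of active frames, most recent call last, so you can follow how the program got there. Quite interesting, right? 😉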
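A minimal sketch of the call stack using the standard traceback module; the function names are arbitrary. traceback.extract_stack() returns the currently active frames, oldest first, which is the same order a traceback prints them in.

```python
import traceback


def inner():
    stack = traceback.extract_stack()  # FrameSummary objects, oldest call first
    return [frame.name for frame in stack]


def middle():
    return inner()


def outer():
    return middle()


names = outer()
print(names[-3:])  # ['outer', 'middle', 'inner']: the most recent call is last

# Once outer() has returned, its frames have been popped off the stack
current = [frame.name for frame in traceback.extract_stack()]
print("inner" in current)  # False
```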

Additionally, I recommend researching weak references, the garbage collector's graph-traversal algorithm, and the call stack in Python, as they all play important roles in memory management. I hope this post has helped you understand how Python manages memory.

If you find any errors in this post, feel free to contact me: [email protected]

