[Fluent Python] Ch6. Object References, Mutability, Recycling

1. Variables are not boxes, but labels

  • Variables are not boxes, but merely labels attached to objects.
  • b = a does not copy the contents of a into b, but attaches the label b to the object that already has the label a.
    
      a = [1, 2, 3]
      b = a
      a.append(4)
      print(b)
    
      # Result: [1, 2, 3, 4]
    
  • Rather than “assignment”, it is more accurate to say that x = ... binds the name x to whatever object the right-hand side evaluates to.
  • The object is created before the name is bound to it. In the example below, the Person id is printed first and only then does the addition fail with a TypeError, so the name x is never bound.
    
      class Person:
          def __init__(self):
              print(f"Person id: {id(self)}")
    
      x = Person() + 10  # prints the id, then raises TypeError; x is never created

  • Because variables are just labels, an object can have multiple labels bound to it. This is called aliasing.

2. Identity, Equality, and Aliases

jason = {"name": "Jason Lee", "age": 27}
jack = jason
peter = {"name": "Jason Lee", "age": 27}

print(jason == jack)  # Equality True
print(jason is jack)  # Identity True
print(id(jason) == id(jack)) # Same
print(jason == peter)  # Equality True
print(jason is peter)  # Identity False
  • jack is an alias for jason. Both names are bound to the exact same object, which is why jason is jack evaluates to True.
  • peter happens to have the same value as jason, but the two names are bound to different objects.
  • id() returns an integer representing an object’s unique identity. This id never changes during the object’s lifetime.
    • id(jason) == id(jack) shows that both names are bound to the same object.

    The real meaning of id() is implementation-dependent. In CPython, id() returns the memory address of the object, but it could be something else in other interpreters.

    • In practice, use the is operator, which compares object identities, rather than comparing id() values directly.
  • The is operator compares object identity, while the == operator compares object values.
    • jason is peter is False, but jason == peter is True since they hold the same value.

    Under the hood, the == operator calls the __eq__ special method of the class (in this case, dict).
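
As a minimal sketch of how == dispatches to __eq__, the following uses a hypothetical Point class (it is not part of the original notes) that compares instances by their coordinates. Two instances with equal coordinates are equal but still not identical.

```python
class Point:
    """Hypothetical class used only to illustrate __eq__ dispatch."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        # Compare by value; defer to the other operand for unknown types.
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)


p1 = Point(1, 2)
p2 = Point(1, 2)

print(p1 == p2)  # True: __eq__ compares the coordinates
print(p1 is p2)  # False: two distinct objects
```

Without the __eq__ override, p1 == p2 would fall back to the identity comparison inherited from object and evaluate to False.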

3. Choosing between == and is

  • The == operator compares values, while the is operator compares identities.
  • When writing code, we usually care more about values than identities.
  • However, when comparing a variable to a singleton, use the is operator.
    • The most common case is checking whether a variable is bound to None.
    • Another use case is sentinel objects.
  • The is operator is faster than == because it cannot be overloaded, so Python does not need to look up and call a special method to evaluate it.
  • a == b is essentially syntactic sugar for a.__eq__(b).
    • __eq__ is inherited from object (the base class of all classes) and compares identity by default. However, most built-in types override it with more meaningful implementations that take values into account.

In most cases, we’re interested in object equality rather than identity. Checking for None is by far the most common reason to use is; sentinel objects are the other. Otherwise, use ==.
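
The sentinel use case can be sketched as follows. The names _MISSING and get_setting are hypothetical and chosen only for illustration. The point is that None may be a legitimate stored value, so a unique object() instance, tested with is, marks “no value supplied” instead.

```python
_MISSING = object()  # hypothetical sentinel: a unique object no other value is identical to


def get_setting(settings, key, default=_MISSING):
    """Return settings[key]; raise KeyError only when no default was supplied."""
    value = settings.get(key, _MISSING)
    if value is not _MISSING:
        return value
    if default is _MISSING:
        raise KeyError(key)
    return default


config = {"timeout": None}

print(get_setting(config, "timeout"))     # None: a legitimate stored value
print(get_setting(config, "retries", 3))  # 3: falls back to the default
```

Comparing with is here is intentional: an == test could be fooled by an object whose __eq__ returns True for everything, whereas identity cannot.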

4. The Relative Immutability of Tuple

tup = (1, 2, [5, 6, 7])
print(id(tup[-1]))  # 123456
tup[-1].append(99999)
print(tup)  # (1, 2, [5, 6, 7, 99999])
print(id(tup[-1]))  # 123456

The identity of each item a tuple holds can never change, even if the item itself is mutated. This is what “a tuple is immutable” really means.

  • A tuple is an immutable container that holds references to objects.
  • If the referenced items are mutable, they may change even though the tuple itself is immutable.
    • In other words, immutability does not extend to the referenced objects.
  • A tuple is hashable only if every item it holds, at every nesting level, is hashable, as is the case when all of them are built-in immutable values.
    
      tup1 = (1, 2, (5, 6)) # all immutable
      tup2 = (1, 2, [5, 6]) # contains mutable obj
    
      print(f"tup1 hashed: {hash(tup1)}")
      print(f"tup2 hashed: {hash(tup2)}") # TypeError: unhashable type: 'list'
    

5. Shallow vs Deep Copy

Shallow Copy

A shallow copy duplicates the outermost container, but fills the copy with references to the same items held by the original container.

This saves memory and causes no problems if all the items are immutable, but if some items are mutable, the shared references can lead to surprising changes.

  • 1. Shallow copy by constructor
    
    l1 = [3, [[1, 2, 3], 44], (7, 8, 9)]
    l2 = list(l1) # copy by constructor
    print(l2 == l1) # True
    print(l2 is l1) # False
    print(l2[1][0] is l1[1][0]) # True
    
    • We can see that l2 and l1 hold the same value, but they are different objects.
    • l2[1][0] is l1[1][0] shows that a shallow copy does not copy nested objects.
  • 2. Shallow copy by [:]
    
    l1 = [3, [[1, 2, 3], 44], (7, 8, 9)]
    l2 = l1[:] # copy by [:]
    print(l2 == l1) # True
    print(l2 is l1) # False
    print(l2[1][0] is l1[1][0]) # True
    
  • 3. Shallow copy by copy function
    
      from copy import copy
    
      l1 = [3, [[1, 2, 3], 44], (7, 8, 9)]
      l2 = copy(l1) # copy by copy function
      print(l2 == l1) # True
      print(l2 is l1) # False
      print(l2[1][0] is l1[1][0]) # True
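
To make the risk of shared nested items concrete, here is a small sketch with illustrative values. It mutates a shallow copy and its original in turn: changes to the shared inner list show up on both sides, while += on the inner tuple creates a new tuple that only one side sees.

```python
l1 = [3, [66, 55, 44], (7, 8, 9)]
l2 = list(l1)        # shallow copy: new outer list, same inner objects

l1.append(100)       # only the outer list l1 changes
l1[1].remove(55)     # the inner list is shared, so l2 sees this change

print(l1)  # [3, [66, 44], (7, 8, 9), 100]
print(l2)  # [3, [66, 44], (7, 8, 9)]

l2[1] += [33, 22]    # in-place extend of the shared list: visible through l1
l2[2] += (10, 11)    # tuples are immutable: += builds a new tuple bound only in l2

print(l1)  # [3, [66, 44, 33, 22], (7, 8, 9), 100]
print(l2)  # [3, [66, 44, 33, 22], (7, 8, 9, 10, 11)]
```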
    

Deep Copy

Sometimes we need deep copies, i.e. duplicates that do not share references to nested (embedded) objects with the original.

The copy module provides the deepcopy function, which returns deep copies of arbitrary objects.

class Bus:
    def __init__(self, passengers=None):
        if passengers is None:
            self.passengers = []
        else:
            self.passengers = list(passengers)
    
    def pick(self, name):
        self.passengers.append(name)
    
    def drop(self, name):
        self.passengers.remove(name)

Let’s look at the effects of shallow copy and deep copy.

Shallow Copy

import copy
bus1 = Bus(['Alice', 'Bill', 'Claire', 'David'])
bus2 = copy.copy(bus1)
print(bus1 is bus2) # False
bus1.drop('Bill') 
print(bus2.passengers) # ['Alice', 'Claire', 'David']
print(bus1.passengers is bus2.passengers) # True

We can see that bus1 and bus2 are different objects, yet dropping Bill from bus1 also affects bus2. The last line confirms why: bus1 and bus2 share the same passengers list.

Deep Copy

import copy
bus1 = Bus(['Alice', 'Bill', 'Claire', 'David'])
bus2 = copy.deepcopy(bus1)
print(bus1 is bus2) # False
bus1.drop('Bill') 
print(bus2.passengers) # ['Alice', 'Bill', 'Claire', 'David']
print(bus1.passengers is bus2.passengers) # False

With deepcopy, on the other hand, dropping a passenger from bus1 has no effect on bus2, because bus1 and bus2 each have their own passengers list.

Cyclic Reference

  • Making deep copies is not a simple matter. Objects may contain cyclic references, which would send a naive copying algorithm into an infinite loop.
    
      a = [10, 20]
      b = [a, 30]
      a.append(b)
      print(a)  # [10, 20, [[...], 30]]
    
  • The deepcopy function handles such cyclic references gracefully by remembering the objects it has already copied.
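
As a short sketch of that behavior, the following deep-copies the cyclic structure from above and checks that the copy is independent of the original while preserving the cycle internally. The name c is introduced here only for illustration.

```python
from copy import deepcopy

a = [10, 20]
b = [a, 30]
a.append(b)          # a now contains b, and b contains a: a reference cycle

c = deepcopy(a)      # terminates despite the cycle

print(c is a)        # False: c is a new list
print(c[2] is b)     # False: the nested list was copied too
print(c[2][0] is c)  # True: the cycle is reproduced inside the copy
```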

6. Function Parameters as References

  • The only mode of parameter passing in Python is call by sharing.
  • Call by sharing means that each formal parameter of the function gets a copy of each reference in the arguments.
  • As a result, a function may change any mutable object passed as an argument, but it cannot make the caller’s variable refer to a different object.

Let’s see how objects passed as arguments behave for a number, a list, and a tuple.

def f(a, b):
    a += b
    return a

x = 1
y = 2
print(f(x, y)) # 3
print(x, y) # 1 2

a = [1, 2]
b = [3, 4]
print(f(a, b)) # [1, 2, 3, 4]
print(a, b) # [1, 2, 3, 4] [3, 4]

t = (10, 20)
u = (30, 40)
print(f(t, u)) # (10, 20, 30, 40)
print(t, u) # (10, 20) (30, 40)
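  • The number x is unchanged: int is immutable, so a += b creates a new object bound only to the local name a.
  • The list a is changed: for a list, += extends the existing object in place, and the caller’s a refers to that same object.
  • The tuple t is unchanged: like int, tuple is immutable, so a += b builds a new tuple.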
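
The difference comes down to whether += mutates the object in place or rebinds the local name to a new object. A minimal sketch, using a hypothetical helper named kept_same_object, makes this visible with the is operator:

```python
def kept_same_object(value, extra):
    """Return True if `value += extra` kept `value` bound to the same object.

    Hypothetical helper, written only to illustrate the point above.
    """
    original = value
    value += extra
    return value is original


print(kept_same_object([1, 2], [3, 4]))      # True: list += extends in place
print(kept_same_object((10, 20), (30, 40)))  # False: tuple += builds a new tuple
print(kept_same_object(1000, 2000))          # False: int += builds a new int
```

Only in the first case does the caller observe a change, because only there is the shared object itself modified.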

7. Mutable Types as Parameter Defaults: Bad Idea

  • Avoid mutable objects as default values for parameters
def f(lst = []):
    lst.append(1)
    print(lst)
    return lst

f() # [1]
f() # [1, 1]

When we execute the first f(), 1 is appended to the empty list and [1] is printed. However, a strange thing happens when we execute the second f(). Instead of [1], we see that [1, 1] gets printed.

  • The problem is that each default value is evaluated once, when the function is defined (usually when the module is loaded), not on every call.
  • The default values become attributes of the function object.
  • So if a default value is a mutable object and you change it, the change affects every future call that relies on the default.
  • We can inspect the defaults of a function with f.__defaults__.
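
The usual workaround is to use None as the default and create the mutable object inside the function. The sketch below contrasts the two approaches; append_one and append_one_buggy are hypothetical names, with the buggy variant mirroring f from above minus the print call.

```python
def append_one(lst=None):
    """Fixed variant: a fresh list is created on every call without an argument."""
    if lst is None:
        lst = []
    lst.append(1)
    return lst


def append_one_buggy(lst=[]):
    """Same behavior as f in the text: the default list is shared across calls."""
    lst.append(1)
    return lst


print(append_one())                   # [1]
print(append_one())                   # [1]
print(append_one.__defaults__)        # (None,)

append_one_buggy()
append_one_buggy()
print(append_one_buggy.__defaults__)  # ([1, 1],)
```

Inspecting __defaults__ shows where the state lives: the buggy function carries its accumulated list on the function object itself.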

8. del keyword and Garbage Collection

In Python, objects are never explicitly destroyed. Instead, when they become “unreachable”, they may be garbage-collected.

CPython keeps a reference count (refcount) for each object, i.e. the number of references currently pointing to it. As I’ll discuss in later posts, this is related to the infamous GIL (Global Interpreter Lock). When the refcount of an object drops to 0, CPython destroys the object immediately and frees its memory.

An important thing to note about the del keyword is that it deletes references, not objects. In other words, the statement del my_obj does not necessarily remove the object from memory; it only removes the name my_obj, and with it one reference to the object. If my_obj happened to be the last reference, the object becomes unreachable and can then be destroyed.

Rebinding a variable also decreases the refcount of the object it was previously bound to, which may lead to that object’s destruction.

a = [1, 2]
b = a
del a
print(b) # out: [1, 2]
b = [3] # Object discarded by the garbage collector

In the above example, the list object [1, 2] was referenced by two variables, a and b. Then the reference held by a is deleted, and b is rebound to another list object. Consequently, the refcount of the original list reaches 0, and the object is destroyed.

weakref.finalize

To demonstrate the destruction of an object, we can use weakref.finalize to register a callback function that is called when the object is destroyed.

import weakref

def bye():
    print("Goodbye..")

a = {1, 2, 3}
b = a

ender = weakref.finalize(a, bye)
print(ender.alive) # out: True

del a # refcount = 1
b = "hello world!" # out: Goodbye..

print(ender.alive) # out: False

We can see that when we rebind the variable b to another object, the refcount of the original set hits 0, triggering the callback function.

This post is licensed under CC BY 4.0 by the author.