Naming things, Python dictionary Ed.

I really enjoy using Python. It’s a simple, expressive language with lots of useful libraries. It comports well with my thinking. And I suspect I am not alone: it’s become one of the most popular programming languages.

In this post, I’ll talk about Python dictionaries. In many ways, the dictionary (or “dict”) is the core data structure of Python, analogous to matrices in MATLAB or dataframes in R. Dictionaries are not unique to Python; most languages support them, often calling them associative arrays, hashes, or maps. Here’s an example of a small dictionary:

d = {'a':1, 'b':2, 'c':3}

Here d maps three letters (strings) to numbers (integers), and we can retrieve the integers whenever needed using the strings:

i = d['a']
print(i)

The square bracket “access” operator lets us plug a key into the dictionary and get back its value (or a KeyError if the key is not in the dictionary).
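
A minimal sketch of both outcomes, reusing the small dictionary from above. The missing key 'z' and the default value 0 are made up for illustration; dict.get is the built-in way to ask for a fallback instead of an error.

```python
d = {'a': 1, 'b': 2, 'c': 3}

# Bracket access on a missing key raises KeyError.
try:
    d['z']
    found = True
except KeyError:
    found = False

# dict.get returns a default (None unless one is given) instead of raising.
fallback = d.get('z', 0)

print(found, fallback)
```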

Python dictionaries are particularly easy to use, efficient, and flexible. In fact, there’s an awesome chapter in Beautiful Code entitled “Python’s dictionary implementation: being all things to all people” that I highly recommend if you’re interested in learning about how, and more importantly, why, dictionaries were built the way they were 1. Python is often criticized for being inefficient, as you’d expect for a dynamically typed language. But I am often shocked how much a well-placed dictionary can speed things up; its key lookups are amazingly fast, taking roughly constant time on average however large the dictionary grows.
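
As a rough illustration of a “well-placed dictionary” (a toy sketch with made-up records and hypothetical names, not a benchmark), a list that would otherwise be scanned on every lookup can be turned into a dictionary once and then queried directly:

```python
# Hypothetical data: (name, score) pairs.
records = [('ann', 3), ('bob', 5), ('cat', 8)]

def score_by_scan(name):
    # Walks the list on every call, so each lookup costs O(n).
    for n, s in records:
        if n == name:
            return s
    raise KeyError(name)

# Build the dictionary once; each lookup is then O(1) on average.
lookup = dict(records)

print(score_by_scan('cat'), lookup['cat'])
```

Both approaches return the same answer; the difference only shows up in running time once the list is large and lookups are frequent.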

Flexibility

Besides efficiency, another useful property of dictionaries is their flexibility. Because Python is dynamically typed, and everything is an object, you are not limited to dictionaries with homogeneous data types. In other words, instead of making a dictionary like the one in the first example, where every key is a string and every value is an integer, you can mix and match data types:

d = { 1:1,
      2:'a',
      'series': [1,2,3,],
      'map': {'a':1, 'b':2, 'c':3}
}

This is a perfectly valid dictionary. It has both keys and values of different types. Values can themselves be data structures: d['series'] is a list and d['map'] is itself a dictionary, nested within d! There are limits on what types of variables can be used as keys: a key must be hashable, which in practice means immutable, so you can’t use a list or a dictionary as a key, even though they are fine as values.
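
A short sketch of that restriction on keys, using made-up values: a tuple works as a key, while a list raises a TypeError.

```python
d = {}

# Tuples of hashable items are immutable and hashable, so they work as keys.
d[(1, 2)] = 'a tuple key'

# Lists are mutable and unhashable, so using one as a key raises TypeError.
try:
    d[[1, 2]] = 'a list key'
    list_key_ok = True
except TypeError:
    list_key_ok = False

print(d[(1, 2)], list_key_ok)
```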

This second example d is what I call a heterogeneous dictionary. The variables, both keys and values, are of mixed types and almost certainly have mixed meanings. Heterogeneous dictionaries are extremely useful, but excessive heterogeneity can make code difficult to understand and potentially unreliable.

Naming things — keep dictionaries homogeneous

I am a strong advocate for keeping dictionaries simple, and in particular homogeneous. Each dictionary should keep track of one thing, or one “kind” of thing. That way you always know what to expect when you access a key. Instead of the heterogeneous, kitchen-sink approach, keep a small, curated collection of dictionaries. Just because you can use heterogeneous dictionaries in Python doesn’t mean you should.

Why is this helpful? For me, it enables a very powerful naming convention.

I almost never name my dictionaries with generics like d or data. Instead, I use a naming convention of key2value 2:

user2age = {}
user2group = {}
group2location = {}

and so forth.

The idea is to never name the dictionary as a singular entity, but always focus on the inputs and outputs of the dictionary. When you keep the dictionary homogeneous in those terms, the dictionary—and your code—become predictable:

for u in user2age:
    a = user2age[u]
    g = user2group[u]
    print("User group age", u, g, a)

At every point in that code, it’s quick to understand what variables are doing. Consider if I had written:

for u in user2age:
    a = user2age[u]
    g = user2group[a] # !!!

It’s much more obvious that I made a mistake, because at a glance I can see that a is not a user but it’s going into a dict that takes users as keys. Now imagine that users are integers (say, ID numbers), and ages are also integers. That bug may not actually throw a key error: if there exists a user called 42 and one or more users happen to have an age of 42, then user2group[a] will return a valid value. The dictionary doesn’t know when 42 means user ID 42 and when 42 means 42 years. A subtle bug has been avoided.
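
Here is a minimal sketch of that scenario with made-up data, where user IDs and ages are both integers. The wrong lookup runs without any error and quietly returns another user’s group.

```python
# Made-up data: user 7 is 42 years old, and there is also a user with ID 42.
user2age = {7: 42, 42: 30}
user2group = {7: 'admins', 42: 'guests'}

u = 7
a = user2age[u]        # 42, an age

right = user2group[u]  # keyed by a user, as the name says
wrong = user2group[a]  # keyed by an age: no KeyError, because 42 is also a user ID

print(right, wrong)
```

Nothing in the data flags the second lookup as wrong; only the names do.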

This naming convention is especially helpful with nested dictionaries:

grade = group2student2grade[grp][stu]

When you have a dictionary-of-dictionaries you can call into the outer and inner dictionary simultaneously using two [] operators in a row. By extending the naming convention so the value in the outer dictionary is another name of the form key2value, not only do we immediately know we have nested dictionaries, we can also quickly see if the outer and inner keys have been passed correctly: does [grp][stu] match up with group2student? Yup. So we are good to go, assuming of course that grp and stu are correct.
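
A small sketch of the nested case, with made-up groups, students, and grades. The inner dictionaries are named student2grade, so the loop variables line up with the name of the outer dictionary.

```python
# Made-up data: outer keys are groups, inner keys are students.
group2student2grade = {
    'math': {'ann': 90, 'bob': 85},
    'art': {'cat': 77},
}

# Direct access: the keys appear in the same order as in the name.
grade = group2student2grade['math']['bob']

# Iteration peels off one level of the name at a time.
rows = []
for grp, student2grade in group2student2grade.items():
    for stu, g in student2grade.items():
        rows.append((grp, stu, g))

print(grade, rows)
```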

It’s all about the names

These examples underscore why naming variables carefully is so crucial to readable, maintainable, predictable code. Both for k in D and for u in user2name can do exactly the same thing, but the former is so generic it can be hard to detect issues, while the latter gives more “hooks” to hang our understanding on. It doesn’t solve all problems 3, but it helps!


  1. I suspect some of the specifics are now out-of-date. Python dicts have changed considerably in the past few years, especially now that they preserve insertion order of keys. But the general idea from that chapter should still hold. ↩︎

  2. Yes, I am aware of word2vec. Despite claims to the contrary, neither they nor I invented the naming convention input2output. See also Hungarian notation, which has advantages and disadvantages. ↩︎

  3. And no, neither does a statically typed language. Python forever! ↩︎

Jim Bagrow
Associate Professor of Mathematics & Statistics

My research interests include complex networks, computational social science, and data science.
