Guide: Type Hinting in Python
Introduction
With the release of version 3.5, Python introduced type hints: code annotations that additional tooling can use to check whether you’re using your code correctly.
Long-time Python users might cringe at the thought of new code needing type hinting to work properly, but we need not worry: Guido himself wrote in PEP 484, “no type checking happens at runtime.”
The feature has been proposed mainly to open up Python code for easier static analysis and refactoring.
For data science, and for the data scientist, type hinting is invaluable for a couple of reasons:
- It makes it much easier to understand the code, just by looking at the signature, i.e. the first line(s) of the function definition;
- It creates a documentation layer that can be checked with a type checker, i.e. if you change the implementation, but forget to change the types, the type checker will (hopefully) yell at you.
Of course, as is always the case with documentation and testing, it’s an investment: it costs you more time at the beginning, but saves you (and your co-worker) a lot in the long run.
Note: Type hinting has also been ported to Python 2.7 (a.k.a Legacy Python). The functionality, however, requires comments to work. Furthermore, no one should be using Legacy Python in 2019: it’s less beautiful and only has a couple more months of updates before it stops receiving support of any kind.
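For reference, the comment-based syntax mentioned in the note looks roughly like the sketch below. It follows the signature type-comment form described in PEP 484; the file name and the greeting function are assumptions chosen for illustration. The interpreter ignores the comment, so the snippet also runs unchanged on Python 3.

```python
# hello_world_legacy.py (illustrative file name)
def hello_world(name='Joe'):
    # type: (str) -> str
    # The line above is a PEP 484 signature type comment: the interpreter
    # skips it like any other comment, while a type checker can read it.
    return 'Hello {}'.format(name)


print(hello_world('Mark'))
```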
Getting started with types
The hello world of type hinting is:
# hello_world.py
def hello_world(name: str = 'Joe') -> str:
    return f'Hello {name}'
We have added two type hint elements here. The first one is : str after name and the second one is -> str towards the end of the signature.
The syntax works as you would expect: we’re marking name to be of type str and we’re specifying that the hello_world function should output a str. If we use our function, it does what it says:
> hello_world(name='Mark')
'Hello Mark'
Since Python remains a dynamically typed language that does not enforce these hints, we can still shoot ourselves in the foot:
> hello_world(name=2)
'Hello 2'
What’s happening? Well, as I wrote in the introduction, no type checking happens at runtime.
So as long as the code doesn’t raise an exception, things will continue to work fine.
What should you do with these type definitions then? Well, you need a type checker, or an IDE that reads and checks the types in your code (PyCharm, for example).
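A minimal sketch of what "no type checking at runtime" means in practice: the hints are stored on the function object as plain metadata, and the interpreter never compares them with the arguments it receives. The function is the hello_world example from above.

```python
def hello_world(name: str = 'Joe') -> str:
    return f'Hello {name}'


# The hints are kept in a dictionary attached to the function object...
print(hello_world.__annotations__)

# ...but nothing consults that dictionary when the function is called,
# so passing an int where a str is declared still works.
print(hello_world(name=2))
```

This is why an external tool is needed: the information is there, but only a type checker or an IDE acts on it.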
Type checking your program
There are at least four major type checkers: Mypy, Pyright, Pyre, and Pytype.
- Mypy is actively developed by, among others, Guido van Rossum, Python’s creator;
- Pyright has been developed by Microsoft and integrates very well with their excellent Visual Studio Code;
- Pyre has been developed by Facebook with the goal of being fast (even though Mypy recently got much faster);
- Pytype has been developed by Google and, besides checking the types as the others do, it can run type checks (and add annotations) on unannotated code.
Since we want to focus on how to use typing from a Python perspective, we’ll use Mypy in this tutorial. We can install it using pip (or your package manager of choice):
$ pip install mypy
$ mypy hello_world.py
Right now our life is easy: there isn’t much that can go wrong in our hello_world function. We’ll see later how this might not be the case anymore.
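To give the checker something to complain about, a sketch such as the following can be saved to a file and passed to Mypy. The file name hello_check.py and the second call are assumptions added for illustration. The script itself runs without error, but a type checker is expected to flag the second call because an int is passed where a str is declared; the exact wording of the report depends on the Mypy version.

```python
# hello_check.py (hypothetical file name)
def hello_world(name: str = 'Joe') -> str:
    return f'Hello {name}'


greeting = hello_world(name='Mark')  # matches the annotation
oops = hello_world(name=2)           # runs, but contradicts the str hint

print(greeting)
print(oops)
```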
More advanced types
In principle, all Python classes are valid types, meaning you can use str, int, float, etc. Using dictionaries, tuples, and similar containers is also possible, but you need to import their types from the typing module.
# tree.py
from typing import Tuple, Iterable, Dict, List, DefaultDict
from collections import defaultdict


def create_tree(tuples: Iterable[Tuple[int, int]]) -> DefaultDict[int, List[int]]:
    """
    Return a tree given tuples of (child, father)
    The tree structure is as follows:
    tree = {node_1: [node_2, node_3],
            node_2: [node_4, node_5, node_6],
            node_6: [node_7, node_8]}
    """
    tree = defaultdict(list)
    for child, father in tuples:
        if father:
            tree[father].append(child)
    return tree

print(create_tree([(2.0,1.0), (3.0,1.0), (4.0,3.0), (1.0,6.0)]))
# will print
# defaultdict(<class 'list'>, {1.0: [2.0, 3.0], 3.0: [4.0], 6.0: [1.0]})
While the code is simple, it introduces a couple of extra elements:
- First of all, the Iterable type for the tuples variable. This type indicates that the object should conform to the collections.abc.Iterable specification (i.e. implement __iter__). This is needed because we iterate over tuples in the for loop;
- We specify the types inside our container objects: the Iterable contains Tuple, the Tuples are composed of pairs of int, and so on.
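The first point can be sketched as follows. The helper fathers and the sample data are hypothetical, introduced only to show that any object implementing __iter__ (a list, a generator, and so on) satisfies an Iterable annotation, while an object without it does not.

```python
from collections import abc
from typing import Iterable, List, Tuple


def fathers(tuples: Iterable[Tuple[int, int]]) -> List[int]:
    """Collect the second element (the father) of every pair."""
    return [father for _child, father in tuples]


pairs_list = [(2, 1), (3, 1), (4, 3)]
pairs_generator = ((child, father) for child, father in pairs_list)

# Lists and generators both implement __iter__, so both count as Iterable.
list_is_iterable = isinstance(pairs_list, abc.Iterable)
generator_is_iterable = isinstance(pairs_generator, abc.Iterable)

# A plain int has no __iter__, so it does not.
int_is_iterable = isinstance(42, abc.Iterable)

from_list = fathers(pairs_list)
from_generator = fathers(pairs_generator)
print(from_list, from_generator)
```

Annotating the parameter as Iterable rather than List keeps the function usable with generators and other lazy sources, since the body only ever loops over its argument.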
Ok, let’s try to type check it!
$ mypy tree.py
tree.py:14: error: Need type annotation for 'tree'
Uh-oh, what’s happening? Basically Mypy is complaining about this line:
tree = defaultdict(list)
While we know that the return type should be DefaultDict[int, List[int]], Mypy cannot infer that tree is indeed of that type. We need to help it out by specifying tree’s type, much as we do for arguments in the signature:
tree: DefaultDict[int, List[int]] = defaultdict(list)
If we now re-run Mypy, all is well:
$ mypy tree.py
$
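The same variable annotation syntax applies to any container that starts out empty, since an empty literal gives the checker nothing to infer the element types from. A minimal sketch, with variable names and values that are purely illustrative:

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List

# Each annotation states what the empty container is meant to hold.
names: List[str] = []
ages: Dict[str, int] = {}
groups: DefaultDict[str, List[str]] = defaultdict(list)

names.append('Ada')
ages['Ada'] = 36
groups['mathematicians'].append('Ada')

print(names, ages, dict(groups))
```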
Type aliases
Sometimes our code reuses the same composite types over and over again. In the above example, Tuple[int, int] might be such a case. To make our intent clearer (and shorten our code), we can use type aliases. Type aliases are very easy to use: we just assign a type to a variable, and use that variable as the new type:
Relation = Tuple[int, int]

def create_tree(tuples: Iterable[Relation]) -> DefaultDict[int, List[int]]:
    """
    Return a tree given tuples of (child, father)
    The tree structure is as follows:
    tree = {node_1: [node_2, node_3],
            node_2: [node_4, node_5, node_6],
            node_6: [node_7, node_8]}
    """
    # convert to dict
    tree: DefaultDict[int, List[int]] = defaultdict(list)
    for child, father in tuples:
        if father:
            tree[father].append(child)
    return tree
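The same idea can be pushed one step further. The sketch below assumes a second alias, Tree, for the return type; that name is not part of the original example and is introduced here only to show that an alias is an ordinary name bound to a type, usable anywhere the full type could be written.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Tuple

Relation = Tuple[int, int]
Tree = DefaultDict[int, List[int]]  # hypothetical second alias


def create_tree(tuples: Iterable[Relation]) -> Tree:
    """Return a tree given tuples of (child, father)."""
    tree: Tree = defaultdict(list)
    for child, father in tuples:
        if father:
            tree[father].append(child)
    return tree


# At runtime the alias compares equal to the type it stands for.
alias_matches = Relation == Tuple[int, int]

result = create_tree([(2, 1), (3, 1), (4, 3)])
print(alias_matches, dict(result))
```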
Generics
Experienced programmers of statically typed languages might have noticed that defining a Relation as a tuple of integers is a bit restricting. Can’t create_tree work with a float, or a string, or the ad-hoc class that we just created?