Sunday, 10 October 2021
Understanding all of Python,
through its builtins
@tusharsadhwani
Python as a language is comparatively simple. And I believe that
you can learn quite a lot about Python and its features just by
learning what all of its builtins are, and what they do. And to back
up that claim, I’ll be doing just that.
Just to be clear, this is not going to be a tutorial post. Covering
such a vast amount of material in a single blog post, while starting
from the beginning is pretty much impossible. So I’ll be assuming
you have a basic to intermediate understanding of Python. But
other than that, we should be good to go.
Index
• Index
• So what’s a builtin?
• Local scope
• Enclosing scope
• Global scope
• Builtin scope
• ALL the builtins
• Exceptions
• Constants
• Funky globals
• __name__
• __doc__
• __package__
• __spec__
• __loader__
• __import__
• __debug__
• __build_class__
• __cached__
• All the builtins, one by one
• compile , exec and eval : How the code works
• globals and locals : Where everything is stored
• input and print : The bread and butter
• str , bytes , int , bool , float and complex : The five
primitives
• object : The base
• type : The class factory
• hash and id : The equality fundamentals
• dir and vars : Everything is a dictionary
• hasattr , getattr , setattr and delattr : Attribute
helpers
• super : The power of inheritance
• property , classmethod and staticmethod : Method
decorators
• list , tuple , dict , set and frozenset : The containers
• bytearray and memoryview : Better byte interfaces
• bin , hex , oct , ord , chr and ascii : Basic conversions
• format : Easy text transforms
• any and all
• abs , divmod , pow and round : Math basics
• isinstance and issubclass : Runtime type checking
• callable and duck typing basics
• sorted and reversed : Sequence manipulators
• map and filter : Functional primitives
• len , max , min and sum : Aggregate functions
• iter and next : Advanced iteration
• range , enumerate and zip : Convenient iteration
• slice
• breakpoint : built-in debugging
• open : File I/O
• repr : Developer convenience
• help , exit and quit : site builtins
• copyright , credits , license : Important texts
• So what’s next?
• The end
So what’s a builtin?
A builtin in Python is everything that lives in the builtins module.
To understand this better, you’ll need to learn about the L.E.G.B.
rule.
This rule defines the order of scopes in which variables are looked
up in Python. It stands for:
• Local scope
• Enclosing (or nonlocal) scope
• Global scope
• Builtin scope
Local scope
The local scope refers to the scope that comes with the current
function or class you are in. Every function call and class
instantiation creates a fresh local scope for you, to hold local
variables in.
Here’s an example:
x = 11
print(x)

def some_function():
    x = 22
    print(x)

some_function()
print(x)
Running this code outputs:
11
22
11
So here’s what’s happening: Doing x = 22 defines a new variable
inside some_function that is in its own local namespace. After
that point, whenever the function refers to x , it means the one in
its own scope. Accessing x outside of some_function refers to
the one defined outside.
Enclosing scope
The enclosing scope (or nonlocal scope) refers to the scope of the
classes or functions inside which the current function/class lives.
… I can already see half of you going 🤨 right now. So let me
explain with an example:
x = 11

def outer_function():
    x = 22
    y = 789

    def inner_function():
        x = 33
        print('Inner x:', x)
        print('Enclosing y:', y)

    inner_function()
    print('Outer x:', x)

outer_function()
print('Global x:', x)
The output of this is:
Inner x: 33
Enclosing y: 789
Outer x: 22
Global x: 11
What it essentially means is that every new function/class creates
its own local scope, separate from its outer environment. Trying
to access an outer variable will work, but any variable created in
the local scope does not affect the outer scope. This is why
redefining x to be 33 inside the inner function doesn’t affect the
outer or global definitions of x .
But what if I want to affect the outer scope?
To do that, you can use the nonlocal keyword in Python to tell the
interpreter that you don’t mean to define a new variable in the
local scope, but you want to modify the one in the enclosing scope.
def outer_function():
    x = 11

    def inner_function():
        nonlocal x
        x = 22
        print('Inner x:', x)

    inner_function()
    print('Outer x:', x)

outer_function()
This prints:
Inner x: 22
Outer x: 22
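The nonlocal keyword has a counterpart for the global scope: the global keyword, which tells the interpreter to modify a module-level variable instead of creating a local one. Here's a minimal sketch of that:

```python
x = 11

def modify():
    global x  # modify the module-level x, don't create a local one
    x = 22

modify()
print('Global x:', x)
```

Without the global statement, x = 22 inside modify would have created a brand new local variable, just like in the examples above.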
Global scope
Global scope (or module scope) simply refers to the scope where
all the module’s top-level variables, functions and classes are
defined.
A “module” is any Python file or package that can be run or
imported. For example, time is a module (as you can do import time
in your code), and time.sleep is a function defined in the time
module’s global scope.
Every module in Python has a few pre-defined globals, such as
__name__ and __doc__ , which refer to the module’s name and
the module’s docstring, respectively. You can try this in the REPL:
>>> print(__name__)
__main__
>>> print(__doc__)
None
>>> import time
>>> time.__name__
'time'
>>> time.__doc__
'This module provides various functions to manipulate time values.\n\nThere are ...'
Builtin scope
Now we get to the topic of this blog — the builtin scope.
So there’s two things to know about the builtin scope in Python:
• It’s the scope where essentially all of Python’s top level
functions are defined, such as len , range and print .
• When a variable is not found in the local, enclosing or global
scope, Python looks for it in the builtins.
You can inspect the builtins directly if you want, by importing the
builtins module, and checking methods inside it:
>>> import builtins
>>> builtins.a # press <Tab> here
builtins.abs(    builtins.all(    builtins.any(    builtins.ascii(
And for some reason unknown to me, Python exposes the builtins
module as __builtins__ by default in the global namespace. So
you can also access __builtins__ directly, without importing
anything. Note, that __builtins__ being available is a CPython
implementation detail, and other Python implementations might
not have it. import builtins is the most correct way to access
the builtins module.
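Since builtins are just the last stop in the variable lookup order, you can (accidentally or deliberately) shadow them with your own globals. A small demonstration of the L.E.G.B. fallback, using len:

```python
import builtins

builtin_result = len('hello')  # no global len exists yet, so the builtin is found

len = lambda x: 42             # a global len now shadows the builtin
shadowed_result = len('hello')

del len                        # deleting the global re-exposes the builtin
restored_result = len('hello')
```

After the del statement, looking up len falls through the global scope again and finds builtins.len, which is why shadowing a builtin is almost always recoverable (though rarely a good idea).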
ALL the builtins
You can use the dir function to print all the variables defined
inside a module or class. So let’s use that to list out all of the
builtins:
>>> print(dir(__builtins__))
['ArithmeticError', 'AssertionError', 'AttributeError', 'BaseException',
 'BlockingIOError', 'BrokenPipeError', 'BufferError', 'BytesWarning',
 'ChildProcessError', 'ConnectionAbortedError', 'ConnectionError',
 'ConnectionRefusedError', 'ConnectionResetError', 'DeprecationWarning',
 'EOFError', 'Ellipsis', 'EnvironmentError', 'Exception', 'False',
 'FileExistsError', 'FileNotFoundError', 'FloatingPointError',
 'FutureWarning', 'GeneratorExit', 'IOError', 'ImportError',
 'ImportWarning', 'IndentationError', 'IndexError', 'InterruptedError',
 'IsADirectoryError', 'KeyError', 'KeyboardInterrupt', 'LookupError',
 'MemoryError', 'ModuleNotFoundError', 'NameError', 'None',
 'NotADirectoryError', 'NotImplemented', 'NotImplementedError',
 'OSError', 'OverflowError', 'PendingDeprecationWarning',
 'PermissionError', 'ProcessLookupError', 'RecursionError',
 'ReferenceError', 'ResourceWarning', 'RuntimeError', 'RuntimeWarning',
 'StopAsyncIteration', 'StopIteration', 'SyntaxError', 'SyntaxWarning',
 'SystemError', 'SystemExit', 'TabError', 'TimeoutError', 'True',
 'TypeError', 'UnboundLocalError', 'UnicodeDecodeError',
 'UnicodeEncodeError', 'UnicodeError', 'UnicodeTranslateError',
 'UnicodeWarning', 'UserWarning', 'ValueError', 'Warning',
 'ZeroDivisionError', '__build_class__', '__debug__', '__doc__',
 '__import__', '__loader__', '__name__', '__package__', '__spec__',
 'abs', 'all', 'any', 'ascii', 'bin', 'bool', 'breakpoint', 'bytearray',
 'bytes', 'callable', 'chr', 'classmethod', 'compile', 'complex',
 'copyright', 'credits', 'delattr', 'dict', 'dir', 'divmod', 'enumerate',
 'eval', 'exec', 'exit', 'filter', 'float', 'format', 'frozenset',
 'getattr', 'globals', 'hasattr', 'hash', 'help', 'hex', 'id', 'input',
 'int', 'isinstance', 'issubclass', 'iter', 'len', 'license', 'list',
 'locals', 'map', 'max', 'memoryview', 'min', 'next', 'object', 'oct',
 'open', 'ord', 'pow', 'print', 'property', 'quit', 'range', 'repr',
 'reversed', 'round', 'set', 'setattr', 'slice', 'sorted',
 'staticmethod', 'str', 'sum', 'super', 'tuple', 'type', 'vars', 'zip']
…yeah, there’s a lot. But don’t worry, we’ll break these down into
various groups, and knock them down one by one.
So let’s tackle the biggest group by far:
Exceptions
Python has 66 built-in exception classes (so far), each one intended
to be used by the user, the standard library and everyone else, to
serve as meaningful ways to interpret and catch errors in your
code.
To explain exactly why there’s separate Exception classes in
Python, here’s a quick example:
def fetch_from_cache(key):
    """Returns a key's value from cached items."""
    if key is None:
        raise ValueError('key must not be None')

    return cached_items[key]

def get_value(key):
    try:
        value = fetch_from_cache(key)
    except KeyError:
        value = fetch_from_api(key)

    return value
Focus on the get_value function. It’s supposed to return a cached
value if it exists, otherwise fetch data from an API.
There’s 3 things that can happen in that function:
• If the key is not in the cache, trying to access
cached_items[key] raises a KeyError . This is caught in the
try block, and an API call is made to get the data.
• If the key is present in the cache, it is returned as is.
• There’s also a third case, where key is None .
If the key is None , fetch_from_cache raises a ValueError ,
indicating that the value provided to this function was
inappropriate. And since the try block only catches KeyError ,
this error is shown directly to the user.
>>> x = None
>>> get_value(x)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 3, in get_value
File "<stdin>", line 3, in fetch_from_cache
ValueError: key must not be None
>>>
If ValueError and KeyError weren’t predefined, meaningful
error types, there wouldn’t be any way to differentiate between
error types in this way.
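To make the snippet above actually runnable, here's a self-contained sketch of the same pattern, with hypothetical definitions for cached_items and fetch_from_api, which the original snippet leaves undefined:

```python
cached_items = {'a': 1}  # hypothetical in-memory cache

def fetch_from_api(key):
    # stand-in for a real network call
    return f'api-value-for-{key}'

def fetch_from_cache(key):
    """Returns a key's value from cached items."""
    if key is None:
        raise ValueError('key must not be None')
    return cached_items[key]

def get_value(key):
    try:
        value = fetch_from_cache(key)
    except KeyError:
        # only cache misses are caught; ValueError propagates to the caller
        value = fetch_from_api(key)
    return value
```

get_value('a') returns the cached 1, get_value('b') falls back to the API stub, and get_value(None) raises ValueError: exactly the three cases described above.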
Extras: Exception trivia
Now I should point out that not all uppercase values in that output
above were exception types; there is, in fact, one other kind of
uppercase built-in object in Python: constants. So let’s talk
about those.
Constants
There’s exactly 5 constants: True , False , None , Ellipsis , and
NotImplemented .
True , False and None are the most obvious constants.
Ellipsis is an interesting one, and it’s actually represented in two
forms: the word Ellipsis , and the symbol ... . It mostly exists
to support type annotations, and for some very fancy slicing
support.
NotImplemented is the most interesting of them all (other than
the fact that True and False actually function as 1 and 0 if you
didn’t know that, but I digress). NotImplemented is used inside a
class’ operator definitions, when you want to tell Python that a
certain operator isn’t defined for this class.
Now I should mention that all objects in Python can add support
for all Python operators, such as + , - , += , etc., by defining special
methods inside their class, such as __add__ for + , __iadd__ for
+= , and so on.
Let’s see a quick example of that:
class MyNumber:
    def __add__(self, other):
        return other + 42
This results in our object acting as the value 42 during addition:
>>> num = MyNumber()
>>> num + 3
45
>>> num + 100
142
Extras: right-operators
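When the left operand's __add__ doesn't know how to handle the right operand, Python also tries the right operand's reflected method, such as __radd__ for +. A minimal sketch, extending the hypothetical MyNumber class from above:

```python
class MyNumber:
    def __add__(self, other):
        # handles MyNumber() + other
        return other + 42

    def __radd__(self, other):
        # handles other + MyNumber(), tried when other's own __add__
        # returns NotImplemented for a MyNumber operand
        return other + 42
```

With this, both num + 3 and 3 + num evaluate to 45.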
But let’s say you only want to support integer addition with this
class, and not floats. This is where you’d use NotImplemented :
class MyNumber:
    def __add__(self, other):
        if isinstance(other, float):
            return NotImplemented

        return other + 42
Returning NotImplemented from an operator method tells Python
that this is an unsupported operation. Python then conveniently
wraps this into a TypeError with a meaningful message:
>>> n = MyNumber()
>>> n + 0.12
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'MyNumber' and 'float'
>>> n + 10
52
A weird fact about constants is that they aren’t even implemented
in Python; they’re implemented directly in C code.
Funky globals
There’s another group of odd-looking values in the builtins output
we saw above: values like __spec__ , __loader__ , __debug__
etc.
These are actually not unique to the builtins module. These
properties are all present in the global scope of every module in
Python, as they are module attributes. These hold information
about the module that is required for the import machinery. Let’s
take a look at them:
__name__
Contains the name of the module. For example,
builtins.__name__ will be the string 'builtins' . When you run
a Python file, that is also run as a module, and the module name
for that is __main__ . This should explain how if __name__ ==
'__main__' works when used in Python files.
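As a quick illustration, here's the classic idiom in a hypothetical script (the file name is just for the example):

```python
# mymodule.py
def main():
    return 'running as a script'

if __name__ == '__main__':
    # This branch runs only when the file is executed directly,
    # not when it is imported with `import mymodule`.
    print(main())
```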
__doc__
Contains the module’s docstring. It’s what’s shown as the module
description when you do help(module_name) .
>>> import time
>>> print(time.__doc__)
This module provides various functions to manipulate time values.

There are two standard representations of time. One is the number
of seconds since the Epoch ...
>>> help(time)
Help on built-in module time:

NAME
    time - This module provides various functions to manipulate time values.

DESCRIPTION
    There are two standard representations of time. One is the number
    of seconds since the Epoch ...
More Python trivia: this is why the PEP8 style guide says
“docstrings should have a line length of 72 characters”: because
docstrings can be indented up to two levels in the help()
message, so to neatly fit on an 80-character wide terminal they
must be at a maximum, 72 characters wide.
__package__
The package to which this module belongs. For top-level modules it
is the same as __name__ . For sub-modules it is the package’s
__name__ . For example:
>>> import urllib.request
>>> urllib.__package__
'urllib'
>>> urllib.request.__name__
'urllib.request'
>>> urllib.request.__package__
'urllib'
__spec__
This refers to the module spec. It contains metadata such as the
module name, what kind of module it is, as well as how it was
created and loaded.
$ tree mytest
mytest
└── a
    └── b.py

1 directory, 1 file
$ python -q
>>> import mytest.a.b
>>> mytest.__spec__
ModuleSpec(name='mytest', loader=<_frozen_importlib_external._NamespaceLoader object at 0x...>, submodule_search_locations=_NamespacePath(['/tmp/mytest']))
>>> mytest.a.b.__spec__
ModuleSpec(name='mytest.a.b', loader=<_frozen_importlib_external.SourceFileLoader object at 0x...>, origin='/tmp/mytest/a/b.py')
You can see through it that mytest was located using something
called NamespaceLoader from the directory /tmp/mytest , and
mytest.a.b was loaded using a SourceFileLoader , from the
source file /tmp/mytest/a/b.py .
__loader__
Let’s see what this is, directly in the REPL:
>>> __loader__
<class '_frozen_importlib.BuiltinImporter'>
The __loader__ is set to the loader object that the import
machinery used when loading the module. This specific one is
defined within the _frozen_importlib module, and is what’s
used to import the builtin modules.
Looking slightly more closely at the example before this, you might
notice that the loader attributes of the module spec are Loader
classes that come from the slightly different
_frozen_importlib_external module.
So you might ask, what are these weird _frozen modules? Well,
my friend, it’s exactly as they say — they’re frozen modules.
The actual source code of these two modules lives inside the
importlib package. These _frozen aliases are frozen
versions of the source code of those loaders. To create a frozen
module, the Python code is compiled to a code object, marshalled
into a file, and then added to the Python executable.
If you have no idea what that meant, don’t worry, we will cover
this in detail later.
Python freezes these two modules because they implement the
core of the import system and, thus, cannot be imported like other
Python files when the interpreter boots up. Essentially, they need
to exist to bootstrap the import system.
Funnily enough, there’s another well-defined frozen module in
Python: it’s __hello__ :
>>> import __hello__
Hello world!
Is this the shortest hello world code in any language? :P
Well this __hello__ module was originally added to Python as a
test for frozen modules, to see whether or not they work properly.
It has stayed in the language as an easter egg ever since.
__import__
__import__ is the builtin function that defines how import
statements work in Python.
>>> import random
>>> random
<module 'random' from '/usr/lib/python3.9/random.py'>
>>> __import__('random')
<module 'random' from '/usr/lib/python3.9/random.py'>
>>> np = __import__('numpy') # Same as doing 'import numpy as np'
>>> np
<module 'numpy' from '/home/tushar/.local/lib/python3.9/site-packages/numpy/__init__.py'>
Essentially, every import statement can be translated into an
__import__ function call. Internally, that’s pretty much what
Python is doing to the import statements (but directly in C).
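As a sketch, the two common forms of the import statement map to __import__ roughly like this (using json purely as an example module):

```python
# import json  -- roughly equivalent to:
json = __import__('json')

# from json import dumps  -- roughly a __import__ call
# followed by an attribute lookup:
dumps = getattr(__import__('json'), 'dumps')
```

The real translation that CPython performs also passes globals(), locals() and a fromlist argument, but the simplified version above captures the idea.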
Now, there are three more of these properties left: __debug__ and
__build_class__ , which are only present globally and are not
module variables, and __cached__ , which is only present in
imported modules.
__debug__
This is a global, constant value in Python, which is almost always
set to True .
What it refers to, is Python running in debug mode. And Python
always runs in debug mode by default.
The other mode that Python can run in is “optimized mode”. To
run Python in “optimized mode”, you can invoke it by passing the
-O flag. And all it does is prevent assert statements from doing
anything (at least so far), which in all honesty, isn’t really useful at
all.
$ python
>>> __debug__
True
>>> assert False, 'some error'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AssertionError: some error
>>>
$ python -O
>>> __debug__
False
>>> assert False, 'some error'
>>> # Didn't raise any error.
Also, __debug__ , True , False and None are the only true
constants in Python, i.e. these 4 are the only global variables in
Python that you cannot overwrite with a new value.
>>> True = 42
File "<stdin>", line 1
True = 42
^
SyntaxError: cannot assign to True
>>> __debug__ = False
File "<stdin>", line 1
SyntaxError: cannot assign to __debug__
__build_class__
This global was added in Python 3.1, to allow class definitions to
accept arbitrary positional and keyword arguments. There are long,
technical reasons why this is a feature, and it touches advanced
topics like metaclasses, so unfortunately I won’t be explaining why
it exists.
But all you need to know is that this is what allows you to do things
like this while making a class:
>>> class C:
...     def __init_subclass__(self, **kwargs):
...         print(f'Subclass got data: {kwargs}')
...
>>> class D(C, num=42, data='xyz'):
...     pass
...
Subclass got data: {'num': 42, 'data': 'xyz'}
>>>
Before Python 3.1, the class creation syntax only allowed passing
base classes to inherit from, and a metaclass property. The new
requirements were to allow a variable number of positional and
keyword arguments. This would be a bit messy and complex to add
to the language.
But we already have this, of course, in the code for calling regular
functions. So it was proposed that the class X(...) syntax would
simply delegate to a function call underneath:
__build_class__('X', ...) .
__cached__
This is an interesting one.
When you import a module, the __cached__ property stores the
path of the cached file of the compiled Python bytecode of that
module.
“What?!”, you might be saying, “Python? Compiled?”
Yeah. Python is compiled. In fact, all Python code is compiled, but
not to machine code — to bytecode. Let me explain this point by
explaining how Python runs your code.
Here are the steps that the Python interpreter takes to run your
code:
• It takes your source file, and parses it into a syntax tree. The
syntax tree is a representation of your code that can be more
easily understood by a program. It finds and reports any errors
in the code’s syntax, and ensures that there are no ambiguities.
• The next step is to compile this syntax tree into bytecode.
Bytecode is a set of micro-instructions for Python’s virtual
machine. This “virtual machine” is where Python’s interpreter
logic resides. It essentially emulates a very simple stack-based
computer on your machine, in order to execute the Python
code written by you.
• This bytecode-form of your code is then run on the Python VM.
The bytecode instructions are simple things like pushing and
popping data off the current stack. Each of these instructions,
when run one after the other, executes the entire program.
We will take a really detailed example of the steps above, in the
next section. Hang tight!
Now since the “compiling to bytecode” step above takes a
noticeable amount of time when you import a module, Python
stores (marshalls) the bytecode into a .pyc file, and stores it in a
folder called __pycache__ . The __cached__ parameter of the
imported module then points to this .pyc file.
When the same module is imported again at a later time, Python
checks if a .pyc version of the module exists, and then directly
imports the already-compiled version instead, saving a bunch of
time and computation.
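You don't have to construct these cache paths by hand: the standard library exposes the mapping via importlib.util.cache_from_source. For example:

```python
import importlib.util

# Maps a source path to the .pyc path Python would cache it at.
# The exact 'cpython-XX' tag in the result depends on your Python version.
pyc_path = importlib.util.cache_from_source('mymod.py')
print(pyc_path)
```

On CPython 3.9 this prints something like __pycache__/mymod.cpython-39.pyc.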
If you’re wondering: yes, you can directly run or import a .pyc file
in Python code, just like a .py file:
>>> import test
>>> test.__cached__
'/usr/lib/python3.9/test/__pycache__/__init__.cpython-39.pyc'
>>> exit()
$ cp '/usr/lib/python3.9/test/__pycache__/__init__.cpython-39.pyc' cached_test.pyc
$ python
>>> import cached_test # Runs!
>>>
All the builtins, one by one
Now we can finally get on with builtins. And, to build upon the last
section, let’s start this off with some of the most interesting ones,
the ones that build the basis of Python as a language.
compile , exec and eval : How the code works
In the previous section, we saw the 3 steps required to run some
python code. This section will get into details about the 3 steps, and
how you can observe exactly what Python is doing.
Let’s take this code as an example:
x = [1, 2]
print(x)
You can save this code into a file and run it, or type it in the Python
REPL. In both the cases, you’ll get an output of [1, 2] .
Or thirdly, you can give the program as a string to Python’s builtin
function exec :
>>> code = '''
... x = [1, 2]
... print(x)
... '''
>>> exec(code)
[1, 2]
exec (short for execute) takes in some Python code as a string,
and runs it as Python code. By default, exec will run in the same
scope as the rest of your code, which means, that it can read and
manipulate variables just like any other piece of code in your
Python file.
>>> x = 5
>>> exec('print(x)')
5
exec allows you to run truly dynamic code at runtime. You could,
for example, download a Python file from the internet at runtime,
pass its content to exec and it will run it for you. (But please,
never, ever do that.)
For the most part, you don’t really need exec while writing your
code. It’s useful for implementing some really dynamic behaviour
(such as creating a dynamic class at runtime, like
collections.namedtuple does), or to modify the code being read
from a Python file (like in zxpy).
But, that’s not the main topic of discussion today. We must learn
how exec does all of these fancy runtime things.
exec can not only take in a string and run it as code, it can also
take in a code object.
Code objects are the “bytecode” version of Python programs, as
discussed before. They contain not only the exact instructions
generated from your Python code, but it also stores things like the
variables and the constants used inside that piece of code.
Code objects are generated from ASTs (abstract syntax trees),
which are themselves generated by a parser that runs on a string
of code.
Now, if you’re still here after all that nonsense, let’s try to learn this
by example instead. We’ll first generate an AST from our code using
the ast module:
>>> import ast
>>> code = '''
... x = [1, 2]
... print(x)
... '''
>>> tree = ast.parse(code)
>>> print(ast.dump(tree, indent=2))
Module(
  body=[
    Assign(
      targets=[
        Name(id='x', ctx=Store())],
      value=List(
        elts=[
          Constant(value=1),
          Constant(value=2)],
        ctx=Load())),
    Expr(
      value=Call(
        func=Name(id='print', ctx=Load()),
        args=[
          Name(id='x', ctx=Load())],
        keywords=[]))],
  type_ignores=[])
It might seem a bit too much at first, but let me break it down.
The AST is taken as a python module (the same as a Python file in
this case).
>>> print(ast.dump(tree, indent=2))
Module(
  body=[
    ...
The module’s body has two children (two statements):
• The first is an Assign statement…
    Assign(
      ...
Which assigns to the target x …
      targets=[
        Name(id='x', ctx=Store())],
      ...
The value of a list with 2 constants 1 and 2 .
      value=List(
        elts=[
          Constant(value=1),
          Constant(value=2)],
        ctx=Load())),
• The second is an Expr statement, which in this case is a
function call…
    Expr(
      value=Call(
        ...
Of the name print , with the value x .
        func=Name(id='print', ctx=Load()),
        args=[
          Name(id='x', ctx=Load())],
So the Assign part is describing x = [1, 2] and the Expr is
describing print(x) . Doesn’t seem that bad now, right?
Extras: the Tokenizer
So now we have an AST object. We can compile it into a code object
using the compile builtin. Running exec on the code object will
then run it just as before:
>>> import ast
>>> code = '''
... x = [1, 2]
... print(x)
... '''
>>> tree = ast.parse(code)
>>> code_obj = compile(tree, '<string>', 'exec')
>>> exec(code_obj)
[1, 2]
But now, we can look into what a code object looks like. Let’s
examine some of its properties:
>>> code_obj.co_code
b'd\x00d\x01g\x02Z\x00e\x01e\x00\x83\x01\x01\x00d\x02S\x00'
>>> code_obj.co_filename
'<string>'
>>> code_obj.co_names
('x', 'print')
>>> code_obj.co_consts
(1, 2, None)
You can see that the variables x and print used in the code, as
well as the constants 1 and 2 , plus a lot more information about
our code file is available inside the code object. This has all the
information needed to directly run in the Python virtual machine, in
order to produce that output.
If you want to dive deep into what the bytecode means, the extras
section below on the dis module will cover that.
Extras: the "dis" module
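Here's a small taste of the dis module: it disassembles a code object and prints one line per bytecode instruction, which is a handy way to see the stack-machine instructions described earlier:

```python
import dis

code = '''
x = [1, 2]
print(x)
'''
code_obj = compile(code, '<string>', 'exec')

# Prints the bytecode: LOAD_CONST, BUILD_LIST, STORE_NAME, and so on.
# (The exact instruction names vary a little between Python versions.)
dis.dis(code_obj)
```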
eval is pretty similar to exec , except it only accepts expressions
(not statements or a set of statements like exec ), and unlike
exec , it returns a value — the result of said expression.
Here’s an example:
>>> result = eval('1 + 1')
>>> result
2
You can also go the long, detailed route with eval , you just need
to tell ast.parse and compile that you’re expecting to evaluate
this code for its value, instead of running it like a Python file.
>>> expr = ast.parse('1 + 1', mode='eval')
>>> code_obj = compile(expr, '<code>', 'eval')
>>> eval(code_obj)
2
globals and locals : Where everything is stored
While the code objects produced store the logic as well as
constants defined within a piece of code, one thing that they don’t
(or even can’t) store, is the actual values of the variables being
used.
There’s a few reasons for this concerning how the language works,
but the most obvious reason can be seen very simply:
def double(number):
    return number * 2
The code object of this function will store the constant 2 , as well as
the variable name number , but it obviously cannot contain the
actual value of number , as that isn’t given to it until the function is
actually run.
So where does that come from? The answer is that Python stores
everything inside dictionaries associated with each local scope.
Which means that every piece of code has its own defined “local
scope” which is accessed using locals() inside that code, that
contains the values corresponding to each variable name.
Let’s try to see that in action:
>>> value = 5
>>> def double(number):
...     return number * 2
...
>>> double(value)
10
>>> locals()
{'__name__': '__main__', '__doc__': None, '__package__': None,
'__loader__': <class '_frozen_importlib.BuiltinImporter'>, '__spec__': None,
'__annotations__': {}, '__builtins__': <module 'builtins' (built-in)>,
'value': 5, 'double': <function double at 0x7f971d292af0>}
Take a look at the last line: not only is value stored inside the
locals dictionary, the function double itself is stored there as well!
So that’s how Python stores its data.
globals is pretty similar, except that globals always points to
the module scope (also known as global scope). So with something
like this code:
magic_number = 42

def function():
    x = 10
    y = 20
    print(locals())
    print(globals())
locals would just contain x and y , while globals would
contain magic_number and function itself.
input and print : The bread and butter
input and print are probably the first two functionalities that
you learn about Python. And they seem pretty straightforward,
don’t they? input takes in a line of text, and print prints it out,
simple as that. Right?
Well, input and print have a bit more functionality than what
you might know about.
Here’s the full method signature of print :
print(*values, sep=' ', end='\n', file=sys.stdout, flush=False)
The *values simply means that you can provide any number of
positional arguments to print , and it will properly print them out,
separated with spaces by default.
If you want the separator to be different, for example if you want
each item to be printed on a different line, you can set the sep
keyword accordingly, like '\n' :
>>> print(1, 2, 3, 4)
1 2 3 4
>>> print(1, 2, 3, 4, sep='\n')
1
2
3
4
>>> print(1, 2, 3, 4, sep='\n\n')
1

2

3

4
>>>
There’s also an end parameter, if you want a different character
for line ends, like, if you don’t want a new line to be printed at the
end of each print, you can use end='' :
>>> for i in range(10):
...     print(i)
...
0
1
2
3
4
5
6
7
8
9
>>> for i in range(10):
...     print(i, end='')
...
0123456789
Now there’s two more parameters to print : file and flush .
file refers to the “file” that you are printing to. By default it points
to sys.stdout , which is a special “file” wrapper, that prints to the
console. But if you want print to write to a file instead, all you
have to do is change the file parameter. Something like:
with open('myfile.txt', 'w') as f:
    print('Hello!', file=f)
Extras: using a context manager to make a print-writer
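One way to build such a print-writer is a small context manager that hands you a print-like function bound to a file. This is just a sketch (the print_writer name and the file name are made up for the example):

```python
from contextlib import contextmanager

@contextmanager
def print_writer(path):
    with open(path, 'w') as f:
        def fprint(*args, **kwargs):
            # a print() that always writes to the opened file
            print(*args, file=f, **kwargs)
        yield fprint

with print_writer('output.txt') as fprint:
    fprint('Hello!')
    fprint('pi =', 3.14)
```

Everything "printed" through fprint ends up in output.txt, and the file is closed automatically when the with block exits.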
flush is a boolean flag to the print function. All it does is tell
print to write the text immediately to the console/file instead of
putting it in a buffer. This usually doesn’t make much of a
difference, but if you’re printing a very large string to a console, you
might want to set it to True to avoid lag in showing the output to
the user.
Now I’m sure many of you are interested in what secrets the
input function hides, but there’s none. input simply takes in a
string to show as the prompt. Yeah, bummer, I know.
str , bytes , int , bool , float and complex : The five
primitives
Python has exactly 6 primitive data types (well, actually just 5, but
we’ll get to that). 4 of these are numerical in nature, and the other 2
are text-based. Let’s talk about the text-based first, because that’s
going to be much simpler.
str is one of the most familiar data types in Python. Taking user
input using the input method gives you a string, and every other
data type in Python can be converted into a string. This is necessary
because all computer Input/Output is in text-form, be it user I/O or
file I/O, which is probably why strings are everywhere.
bytes on the other hand, are actually the basis of all I/O in
computing. If you know about computers, you would probably
know that all data is stored and handled as bits and bytes — and
that’s how terminals really work as well.
If you want to take a peek at the bytes underneath the input and
print calls, you need to take a look at the I/O buffers in the sys
module: sys.stdout.buffer and sys.stdin.buffer :
>>> import sys
>>> print('Hello!')
Hello!
>>> 'Hello!\n'.encode() # Produces bytes
b'Hello!\n'
>>> char_count = sys.stdout.buffer.write('Hello!\n'.encode())
Hello!
>>> char_count # write() returns the number of bytes written
7
The buffer objects take in bytes , write those directly to the output
buffer, and return the number of bytes written.
To prove that everything is just bytes underneath, let’s look at
another example that prints an emoji using its bytes:
>>> import sys
>>> '🐍'.encode()
b'\xf0\x9f\x90\x8d' # utf-8 encoded string of the snake emoji
>>> _ = sys.stdout.buffer.write(b'\xf0\x9f\x90\x8d')
🐍
int is another widely-used, fundamental primitive data type. It’s
also the lowest common denominator of 2 other data types:
float and complex . complex is a supertype of float , which, in
turn, is a supertype of int .
What this means is that all int s are valid as a float as well as a
complex , but not the other way around. Similarly, all float s are
also valid as a complex .
If you don’t know, complex is the implementation for “complex
numbers” in Python. They’re a really common tool in
mathematics.
Let’s take a look at them:
>>> x = 5
>>> y = 5.0
>>> z = 5.0+0.0j
>>> type(x), type(y), type(z)
(<class 'int'>, <class 'float'>, <class 'complex'>)
>>> x == y == z # All the same value
True
>>> y
5.0
>>> float(x) # float(x) produces the same result as y
5.0
>>> z
(5+0j)
>>> complex(x) # complex(x) produces the same result as z
(5+0j)
Now, I mentioned for a moment that there’s actually only 5
primitive data types in Python, not 6. That is because, bool is
actually not a primitive data type — it’s actually a subclass of int !
You can check it yourself, by calling the mro() method on these
classes.
mro stands for “method resolution order”. It defines the order in
which the methods called on a class are looked for. Essentially, the
method calls are first looked for in the class itself, and if it’s not
present there, it’s searched in its parent class, and then its parent,
all the way to the top: object . Everything in Python inherits from
object . Yes, pretty much everything in Python is an object.
Take a look:
>>> int.mro()
[<class 'int'>, <class 'object'>]
>>> float.mro()
[<class 'float'>, <class 'object'>]
>>> complex.mro()
[<class 'complex'>, <class 'object'>]
>>> str.mro()
[<class 'str'>, <class 'object'>]
>>> bool.mro()
[<class 'bool'>, <class 'int'>, <class 'object'>] # Look!
You can see from their “ancestry”, that all the other data types are
not “sub-classes” of anything (except for object , which will always
be there). Except bool , which inherits from int .
Now at this point, you might be wondering “WHY? Why does bool
subclass int ?” And the answer is a bit anti-climactic. It’s mostly
because of compatibility reasons. Historically, logical true/false
operations in Python simply used 0 for false and 1 for true. In
Python version 2.2, the boolean values True and False were
added, and they were simply wrappers around these integer
values. That has stayed the same to this day. That’s all.
But, it also means that, for better or for worse, you can pass a
bool wherever an int is expected:
>>> import json
>>> data = {'a': 1, 'b': {'c': 2}}
>>> print(json.dumps(data))
{"a": 1, "b": {"c": 2}}
>>> print(json.dumps(data, indent=4))
{
"a": 1,
"b": {
"c": 2
}
}
>>> print(json.dumps(data, indent=True))
{
"a": 1,
"b": {
"c": 2
}
}
indent=True here is treated as indent=1 , so it works, but I’m
pretty sure nobody would intend that to mean an indent of 1
space. Welp.
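Since bool really is a subclass of int , booleans also behave like the integers 1 and 0 in arithmetic. Here’s a small sketch of that fact, along with one genuinely useful consequence: you can count matches with sum() .

```python
# True and False really are the integers 1 and 0 underneath:
print(isinstance(True, int))  # True
print(True + True)            # 2

# A legitimate use of this fact: counting matches with sum().
words = ['spam', 'eggs', 'steak', 'spam']
print(sum(word == 'spam' for word in words))  # 2
```

Each `word == 'spam'` comparison yields a bool , and sum() happily adds them up as integers.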
object : The base
object is the base class of the entire class hierarchy. Everyone
inherits from object .
The object class defines some of the most fundamental
properties of objects in Python. Functionalities like being able to
hash an object through hash() , being able to set attributes and
get their value, being able to convert an object into a string
representation, and many more.
It does all of this through its pre-defined “magic methods”:
>>> dir(object)
['__class__', '__delattr__', '__dir__', '__doc__', '__eq__',
'__format__', '__ge__', '__getattribute__', '__gt__', '__hash__',
'__init__', '__init_subclass__', '__le__', '__lt__', '__ne__',
'__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__',
'__sizeof__', '__str__', '__subclasshook__']
Accessing an attribute with obj.x calls the __getattribute__
method underneath. Similarly setting a new attribute and deleting an
attribute calls __setattr__ and __delattr__ respectively. The
object’s hash is generated by the pre-defined __hash__ method,
and the string representation of objects comes from __repr__ .
>>> object() # This creates an object with no properties
<object object at 0x7f47aecaf210> # defined in __repr__()
>>> class dummy(object):
... pass
>>> x = dummy()
>>> x
<__main__.dummy object at 0x7f47aec510a0> # functionality inherited from object
>>> hash(object())
8746615746334
>>> hash(x)
8746615722250
>>> x.__hash__() # is the same as hash(x)
8746615722250
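A class can also replace these inherited defaults with its own magic methods. Here’s a minimal sketch (the Point class is made up for illustration) that overrides __repr__ and __eq__ :

```python
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __repr__(self):
        # Used to build the string representation, like in the REPL.
        return f'Point({self.x}, {self.y})'

    def __eq__(self, other):
        # Replaces object's identity-based == with value comparison.
        return (self.x, self.y) == (other.x, other.y)

p = Point(1, 2)
print(p)                 # Point(1, 2)
print(p == Point(1, 2))  # True, even though they are different objects
```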
There’s actually a lot more to speak about magic methods in
Python, as they form the backbone of the object-oriented, duck-
typed nature of Python. But, that’s a story for another blog.
Stay tuned if you’re interested 😉
type : The class factory
If object is the father of all objects, type is the father of all “classes”.
As in, while all objects inherit from object , all classes inherit from
type .
type is the builtin that can be used to dynamically create new
classes. Well, it actually has two uses:
• If given a single parameter, it returns the “type” of that
parameter, i.e. the class used to make that object:
>>> x = 5
>>> type(x)
<class 'int'>
>>> type(x) is int
True
>>> type(x)(42.0) # Same as int(42.0)
42
• If given three parameters, it creates a new class. The three
parameters are name , bases , and dict .
• name defines the name of the class
• bases defines the base classes, i.e. superclasses
• dict defines all class attributes and methods.
So this class definition:
class MyClass(MySuperClass):
    def x(self):
        print('x')
Is identical to this class definition:
def x_function(self):
    print('x')

MyClass = type('MyClass', (MySuperClass,), {'x': x_function})
This can be one way to implement the collections.namedtuple
class, for example, which takes in a class name and a tuple of
attributes.
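As a rough illustration (nowhere near the real implementation), a stripped-down namedtuple-like factory could be sketched with the three-argument form of type . The simple_namedtuple name is invented here:

```python
def simple_namedtuple(name, fields):
    def __init__(self, *args):
        # Assign each positional argument to its field name.
        for field, value in zip(fields, args):
            setattr(self, field, value)

    def __repr__(self):
        values = ', '.join(f'{f}={getattr(self, f)!r}' for f in fields)
        return f'{name}({values})'

    # Dynamically build the class, just like a `class` statement would.
    return type(name, (object,), {'__init__': __init__, '__repr__': __repr__})

Point = simple_namedtuple('Point', ('x', 'y'))
p = Point(3, 5)
print(p)    # Point(x=3, y=5)
print(p.x)  # 3
```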
hash and id : The equality fundamentals
The builtin functions hash and id make up the backbone of
object equality in Python.
Python objects by default aren’t comparable, unless they are
identical. If you try to create two object() items and check if
they’re equal…
>>> x = object()
>>> y = object()
>>> x == x
True
>>> y == y
True
>>> x == y # Comparing two objects
False
The result will always be False . This comes from the fact that
object s compare themselves by identity: They are only equal to
themselves, nothing else.
Extras: Sentinels
To understand why objects only compare to themselves, we will
have to understand the is keyword.
Python’s is operator is used to check if two values reference the
same exact object in memory. Think of Python objects like boxes
floating around in space, and variables, array indexes, and so on
being named arrows pointing to these objects.
Let’s take a quick example:
>>> x = object()
>>> y = object()
>>> z = y
>>> x is y
False
>>> y is z
True
In the code above, there are two separate objects, and three labels
x , y and z pointing to these two objects: x pointing to the first
one, and y and z both pointing to the other one.
>>> del x
This deletes the arrow x . The objects themselves aren’t affected by
assignment, or deletion, only the arrows are. But now that there
are no arrows pointing to the first object, it is meaningless to keep
it alive. So Python’s “garbage collector” gets rid of it. Now we are left
with a single object .
>>> y = 5
Now y arrow has been changed to point to an integer object 5
instead. z still points to the second object though, so it’s still
alive.
>>> z = y * 2
Now z points to yet another new object 10 , which is stored
somewhere in memory. Now the second object also has nothing
pointing to it, so that is subsequently garbage collected.
To be able to verify all of this, we can use the id builtin function.
id spells out the exact location of the object in memory,
represented as a number.
>>> x = object()
>>> y = object()
>>> z = y
>>> id(x)
139737240793600
>>> id(y)
139737240793616
>>> id(z)
139737240793616 # Notice the numbers!
>>> x is y
False
>>> id(x) == id(y)
False
>>> y is z
True
>>> id(y) == id(z)
True
Same object, same id . Different objects, different id . Simple as
that.
With object s, == and is behave the same way:
>>> x = object()
>>> y = object()
>>> z = y
>>> x is y
False
>>> x == y
False
>>> y is z
True
>>> y == z
True
This is because object ’s behaviour for == is defined to compare
the id . Something like this:
class object:
    def __eq__(self, other):
        return self is other
The actual implementation of object is written in C.
Unlike == , there’s no way to override the behavior of the is
operator.
Container types, on the other hand, are equal if they can be
replaced with each other. Good examples would be lists which have
the same items at the same indices, or sets containing the exact
same values.
>>> x = [1, 2, 3]
>>> y = [1, 2, 3]
>>> x is y
False # Different objects,
>>> x == y
True # Yet, equal.
These can be defined in this way:
class list:
    def __eq__(self, other):
        if len(self) != len(other):
            return False
        return all(x == y for x, y in zip(self, other))
        # Can also be written as:
        return all(self[i] == other[i] for i in range(len(self)))
We haven’t looked at all or zip yet, but all this does is make
sure all of the given list indices are equal.
Similarly, sets are unordered so even their location doesn’t matter,
only their “presence”:
class set:
    def __eq__(self, other):
        if len(self) != len(other):
            return False
        return all(item in other for item in self)
Now, related to the idea of “equivalence”, Python has the idea of
hashes. A “hash” of any piece of data refers to a pre-computed
value that looks pretty much random, but it can be used to identify
that piece of data (to some extent).
Hashes have two specific properties:
• The same piece of data will always have the same hash value.
• Changing the data even very slightly results in a drastically
different hash.
What this means is that if two values have the same hash, it’s very
*likely* that they have the same value as well.
Comparing hashes is a really fast way to check for “presence”. This
is what dictionaries and sets use to find values inside them pretty
much instantly:
>>> import timeit
>>> timeit.timeit('999 in l', setup='l = list(range(1000))')
12.224023487000522 # 12 seconds to run a million times
>>> timeit.timeit('999 in s', setup='s = set(range(1000))')
0.06099735599855194 # 0.06 seconds for the same thing
Notice that the set solution is running hundreds of times faster
than the list solution! This is because they use the hash values as
their replacement for “indices”, and if a value at the same hash is
already stored in the set/dictionary, Python can quickly check if it’s
the same item or not. This process makes checking for presence
pretty much instant.
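This is also why custom classes that define their own __eq__ are expected to define a matching __hash__ : equal objects must hash the same, or sets and dicts can’t find them. A minimal sketch (the Card class is invented for illustration):

```python
class Card:
    def __init__(self, rank, suit):
        self.rank, self.suit = rank, suit

    def __eq__(self, other):
        return (self.rank, self.suit) == (other.rank, other.suit)

    def __hash__(self):
        # Equal cards must produce equal hashes,
        # so hash the same data that __eq__ compares.
        return hash((self.rank, self.suit))

deck = {Card('A', 'spades'), Card('A', 'spades'), Card('2', 'hearts')}
print(len(deck))                    # 2 -- the duplicates collapsed
print(Card('A', 'spades') in deck)  # True -- located via its hash
```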
Extras: hash factoids
dir and vars : Everything is a dictionary
Have you ever wondered how Python stores objects, their
variables, their methods and such? We know that all objects have
their own properties and methods attached to them, but how
exactly does Python keep track of them?
The simple answer is that everything is stored inside dictionaries.
And the vars function exposes the variables stored inside objects
and classes.
>>> class C:
... some_constant = 42
... def __init__(self, x, y):
... self.x = x
... self.y = y
... def some_method(self):
... pass
...
>>> c = C(x=3, y=5)
>>> vars(c)
{'x': 3, 'y': 5}
>>> vars(C)
mappingproxy(
{'__module__': '__main__', 'some_constant': 42,
'__init__': <function C.__init__ at 0x7fd27fc66d30>,
'some_method': <function C.some_method at 0x7fd27f350ca0>,
'__dict__': <attribute '__dict__' of 'C' objects>,
'__weakref__': <attribute '__weakref__' of 'C' objects>,
'__doc__': None
})
As you can see, the attributes x and y related to the object c are
stored in its own dictionary, and the methods ( some_method
and __init__ ) are actually stored as functions in the class’s
dictionary. Which makes sense, as the code of the function itself
doesn’t change for every object, only the variables passed to it
change.
This can be demonstrated with the fact that c.function(x) is the
same as C.function(c, x) :
>>> class C:
... def function(self, x):
... print(f'self={self}, x={x}')
>>> c = C()
>>> C.function(c, 5)
self=<__main__.C object at 0x7f90762461f0>, x=5
>>> c.function(5)
self=<__main__.C object at 0x7f90762461f0>, x=5
It shows that a function defined inside a class really is just a
function, with self being just an object that is passed as the first
argument. The object syntax c.function(x) is just a cleaner way to
write C.function(c, x) .
Now here’s a slightly different question. If vars shows all methods
inside a class, then why does this work?
>>> class C:
... def function(self, x): pass
...
>>> vars(C)
mappingproxy({
'__module__': '__main__',
'function': <function C.function at 0x7f607ddedb80>,
'__dict__': <attribute '__dict__' of 'C' objects>,
'__weakref__': <attribute '__weakref__' of 'C' objects>,
'__doc__': None
})
>>> c = C()
>>> vars(c)
{}
>>> c.__class__
<class '__main__.C'>
🤔 __class__ is defined in neither c ’s dict, nor in C … then
where is it coming from?
If you want a definitive answer for which properties can be accessed
on an object, you can use dir :
>>> dir(c)
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__',
'__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__',
'__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__',
'__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__',
'__subclasshook__', '__weakref__', 'function']
So where are the rest of the properties coming from? Well, the
story is slightly more complicated, for one simple reason: Python
supports inheritance.
All objects in Python inherit by default from the object class, and
indeed, __class__ is defined on object :
>>> '__class__' in vars(object)
True
>>> vars(object).keys()
dict_keys(['__repr__', '__hash__', '__str__', '__getattribute__', '__setattr__',
'__delattr__', '__lt__', '__le__', '__eq__', '__ne__', '__gt__', '__ge__',
'__init__', '__new__', '__reduce_ex__', '__reduce__', '__subclasshook__',
'__init_subclass__', '__format__', '__sizeof__', '__dir__', '__class__',
'__doc__'])
And that does cover everything that we see in the output of
dir(c) .
Now that I’ve mentioned inheritance, I think I should also elaborate
on how the “method resolution order” works. MRO for short, this is the
list of classes that an object inherits properties and methods from.
Here’s a quick example:
>>> class A:
... def __init__(self):
... self.x = 'x'
... self.y = 'y'
...
>>> class B(A):
... def __init__(self):
... self.z = 'z'
...
>>> a = A()
>>> b = B()
>>> B.mro()
[<class '__main__.B'>, <class '__main__.A'>, <class 'object'>]
>>> dir(b)
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__',
'__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__',
'__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__',
'__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__',
'__subclasshook__', '__weakref__', 'x', 'y', 'z']
>>> set(dir(b)) - set(dir(a)) # all values in dir(b) that aren't in dir(a)
{'z'}
>>> vars(b).keys()
dict_keys(['z'])
>>> set(dir(a)) - set(dir(object))
{'x', 'y'}
>>> vars(a).keys()
dict_keys(['x', 'y'])
So every level of inheritance adds the newer methods into the dir
list, and dir on a subclass shows all methods found in its method
resolution order. And that’s how Python suggests method
completion in the REPL:
>>> class A:
... x = 'x'
...
>>> class B(A):
... y = 'y'
...
>>> b = B()
>>> b. # Press <tab> twice here
b.x b.y # autocompletion!
Extras: slots?
hasattr , getattr , setattr and delattr : Attribute
helpers
Now that we’ve seen that objects are pretty much the same as
dictionaries underneath, let’s draw a few more parallels between
them while we are at it.
We know that accessing as well as reassigning a property inside a
dictionary is done using indexing:
>>> dictionary = {'property': 42}
>>> dictionary['property']
42
while on an object it is done via the . operator:
>>> class C:
... prop = 42
...
>>> C.prop
42
You can even set and delete properties on objects:
>>> C.prop = 84
>>> C.prop
84
>>> del C.prop
>>> C.prop
AttributeError: type object 'C' has no attribute 'prop'
But dictionaries are so much more flexible: you can for example,
check if a property exists in a dictionary:
>>> d = {}
>>> 'prop' in d
False
>>> d['prop'] = 'exists'
>>> 'prop' in d
True
You could do this on an object by using try-except:
>>> class X:
... pass
...
>>> x = X()
>>> try:
...     print(x.prop)
... except AttributeError:
...     print("prop doesn't exist.")
...
prop doesn't exist.
But the preferred method to do this would be the direct equivalent:
hasattr .
>>> class X:
... pass
...
>>> x = X()
>>> hasattr(x, 'prop')
False
>>> x.prop = 'exists'
>>> hasattr(x, 'prop')
True
Another thing that dictionaries can do is use a variable to index
into them. You can’t really do that with objects, right? Let’s try:
>>> class X:
... value = 42
...
>>> x = X()
>>> attr_name = 'value'
>>> x.attr_name
AttributeError: 'X' object has no attribute 'attr_name'
Yeah, it doesn’t take the variable’s value. This should be pretty
obvious. But to actually do this, you can use getattr , which does
take in a string, just like a dictionary key:
>>> class X:
... value = 42
...
>>> x = X()
>>> getattr(x, 'value')
42
>>> attr_name = 'value'
>>> getattr(x, attr_name)
42 # It works!
setattr and delattr work the same way: they take in the
attribute name as a string, and set/delete the corresponding
attribute accordingly.
>>> class X:
... value = 42
...
>>> x = X()
>>> setattr(x, 'value', 84)
>>> x.value
84
>>> delattr(x, 'value') # deletes the attribute completely
>>> hasattr(x, 'value')
False # `value` no longer exists on the object.
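One more trick worth knowing: getattr accepts an optional third argument, used as a fallback when the attribute doesn’t exist, much like dict.get . A quick sketch (the Config class and its attributes are made up):

```python
class Config:
    retries = 3

config = Config()
# The third argument is returned instead of raising AttributeError:
print(getattr(config, 'retries', 5))   # 3  -- the attribute exists
print(getattr(config, 'timeout', 30))  # 30 -- falls back to the default
```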
Let’s try to build something that kinda makes sense with one of
these functions:
Sometimes you need a function that can either take a value
directly, or take a “factory” object (an object or a function, for
example) that generates the required value on demand. Let’s try to
implement that pattern:
class api:
    """A dummy API."""
    def send(item):
        print(f'Uploaded {item!r}!')

def upload_data(item):
    """Uploads the provided value to our database."""
    if hasattr(item, 'get_value'):
        data = item.get_value()
        api.send(data)
    else:
        api.send(item)
The upload_data function is checking if we have gotten a factory
object, by checking if it has a get_value method. If it does, that
function is used to get the actual value to upload. Let’s try to use it!
>>> import json
>>> class DataCollector:
...     def __init__(self):
...         self.items = []
...     def add_item(self, item):
...         self.items.append(item)
...     def get_value(self):
...         return json.dumps(self.items)
...
>>> upload_data('some text')
Uploaded 'some text'!
>>> collector = DataCollector()
>>> collector.add_item(42)
>>> collector.add_item(1000)
>>> upload_data(collector)
Uploaded '[42, 1000]'!
super : The power of inheritance
super is Python’s way of referencing a superclass, to use its
methods, for example.
Take this example, of a class that encapsulates the logic of
summing two items:
class Sum:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def perform(self):
        return self.x + self.y
Using this class is pretty simple:
>>> s = Sum(2, 3)
>>> s.perform()
5
Now let’s say you want to subclass Sum to create a DoubleSum
class, which has the same perform interface but it returns double
the value instead. You’d use super for that:
class DoubleSum(Sum):
    def perform(self):
        parent_sum = super().perform()
        return 2 * parent_sum
We didn’t need to define anything that was already defined: We
didn’t need to define __init__ , and we didn’t have to re-write the
sum logic as well. We simply piggy-backed on top of the superclass.
>>> d = DoubleSum(3, 5)
>>> d.perform()
16
Now there are some other ways to use the super object, even
outside of a class:
>>> super(int)
<super: <class 'int'>, NULL>
>>> super(int, int)
<super: <class 'int'>, <int object>>
>>> super(int, bool)
<super: <class 'int'>, <bool object>>
But honestly, I don’t understand what these would ever be used
for. If you know, let me know in the comments ✨
property , classmethod and staticmethod : Method
decorators
We’re reaching the end of all the class and object-related builtin
functions, the last of them being these three decorators.
• property :
@property is the decorator to use when you want to define
getters and setters for properties in your object. Getters and
setters provide a way to add validation or run some extra code
when trying to read or modify the attributes of an object.
This is done by turning the property into a set of functions: one
function that is run when you try to access the property, and
another that is run when you try to change its value.
Let’s take a look at an example, where we try to ensure that the
“marks” property of a student is always set to a positive
number, as marks cannot be negative:
class Student:
    def __init__(self):
        self._marks = 0

    @property
    def marks(self):
        return self._marks

    @marks.setter
    def marks(self, new_value):
        # Doing validation
        if new_value < 0:
            raise ValueError('marks cannot be negative')
        # before actually setting the value.
        self._marks = new_value
Running this code:
>>> student = Student()
>>> student.marks
0
>>> student.marks = 85
>>> student.marks
85
>>> student.marks = -10
ValueError: marks cannot be negative
• classmethod :
@classmethod can be used on a method to make it a class
method instead: such that it gets a reference to the class object,
instead of the instance ( self ).
A simple example would be to create a function that returns the
name of the class:
>>> class C:
... @classmethod
... def class_name(cls):
... return cls.__name__
...
>>> x = C()
>>> x.class_name()
'C'
• staticmethod : @staticmethod is used to convert a method
into a static method: one equivalent to a function sitting inside
a class, independent of any class or object properties. Using this
completely gets rid of the first self argument passed to
methods.
We could make one that does some data validation for
example:
class API:
    @staticmethod
    def is_valid_title(title_text):
        """Checks whether the string can be used as a blog title."""
        return title_text.istitle() and len(title_text) < 60
These builtins are created using a pretty advanced topic called
descriptors. I’ll be honest, descriptors are a topic that is so
advanced that trying to cover it here won’t be of any use beyond
what has already been told. I’m planning on writing a detailed
article on descriptors and their uses sometime in the future, so
stay tuned for that!
list , tuple , dict , set and frozenset : The
containers
A “container” in Python refers to a data structure that can hold any
number of items inside it.
Python has 5 fundamental container types:
• list : Ordered, indexed container. Every element is present at
a specific index. Lists are mutable, i.e. items can be added or
removed at any time.
>>> my_list = [10, 20, 30] # Creates a list with 3 items
>>> my_list[0] # Indexes start with zero
10
>>> my_list[1] # Indexes increase one by one
20
>>> my_list.append(40) # Mutable: can add values
>>> my_list
[10, 20, 30, 40]
>>> my_list[0] = 50 # Can also reassign indexes
>>> my_list
[50, 20, 30, 40]
• tuple : Ordered and indexed just like lists, but with one key
difference: They are immutable, which means items cannot be
added or deleted once the tuple is created.
>>> some_tuple = (1, 2, 3)
>>> some_tuple[0] # Indexable
1
>>> some_tuple.append(4) # But NOT mutable
AttributeError: ...
>>> some_tuple[0] = 5 # Cannot reassign an index
TypeError: ...
• dict : Unordered key-value pairs. The key is used to access the
value. Only one value can correspond to a given key.
>>> flower_colors = {'roses': 'red', 'violets': 'blue'}
>>> flower_colors['violets'] # Use keys to access values
'blue'
>>> flower_colors['violets'] = 'purple' # Mutable
>>> flower_colors
{'roses': 'red', 'violets': 'purple'}
>>> flower_colors['daffodil'] = 'yellow' # Can also add new values
>>> flower_colors
{'roses': 'red', 'violets': 'purple', 'daffodil': 'yellow'}
• set : Unordered, unique collection of data. Items in a set simply
represent their presence or absence. You could use a set to find,
for example, the kinds of trees in a forest. Their order doesn’t
matter, only their existence.
>>> forest = ['cedar', 'bamboo', 'cedar', 'cedar', 'cedar', 'oak']
>>> tree_types = set(forest)
>>> tree_types
{'bamboo', 'oak', 'cedar'} # Only unique items
>>> 'oak' in tree_types
True
>>> tree_types.remove('oak') # Sets are also mutable
>>> tree_types
{'bamboo', 'cedar'}
• A frozenset is identical to a set, but just like tuple s, is
immutable.
>>> forest = ['cedar', 'bamboo', 'cedar', 'cedar', 'cedar', 'oak']
>>> tree_types = frozenset(forest)
>>> tree_types
frozenset({'bamboo', 'oak', 'cedar'})
>>> 'cedar' in tree_types
True
>>> tree_types.add('mahogany') # CANNOT modify
AttributeError: ...
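That immutability buys you something: frozensets are hashable, so unlike regular sets they can be used as dictionary keys or nested inside other sets. A small sketch (the route distances here are made up):

```python
# frozensets can be dict keys; regular sets cannot.
distances = {
    frozenset({'london', 'paris'}): 344,
    frozenset({'paris', 'berlin'}): 878,
}
# Only membership matters, not the order you wrote the key in:
print(distances[frozenset({'paris', 'london'})])  # 344
```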
The builtins list , tuple and dict can be used to create empty
instances of these data structures too:
>>> x = list()
>>> x
[]
>>> y = dict()
>>> y
{}
But the short-form {...} and [...] syntax is more readable and
should be preferred. It’s also a tiny bit faster to use the short-form
syntax, as list , dict etc. are defined inside builtins, and looking up
these names inside the variable scopes takes some time, whereas
[] is understood as a list without any lookup.
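One caveat with the short-form syntax: {} creates an empty dict , not an empty set, so calling set() is the only way to write an empty set:

```python
x = {}      # an empty dict, NOT a set
y = set()   # the only way to spell an empty set
z = {1, 2}  # non-empty sets do have a literal form
print(type(x).__name__, type(y).__name__, type(z).__name__)  # dict set set
```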
bytearray and memoryview : Better byte interfaces
A bytearray is the mutable equivalent of a bytes object, pretty
similar to how lists are essentially mutable tuples.
bytearray makes a lot of sense, as:
• A lot of low-level interactions have to do with byte and bit
manipulation, so having a byte array where you can mutate
individual bytes is going to be much more efficient.
• Bytes have a fixed size (which is… 1 byte). On the other hand,
string characters can have various sizes thanks to the unicode
encoding standard, “utf-8”:
>>> x = 'I♥🐍'
>>> len(x)
3
>>> x.encode()
b'I\xe2\x99\xa5\xf0\x9f\x90\x8d'
>>> len(x.encode())
8
>>> x[2]
'🐍'
>>> x[2].encode()
b'\xf0\x9f\x90\x8d'
>>> len(x[2].encode())
4
So it turns out, that the three-character string ‘I♥🐍’ is actually
eight bytes, with the snake emoji being 4 bytes long. But, in the
encoded version of it, we can access each individual byte. And
because it’s a byte, its “value” will always be between 0 and 255:
>>> x[2]
'🐍'
>>> b = x[2].encode()
>>> b
b'\xf0\x9f\x90\x8d' # 4 bytes
>>> b[:1]
b'\xf0'
>>> b[1:2]
b'\x9f'
>>> b[2:3]
b'\x90'
>>> b[3:4]
b'\x8d'
>>> b[0] # indexing a bytes object gives an integer
240
>>> b[3]
141
So let’s take a look at some byte/bit manipulation examples:
def alternate_case(string):
    """Turns a string into alternating uppercase and lowercase letters."""
    array = bytearray(string.encode())
    for index, byte in enumerate(array):
        if not ((65 <= byte <= 90) or (97 <= byte <= 122)):
            continue
        if index % 2 == 0:
            array[index] = byte | 32
        else:
            array[index] = byte & ~32
    return array.decode()
>>> alternate_case('Hello WORLD?')
'hElLo wOrLd?'
This is not a good example, and I’m not going to bother explaining
it, but it works, and it is much more efficient than creating a new
bytes object for every character change.
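The magic numbers do have a meaning though: in ASCII, uppercase letters occupy codes 65 to 90 and lowercase letters 97 to 122, and the two cases differ by exactly one bit, whose value is 32. That’s what the | 32 and & ~32 above exploit:

```python
print(ord('A'), ord('a'))   # 65 97 -- exactly 32 apart
print(chr(ord('A') | 32))   # 'a' -- setting the bit makes it lowercase
print(chr(ord('b') & ~32))  # 'B' -- clearing the bit makes it uppercase
```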
Meanwhile, a memoryview takes this idea a step further: It’s pretty
much just like a bytearray, but it can refer to an object or a slice by
reference, instead of creating a new copy for itself. It allows you to
pass references to sections of bytes in memory around, and edit
them in-place:
>>> array = bytearray(range(256))
>>> array
bytearray(b'\x00\x01\x02\x03\x04\x05\x06\x07\x08...
>>> len(array)
256
>>> array_slice = array[65:91] # Bytes 65 to 90 are uppercase
>>> array_slice
bytearray(b'ABCDEFGHIJKLMNOPQRSTUVWXYZ')
>>> view = memoryview(array)[65:91] # Does the same thing,
>>> view
<memory at 0x7f438cefe040> # but doesn't generate a new bytes object
>>> bytearray(view)
bytearray(b'ABCDEFGHIJKLMNOPQRSTUVWXYZ') # It can still be converted
>>> view[0] # 'A'
65
>>> view[0] += 32 # Turns it lowercase
>>> bytearray(view)
bytearray(b'aBCDEFGHIJKLMNOPQRSTUVWXYZ') # 'A' is now lowercase
>>> bytearray(view[10:15])
bytearray(b'KLMNO')
>>> view[10:15] = bytearray(view[10:15]).lower()
>>> bytearray(view)
bytearray(b'aBCDEFGHIJklmnoPQRSTUVWXYZ') # Modified 'KLMNO' in-place
bin , hex , oct , ord , chr and ascii : Basic
conversions
The bin , hex and oct triplet is used to convert between bases in
Python. You give them a number, and they will spit out how you
can write that number in that base in your code:
>>> bin(42)
'0b101010'
>>> hex(42)
'0x2a'
>>> oct(42)
'0o52'
>>> 0b101010
42
>>> 0x2a
42
>>> 0o52
42
Yeah, you can write numbers in base 2, base 8 or base 16 in your
code if you really want to. In the end, they are all completely
identical to the integers written in regular decimal:
>>> type(0x20)
<class 'int'>
>>> type(0b101010)
<class 'int'>
>>> 0o100 == 64
True
But there are times where it makes sense to use other bases
instead, like when writing bytes:
>>> bytes([255, 254])
b'\xff\xfe' # Not very easy to comprehend
>>> # This can be written as:
>>> bytes([0xff, 0xfe])
b'\xff\xfe' # An exact one-to-one translation
Or when writing OS-specific codes that are implemented in octal,
for example:
>>> import os
>>> os.open('file.txt', os.O_RDWR, mode=384) # ??? what's 384?
>>> # This can be written as:
>>> os.open('file.txt', os.O_RDWR, mode=0o600) # mode is 600 in octal
Note that bin is only supposed to be used when you want to
create a binary representation of a Python integer, prefixed with
0b . If you just want a binary string, it’s better to use Python’s
string formatting:
>>> f'{42:b}'
'101010'
ord and chr are used to convert ascii as well as unicode
characters to and from their character codes:
>>> ord('x')
120
>>> chr(120)
'x'
>>> ord(' 🐍 ')
128013
>>> hex(ord(' 🐍 '))
'0x1f40d'
>>> chr(0x1f40d)
'🐍'
>>> '\U0001f40d' # The same value, as a unicode escape inside
'🐍'
It’s pretty simple.
format : Easy text transforms
format(value, spec) is just another way to do
value.__format__(spec) .
Python’s string formatting can do a lot of interesting things, like:
>>> format(42, 'c') # int to ascii
'*'
>>> format(604, 'f') # int to float
'604.000000'
>>> format(357/18, '.2f') # specify decimal precision
'19.83'
>>> format(604, 'x') # int to hex
'25c'
>>> format(604, 'b') # int to binary
'1001011100'
>>> format(604, '0>16b') # binary with zero-padding
'0000001001011100'
>>> format('Python!', '🐍^15') # center-aligned text
'🐍🐍🐍🐍Python!🐍🐍🐍🐍'
I have an entire article on string formatting right here, so check that
out for more.
any and all
These two are some of my favorite builtins. Not because they are
incredibly helpful or powerful, but just because of how Pythonic
they are. There are certain pieces of logic that can be re-written
using any or all , which will instantly make them much shorter
and much more readable, which is what Python is all about. Here’s
an example of one such case:
Let’s say you have a bunch of JSON responses from an API, and you
want to make sure that all of them contain an ID field, which is
exactly 20 characters long. You could write your code in this way:
def validate_responses(responses):
    for response in responses:
        # Make sure that `id` exists
        if 'id' not in response:
            return False
        # Make sure it is a string
        if not isinstance(response['id'], str):
            return False
        # Make sure it is 20 characters
        if len(response['id']) != 20:
            return False
    # If everything was True so far for every
    # response, then we can return True.
    return True
Or, we can write it in this way:
def validate_responses(responses):
    return all(
        'id' in response
        and isinstance(response['id'], str)
        and len(response['id']) == 20
        for response in responses
    )
What all does is take in an iterable of boolean values, and
return False if it encounters even a single False value in the
iterable. Otherwise it returns True .
And I love the way to do it using all , because it reads exactly like
english: “Return whether id’s exist, are strings and are 20 in length,
in all responses.”
Here’s another example: trying to see if there are any palindromes
in the list:
def contains_palindrome(words):
    for word in words:
        if word == ''.join(reversed(word)):
            return True
    # Found no palindromes in the end
    return False
vs.
def contains_palindrome(words):
    return any(word == ''.join(reversed(word)) for word in words)
And with the wording I believe it should be obvious, that any does
the opposite of all: it returns True if even one value is True ,
otherwise it returns False .
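One more property worth knowing: when given a generator, any and all short-circuit, stopping at the first value that decides the answer. The is_even helper below is made up to make the stopping point visible:

```python
checked = []

def is_even(n):
    checked.append(n)  # record which values actually get tested
    return n % 2 == 0

print(any(is_even(n) for n in [1, 2, 3]))  # True
print(checked)  # [1, 2] -- stopped as soon as 2 returned True
```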
Extras: listcomps inside any / all
abs , divmod , pow and round : Math basics
These four math functions are so common in programming that
they have been thrown straight into the builtins where they are
always available, rather than putting them in the math module.
They’re pretty straightforward:
• abs returns the absolute value of a number, eg:
>>> abs(42)
42
>>> abs(-3.14)
3.14
>>> abs(3-4j)
5.0
• divmod returns the quotient and remainder after a divide
operation:
>>> divmod(7, 2)
(3, 1)
>>> quotient, remainder = divmod(5327, 100)
>>> quotient
53
>>> remainder
27
• pow returns the result of raising a number to the given power:
>>> pow(100, 3)
1000000
>>> pow(2, 10)
1024
• round returns a number rounded to the given decimal
precision:
>>> import math
>>> math.pi
3.141592653589793
>>> round(math.pi)
3
>>> round(math.pi, 4)
3.1416
>>> round(1728, -2)
1700
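Two extra behaviours worth calling out (added notes, not from the examples above): pow takes an optional third argument for modular exponentiation, and round uses “banker's rounding”, where exact halves go to the nearest even number:

```python
# pow accepts an optional third argument for modular exponentiation,
# computed efficiently without building the huge intermediate number:
print(pow(2, 10, 100))  # 24, i.e. (2 ** 10) % 100

# round sends exact halves to the nearest even number,
# which avoids a systematic upward bias:
print(round(0.5))  # 0
print(round(1.5))  # 2
print(round(2.5))  # 2
```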
isinstance and issubclass : Runtime type checking
You’ve already seen the type builtin, and using that knowledge
you can already implement runtime type-checking if you need to,
like this:
def print_stuff(stuff):
    if type(stuff) is list:
        for item in stuff:
            print(item)
    else:
        print(stuff)
Here, we are trying to check if the item is a list , and if it is, we
print each item inside it individually. Otherwise, we just print the
item. And this is what the code does:
>>> print_stuff('foo')
foo
>>> print_stuff(123)
123
>>> print_stuff(['spam', 'eggs', 'steak'])
spam
eggs
steak
It does work! So yeah, you can check the type of a variable at
runtime and change the behaviour of your code accordingly. But
there are actually quite a few issues with the code above. Here’s
one example:
>>> class MyList(list):
... pass
...
>>> items = MyList(['spam', 'eggs', 'steak'])
>>> items
['spam', 'eggs', 'steak']
>>> print_stuff(items)
['spam', 'eggs', 'steak']
Welp, items is very clearly still a list, but print_stuff doesn’t
recognize it anymore. And the reason is simple, because
type(items) is now MyList , not list .
This code seems to be violating one of the five SOLID principles,
called “Liskov Substitution Principle”. The principle says that
“objects of a superclass shall be replaceable with objects of its
subclasses without breaking the application”. This is important for
inheritance to be a useful programming paradigm.
The underlying issue of our function is that it doesn’t account for
inheritance. And that’s exactly what isinstance is for: it doesn’t
just check whether an object is an instance of a class, it also checks
whether the object is an instance of a sub-class:
>>> class MyList(list):
... pass
...
>>> items = ['spam', 'eggs', 'steak']
>>> type(items) is list
True
>>> isinstance(items, list)
True # Both of these do the same thing
>>> items = MyList(['spam', 'eggs', 'steak'])
>>> type(items) is list
False # And while `type` doesn't work,
>>> isinstance(items, list)
True # `isinstance` works with subclasses too.
Similarly, issubclass checks if a class is a subclass of another
class. The first argument for isinstance is an object, but for
issubclass it’s another class:
>>> issubclass(MyList, list)
True
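One extra convenience not shown above: both isinstance and issubclass also accept a tuple of classes, which acts as an “any of these” check:

```python
# isinstance with a tuple succeeds if any of the types match:
print(isinstance(3.14, (int, float)))  # True
print(isinstance('hi', (int, float)))  # False

# issubclass accepts tuples the same way.
# (bool passes because it is a subclass of int!)
print(issubclass(bool, (int, str)))    # True
```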
Replacing the type check with isinstance , the code above will
follow the Liskov Substitution Principle. But it can still be improved.
Take this for example:
>>> items = ('spam', 'eggs', 'steak')
>>> print_stuff(items)
('spam', 'eggs', 'steak')
Obviously it doesn’t handle container types other than list as of
now. You could try to work around this by also checking for
isinstance of tuple, dict, and so on. But how far would you go?
How many types are you going to add support for?
For this case, Python gives you a bunch of “base classes”, that you
can use to test for certain “behaviours” of your class, instead of
testing for the class itself. In our case, the behaviour is being a
container of other objects, so aptly the base class is called
Container :
>>> from collections.abc import Container
>>> items = ('spam', 'eggs', 'steak')
>>> isinstance(items, tuple)
True
>>> isinstance(items, list)
False
>>> isinstance(items, Container)
True # This works!
Ideally we’d consider the Iterable or Collection base class here,
but strings are iterable too (and in fact they pass the Container
check as well, since they support the in operator), so no base
class alone cleanly separates strings from other containers.
Container was chosen here only for ease of explanation; in real-
world code you should check exactly which base class is
appropriate for your use case. You can find that using the docs.
Every container object type will return True in the check against
the Container base class. issubclass works too:
>>> from collections.abc import Container
>>> issubclass(list, Container)
True
>>> issubclass(tuple, Container)
True
>>> issubclass(set, Container)
True
>>> issubclass(dict, Container)
True
So adding that to our code, it becomes:
from collections.abc import Container

def print_stuff(stuff):
    if isinstance(stuff, Container):
        for item in stuff:
            print(item)
    else:
        print(stuff)
This style of checking for types actually has a name: it’s called “duck
typing”.
callable and duck typing basics
Famously, Python is referred to as a “duck-typed” language. What it
means is that instead of caring about the exact class an object
comes from, Python code generally tends to check instead if the
object can satisfy certain behaviours that we are looking for.
In the words of Alex Martelli:
“You don’t really care for IS-A — you really only care for BEHAVES-
LIKE-A-(in-this-specific-context), so, if you do test, this behaviour is
what you should be testing for.
In other words, don’t check whether it IS-a duck: check whether it
QUACKS-like-a duck, WALKS-like-a duck, etc, etc, depending on
exactly what subset of duck-like behaviour you need to play your
language-games with.”
To explain this, I’ll give you a quick example:
Some items in Python can be “called” to return a value, like
functions and classes, while others can’t, and will raise a
TypeError if you try:
>>> def magic():
... return 42
...
>>> magic() # Works fine
42
>>> class MyClass:
... pass
...
>>> MyClass() # Also works
<__main__.MyClass object at 0x7f2b7b91f0a0>
>>> x = 42
>>> x() # Doesn't work
TypeError: 'int' object is not callable
How do you even begin to check if you can try and “call” a function,
class, and whatnot? The answer is actually quite simple: You just
see if the object implements the __call__ special method.
>>> def is_callable(item):
... return hasattr(item, '__call__')
...
>>> is_callable(list)
True
>>> def function():
... pass
...
>>> is_callable(function)
True
>>> class MyClass:
... pass
...
>>> is_callable(MyClass)
True
>>> is_callable('abcd')
False
And that’s pretty much what the callable builtin does:
>>> callable(list)
True
>>> callable(42)
False
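This also means you can make your own objects callable, just by defining __call__ on the class. A quick sketch with a made-up Multiplier class:

```python
class Multiplier:
    def __init__(self, factor):
        self.factor = factor

    def __call__(self, value):
        # Calling the instance multiplies by the stored factor.
        return value * self.factor

double = Multiplier(2)
print(callable(double))  # True, the instance itself defines __call__
print(double(21))        # 42
```

Callable instances like this are a handy pattern for function-like objects that need to carry some state around.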
By the way, these “special methods” are how most of Python’s
syntax and functionality works:
• x() is the same as doing x.__call__()
• items[10] is the same as doing items.__getitem__(10)
• a + b is the same as doing a.__add__(b)
Nearly every Python behaviour has an underlying “special method”,
or as they’re often called, “dunder method”, defined underneath.
If you want to read more into these dunder methods, you can read
the documentation page about Python’s data model.
sorted and reversed : Sequence manipulators
Sorting and reversing a sequence of data are probably the most
used algorithmic operations in any programming language. And the
top level sorted and reversed let you do exactly that.
• sorted This function sorts the incoming data, and returns a
sorted list type.
>>> items = (3, 4, 1, 2)
>>> sorted(items)
[1, 2, 3, 4]
It uses the “TimSort” algorithm created by Tim Peters, one of
the earliest Python wizards.
There’s also two other parameters that sorted can take:
reverse , which when set to True sorts the data in reverse
order; and key , which takes in a function that is used on every
element to sort the data based on a custom property of each
item. Let’s take a look at it:
>>> items = [
... {'value': 3},
... {'value': 1},
... {'value': 2},
... ]
>>> sorted(items, key=lambda d: d['value'])
[{'value': 1}, {'value': 2}, {'value': 3}]
>>> names = ['James', 'Kyle', 'Max']
>>> sorted(names, key=len) # Sorts by name length
['Max', 'Kyle', 'James']
Also note that while list.sort() is already one way to sort
lists, the .sort() method only exists on lists, while sorted
can take any iterable.
• reversed
reversed is a function that takes in any sequence type and
returns an iterator, which yields the values in reverse order.
Returning a lazy iterator is nice, as this means that reversing
certain objects takes no extra memory space at all, like range
or list , whose reverse values can be generated one by one.
>>> items = [1, 2, 3]
>>> x = reversed(items)
>>> x
<list_reverseiterator object at 0x7f1c3ebe07f0>
>>> next(x)
3
>>> next(x)
2
>>> next(x)
1
>>> next(x)
StopIteration # Error: end of iterator
>>> for i in reversed(items):
... print(i)
...
3
2
1
>>> list(reversed(items))
[3, 2, 1]
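Two extra details about these (illustrative sketches with made-up data): sorted is stable, meaning items that compare equal keep their original relative order; and reversed doesn't need any special iterator type, since any class with __len__ and __getitem__ supports it:

```python
records = [('maths', 82), ('english', 75), ('physics', 82)]

# sorted is stable: 'maths' and 'physics' tie at 82, and keep
# their original relative order in the output.
print(sorted(records, key=lambda r: r[1]))
# [('english', 75), ('maths', 82), ('physics', 82)]

# reversed works on any class with __len__ and __getitem__:
class Countdown:
    def __init__(self, start):
        self.start = start

    def __len__(self):
        return self.start

    def __getitem__(self, index):
        if not 0 <= index < self.start:
            raise IndexError(index)
        return self.start - index

print(list(Countdown(3)))            # [3, 2, 1]
print(list(reversed(Countdown(3))))  # [1, 2, 3]
```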
map and filter : Functional primitives
Now in Python, everything might be an object, but that doesn’t
necessarily mean that your Python code needs to be object-
oriented. You can in fact write pretty easy-to-read functional code
in Python.
If you don’t know what functional languages or functional code is,
the idea is that all functionality is provided via functions. There isn’t
a formal concept of classes and objects, inheritance and the like. In
essence, all programs simply manipulate pieces of data, by passing
them to functions and getting the modified values returned back to
you.
This might be an oversimplification, don’t dwell too much on my
definition here. But we’re moving on.
Two really common concepts in functional programming are map
and filter, and Python provides builtin functions for those:
• map
map is a “higher order function”, which just means that it’s a
function that takes in another function as an argument.
What map really does is it maps from one set of values to
another. A really simple example would be a square mapping:
>>> def square(x):
... return x * x
...
>>> numbers = [8, 4, 6, 5]
>>> list(map(square, numbers))
[64, 16, 36, 25]
>>> for squared in map(square, numbers):
... print(squared)
...
64
16
36
25
map takes two arguments: a function, and a sequence. It runs
that function with each element of the sequence as input, and
gives you back the outputs. map(square, numbers) took each
of the numbers and returned their squares.
Note that I had to do list(map(square, numbers)) , and this
is because map itself returns a lazy iterator. The values are
mapped one at a time as you request them, e.g. if you loop over
a map object, it will run the mapping function one by one on each
item of the sequence. This means that map doesn’t store a
complete list of mapped values and doesn’t waste time
computing extra values when they’re not needed.
• filter
filter is quite similar to map , except it doesn’t map every
value to a new value, it filters a sequence of values based on a
condition.
This means that the output of a filter will contain the same
items as the ones that went in, except some may be discarded.
A really simple example would be to filter out odd numbers
from a result:
>>> items = [13, 10, 25, 8]
>>> evens = list(filter(lambda num: num % 2 == 0, items))
>>> evens
[10, 8]
A few people might have realised that these functions are
essentially doing the same thing as list comprehensions, and
you’d be right!
List comprehensions are basically a more Pythonic, more
readable way to write these exact same things:
>>> def square(x):
... return x * x
...
>>> numbers = [8, 4, 6, 5]
>>> [square(num) for num in numbers]
[64, 16, 36, 25]
>>> items = [13, 10, 25, 8]
>>> evens = [num for num in items if num % 2 == 0]
>>> evens
[10, 8]
You are free to use whichever syntax suits your use case better.
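Two more behaviours worth knowing (extra notes, not from the examples above): map accepts multiple iterables, passing one item from each to the function, and filter accepts None as the function to keep only the truthy values:

```python
bases = [2, 3, 4]
exponents = [3, 2, 1]

# With two iterables, the function gets one item from each:
print(list(map(pow, bases, exponents)))  # [8, 9, 4]

# filter with None as the function keeps truthy items only:
print(list(filter(None, [0, 'spam', '', None, 42])))  # ['spam', 42]
```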
len , max , min and sum : Aggregate functions
Python has a few aggregate functions: functions that combine a
collection of values into a single result.
I think just a little code example should be more than enough to
explain these four:
>>> numbers = [30, 10, 20, 40]
>>> len(numbers)
4
>>> max(numbers)
40
>>> min(numbers)
10
>>> sum(numbers)
100
Three of these can in fact take any container data type, like sets,
dictionaries and even strings:
>>> author = 'guidovanrossum'
>>> len(author)
14
>>> max(author)
'v'
>>> min(author)
'a'
sum is required to take in a container of numbers. Which means,
this works:
>>> sum(b'guidovanrossum')
1542
I’ll leave that to you to figure out what happened here ;)
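A few extra features of these aggregates (an added note): max and min accept the same key function that sorted does, plus a default for empty iterables, and sum takes an optional start value:

```python
words = ['spam', 'eggs', 'bacon']

# Like sorted, max/min can compare by a derived property:
print(max(words, key=len))  # bacon

# `default` avoids a ValueError on empty iterables:
print(max([], default=0))   # 0

# The second argument to sum is the starting value:
print(sum([1, 2, 3], 10))   # 16
```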
iter and next : Advanced iteration
iter and next define the mechanism through which a for loop
works.
A for loop that looks like this:
for item in mylist:
    print(item)
is actually doing something like this internally:
mylist_iterable = iter(mylist)
while True:
    try:
        item = next(mylist_iterable)
        print(item)
    except StopIteration:
        break
A for-loop in Python is a cleverly disguised while loop. When you
iterate over a list, or any other datatype that supports iteration, it
just means that it understands the iter function, and returns an
“iterator” object.
Iterator objects in Python do two things:
• They yield new values every time you pass them to next
• They raise the StopIteration builtin exception when the
iterator has run out of values.
This is how all for loops work.
BTW, generators also follow the iterator protocol:
>>> gen = (x**2 for x in range(1, 4))
>>> next(gen)
1
>>> next(gen)
4
>>> next(gen)
9
>>> next(gen)
Error: StopIteration
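Two lesser-known extras in this protocol (added notes): iter has a two-argument form, iter(callable, sentinel), which keeps calling the function until it returns the sentinel value; and next accepts a default to return instead of raising StopIteration:

```python
values = [3, 2, 1, 0, 7]

# Keep calling the function until it returns the sentinel (0 here):
print(list(iter(lambda: values.pop(0), 0)))  # [3, 2, 1]

# A default makes next() safe on exhausted iterators:
it = iter([10])
print(next(it, 'done'))  # 10
print(next(it, 'done'))  # done
```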
range , enumerate and zip : Convenient iteration
You already know about range . It takes in up to 3 values, and
returns an iterable that yields integer values:
>>> list(range(10))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> list(range(3, 8))
[3, 4, 5, 6, 7]
>>> list(range(1, 10, 2))
[1, 3, 5, 7, 9]
>>> list(range(10, 1, -2))
[10, 8, 6, 4, 2]
But enumerate and zip are actually really useful as well.
enumerate is great for when you need to access the index and
value of elements in a list.
Instead of doing:
>>> menu = ['eggs', 'spam', 'bacon']
>>> for i in range(len(menu)):
... print(f'{i+1}: {menu[i]}')
...
1: eggs
2: spam
3: bacon
You can do this instead:
>>> menu = ['eggs', 'spam', 'bacon']
>>> for index, item in enumerate(menu, start=1):
... print(f'{index}: {item}')
...
1: eggs
2: spam
3: bacon
Similarly, zip is used to get index-wise values from multiple
iterables.
Instead of doing:
>>> students = ['Jared', 'Brock', 'Jack']
>>> marks = [65, 74, 81]
>>> for i in range(len(students)):
... print(f'{students[i]} got {marks[i]} marks')
...
Jared got 65 marks
Brock got 74 marks
Jack got 81 marks
You can do:
>>> students = ['Jared', 'Brock', 'Jack']
>>> marks = [65, 74, 81]
>>> for student, mark in zip(students, marks):
... print(f'{student} got {mark} marks')
...
Jared got 65 marks
Brock got 74 marks
Jack got 81 marks
Both can help massively simplify iteration code.
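Two zip behaviours worth remembering (added notes): it stops at the shortest iterable, and combined with the * unpacking syntax it can transpose rows and columns:

```python
# zip stops as soon as the shortest iterable runs out:
print(list(zip([1, 2, 3], ['a', 'b'])))  # [(1, 'a'), (2, 'b')]

matrix = [(1, 2, 3),
          (4, 5, 6)]

# Unpacking the rows into zip regroups them index-wise,
# i.e. it transposes the matrix:
print(list(zip(*matrix)))  # [(1, 4), (2, 5), (3, 6)]
```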
slice
A slice object is what’s used under the hood when you try to slice
a Python iterable.
In my_list[1:3] for example, [1:3] is not the special part, only
1:3 is. The square brackets are still trying to index the list! But
1:3 inside these square brackets here actually creates a slice
object.
This is why, my_list[1:3] is actually equivalent to
my_list[slice(1, 3)] :
>>> my_list = [10, 20, 30, 40]
>>> my_list[1:3]
[20, 30]
>>> my_list[slice(1, 3)]
[20, 30]
>>> nums = list(range(10))
>>> nums
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> nums[1::2]
[1, 3, 5, 7, 9]
>>> s = slice(1, None, 2) # Equivalent to `[1::2]`
>>> s
slice(1, None, 2)
>>> nums[s]
[1, 3, 5, 7, 9]
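And since slices are ordinary objects, you can assign them to names and reuse them, which reads nicely for things like fixed-width records (the field layout below is made up for illustration):

```python
# Name the field positions once, reuse them everywhere:
NAME = slice(0, 6)
YEAR = slice(6, 10)

record = 'guido 1991'
print(record[NAME].strip())  # guido
print(record[YEAR])          # 1991
```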
If you want to learn a bit more about slices, how they work and
what all can be done with them, I cover that in a separate article
here.
breakpoint : built-in debugging
breakpoint is a builtin that was added in Python 3.7, as an
easier way to drop into a debugging session. By default, it just calls
set_trace() from the pdb module, which is the debugger
module that is built into Python.
What pdb lets you do is stop the execution of your code at any
moment, inspect the values of variables, run some code if you like,
and then you can even do fancy things like running the code one
line at a time, or check the state of the stack frames inside the
interpreter.
Using pdb to debug your code, by slowly going over it, seeing
which lines of code get executed, and inspecting values of objects
and variables is a much more efficient way to debug your code
than using print statements.
Unfortunately there isn’t any good way to show a debugger being
used in a text-format in a blog. But, AnthonyWritesCode has a
really good video explaining some of its features if you’re
interested.
open : File I/O
open is the function that lets you read and write to files.
It’s… actually rather straightforward, and there aren’t any obscure
things about it that I can explain, so I’m not even going to bother
with it. You can read the official docs about reading and writing
files if you’d like to know more.
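For completeness though, here's a minimal sketch of the most common pattern: using open inside a with block, so the file is closed automatically even if an error occurs. (A temporary directory is used just to keep the example self-contained.)

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'greeting.txt')

# 'w' opens the file for writing, creating it if needed:
with open(path, 'w') as f:
    f.write('hello, file\n')

# The default mode is 'r': reading text.
with open(path) as f:
    content = f.read()

print(content)  # hello, file
```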
repr : Developer convenience
repr is an interesting one. Its intended use-case is simply to help
the developers.
repr is used to create a helpful string representation of an object,
hopefully one that concisely describes the object and its current
state. The intent of this is to be able to debug simple issues just
by looking at the object’s repr, instead of having to probe into its
attributes at every step.
Here’s a good example:
>>> class Vector:
...     def __init__(self, x, y):
...         self.x = x
...         self.y = y
...
>>> v = Vector(3, 5)
>>> v
<__main__.Vector object at 0x7f27dff5a1f0>
The default repr is not helpful at all. You’d have to manually check
for its attributes:
>>> dir(v)
['__class__', ... , 'x', 'y']
>>> v.x
3
>>> v.y
5
But, if you add a friendly __repr__ to it:
>>> class Vector:
...     def __init__(self, x, y):
...         self.x = x
...         self.y = y
...     def __repr__(self):
...         return f'Vector(x={self.x}, y={self.y})'
...
>>> v = Vector(3, 5)
>>> v
Vector(x=3, y=5)
Now you don’t need to wonder what this object contains. It’s right
in front of you!
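A closely related detail: repr isn't the same as str. str aims for human-friendly output while repr aims for an unambiguous, developer-facing one, and the !r conversion in f-strings explicitly picks the repr:

```python
import datetime

today = datetime.date(2021, 10, 10)
print(str(today))   # 2021-10-10
print(repr(today))  # datetime.date(2021, 10, 10)

# In f-strings, !r inserts the repr -- note the quotes:
name = 'Tushar'
print(f'{name} vs {name!r}')  # Tushar vs 'Tushar'
```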
help , exit and quit : site builtins
Now, these builtins aren’t real builtins. As in, they aren’t really
defined in the builtins module. Instead, they are defined in the
site module, and then injected into builtins when the site
module runs.
site is a module that is automatically run by default when you
start Python. It is responsible for setting up a few useful things,
including making pip packages available for import, and setting up
tab completion in the REPL, among other things.
One more thing that it does is set up a few useful global functions:
• help is used to find documentation of modules and objects.
It’s equivalent to calling pydoc.help() .
• exit and quit quit the Python process. Calling them is
equivalent to calling sys.exit() .
copyright , credits , license : Important texts
These three texts are also defined by the site module, and typing
them in the REPL prints out their text, with license() being an
interactive session.
So what’s next?
Well, here’s the deal. Python is huge.
Here’s just a few things that we haven’t even touched upon yet:
• Threading / Multiprocessing
• Asynchronous computation
• Type annotations
• Metaclasses
• Weak references
• The 200 or so builtin modules that do everything from html
templating, to sending emails, to cryptography.
And that’s probably not even all of it.
But, the important thing is that you know a LOT about Python’s
fundamentals now. You know what makes Python tick, you
understand its strengths.
The rest of the things you can pick up as you go, you just need to
be aware that they exist!
The official Python tutorial has a section on the builtin modules,
and the documentation around all of them is actually really good.
Reading it whenever you need to will help you figure things out as
you go.
There’s also more than 300 detailed videos made by
AnthonyWritesCode that are really informative.
So now that you’ve learned all of this, why don’t you build
something great?
The end
Thanks a lot for reading this article. If you managed to read the
whole thing, congratulations! And I’d love to hear your thoughts on
it ✨