Stylising your Python code: An introduction to linting and formatting

Posted on May 26, 2022 by The Jumping Rivers Blog in Data science

This article was first published on The Jumping Rivers Blog, and kindly contributed to python-bloggers.

https://xkcd.com/1513

Linting is a process for identifying bugs and stylistic errors in your code. The process is carried out by analysis tools called ‘linters’, which are widely available for every major programming language. Linters will flag issues and style violations in your code, sort of like a spell checker!

In addition to linters, there is a wide range of ‘auto-formatters’ that can also carry out these checks, and even make the necessary changes for you.

In this post we will provide an introductory overview of popular linters and auto-formatters for Python.

Why should I care?

Put simply, linting helps to ensure that the format and style of your code adhere to best coding practices. A nice thing about Python is that there is a clearly defined set of guidelines for code formatting and styling, which most linters follow. These guidelines are laid out in PEP8, a Python Enhancement Proposal (PEP) written in 2001 to describe how Python developers can write readable and consistent code.

Whether or not you intend to share your code, there are lots of reasons why you should care!

Readability: It goes without saying that if you plan to share your code with colleagues or make it publicly available, it’s got to be readable. Even if you’re working on it solo, you will be thankful in the long run that you took the time to write clear, logical code. This will save a lot of head-scratching when you return to it later!
Debugging: A really nice feature of linters is the ability to flag bugs in your code without needing to run it (static analysis). Plus, readable code is much easier to debug!
Consistency: In a large coding project consisting of many scripts, it helps to use a consistent style throughout. This can be especially challenging when working with a large team. Incorporating linters into your workflow (pre-commit hooks, etc.) will be a big help!
Self-improvement: Getting into the habit of regularly checking your code for stylistic errors will make you a better programmer. Over time you will find that you are becoming less reliant on linters!

This all sounds great!


Linters for Python

We will look at a couple of well-known Python linters:

Pylint: looks for errors, enforces a coding standard that is close to PEP8, and even offers simple refactoring suggestions.
Flake8: wrapper around PyFlakes, pycodestyle and McCabe; this will check Python source code for errors and violations of some of the PEP8 style conventions.

Note that Flake8 does not, by default, look for as many PEP8 violations as Pylint (unless you install some plugins). However, it can still be beneficial to work with both linters in your project, as we will show below.

Examples

So now we know what linters are, let’s see how to use them in our projects!

Linting a Python script

For this example we will look at how to lint the following piece of code:

import numpy as np
import time
import pandas as pd

Captain='Picard'

def InitiateWarpSpeed(order):
    if order=="engage":
        print("initiating warp speed")
    else:
        print("you are not the captain of this vessel")

InitiateWarpSpeed("engage")

Pylint:

Let’s start with Pylint. We can install this with:

pip install pylint

Conventionally, Pylint is used to analyse a Python module. However, it is also possible to run it on an individual script with:

pylint my_script.py

The output looks something like this:

my_script.py:13:0: C0304: Final newline missing (missing-final-newline)
my_script.py:1:0: C0114: Missing module docstring (missing-module-docstring)
my_script.py:3:0: E0401: Unable to import 'pandas' (import-error)
my_script.py:5:0: C0103: Constant name "Captain" doesn't conform to UPPER_CASE naming style (invalid-name)
my_script.py:7:0: C0103: Function name "InitiateWarpSpeed" doesn't conform to snake_case naming style (invalid-name)
my_script.py:7:0: C0116: Missing function or method docstring (missing-function-docstring)
my_script.py:1:0: W0611: Unused numpy imported as np (unused-import)
my_script.py:2:0: W0611: Unused import time (unused-import)
my_script.py:3:0: W0611: Unused pandas imported as pd (unused-import)
my_script.py:2:0: C0411: standard import "import time" should be placed before "import numpy as np" (wrong-import-order)

We can see it has flagged some issues with our code. The format with which Pylint displays these messages is:

{path}:{line}:{column}: {msg_id}: {msg} ({symbol})

The letter at the start of the message ID indicates the category of the check that has failed. For example, C refers to a convention related check and E to an error. The full list of categories can be found in the Pylint documentation. One thing to note is that Pylint is telling us with

my_script.py:3:0: E0401: Unable to import 'pandas' (import-error)

that there is a problem on line three which will cause an error (the pandas import cannot be resolved in the current environment), and it is telling us this before our code has even run!
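Because the message format is so regular, the output is easy to process with a script. The following is a minimal sketch, assuming the default format shown above; the pattern, the function name `parse_pylint_line` and the sample text are illustrative and not part of Pylint itself.

```python
import re
from collections import Counter

# Pattern for the format {path}:{line}:{column}: {msg_id}: {msg} ({symbol}).
# Assumption: the path contains no colon (so no Windows drive letters).
MESSAGE_PATTERN = re.compile(
    r"^(?P<path>[^:]+):(?P<line>\d+):(?P<column>\d+): "
    r"(?P<msg_id>[A-Z]\d{4}): (?P<msg>.*) \((?P<symbol>[a-z0-9-]+)\)$"
)


def parse_pylint_line(text):
    """Return the fields of one Pylint message as a dict, or None if no match."""
    match = MESSAGE_PATTERN.match(text.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["line"] = int(fields["line"])
    fields["column"] = int(fields["column"])
    return fields


SAMPLE_OUTPUT = """\
my_script.py:13:0: C0304: Final newline missing (missing-final-newline)
my_script.py:3:0: E0401: Unable to import 'pandas' (import-error)
my_script.py:2:0: W0611: Unused import time (unused-import)
"""

messages = [parse_pylint_line(line) for line in SAMPLE_OUTPUT.splitlines()]

# Count messages by category letter (C = convention, E = error, W = warning).
category_counts = Counter(message["msg_id"][0] for message in messages)
print(category_counts)
```

Grouping by the first letter of the message ID gives a quick summary of how many convention, warning and error messages a run produced.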

If for some reason we decide we want to overrule Pylint and ignore a message for a line of code, we can include the comment # pylint: disable=some-message. For example, if we really wanted to keep our naming of variable Captain against the style guide, we could change the line to:

Captain = 'Picard'  # pylint: disable=invalid-name
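Rather than suppressing messages, the more usual route is to revise the code. The sketch below is one way the example script might look once the messages above are addressed: the unused imports are dropped, docstrings are added and the names follow the suggested styles. The docstring wording and the new names `CAPTAIN` and `initiate_warp_speed` are our own choices.

```python
"""Decide whether to initiate warp speed based on the order given."""

CAPTAIN = "Picard"


def initiate_warp_speed(order):
    """Print a status message depending on the order received."""
    if order == "engage":
        print("initiating warp speed")
    else:
        print("you are not the captain of this vessel")


initiate_warp_speed("engage")
```

The behaviour of the script is unchanged; only naming, layout and documentation differ.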

So, linting your scripts with Pylint is a breeze, and it turns out Flake8 is just as easy to use!

Flake8:

This can be installed with

pip install flake8

and run using

flake8 my_script.py

In fact, you don’t even have to specify a Python script here! Simply running flake8 will lint all scripts within the current directory and all sub-directories.

This time, the output is:

my_script.py:1:1: F401 'numpy as np' imported but unused
my_script.py:2:1: F401 'time' imported but unused
my_script.py:3:1: F401 'pandas as pd' imported but unused
my_script.py:5:8: E225 missing whitespace around operator
my_script.py:7:1: E302 expected 2 blank lines, found 1
my_script.py:8:13: E225 missing whitespace around operator
my_script.py:13:1: E305 expected 2 blank lines after class or function definition, found 1
my_script.py:13:28: W292 no newline at end of file

This differs somewhat from the output from Pylint:

Flake8 is flagging lots of issues related to whitespace and blank lines;
Pylint is identifying violations of naming conventions and layout (docstrings, import order, etc.);
Both linters are pointing out unused imports.
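Both linters report the unused imports without ever running the script. As a rough illustration of how such a static check could work, the sketch below uses Python's built-in ast module to parse the example script and list imported names that are never referenced. This is a simplified stand-in, not the implementation used by Pylint or PyFlakes, and the function name `find_unused_imports` is hypothetical.

```python
import ast

SOURCE = '''\
import numpy as np
import time
import pandas as pd

Captain='Picard'

def InitiateWarpSpeed(order):
    if order=="engage":
        print("initiating warp speed")
    else:
        print("you are not the captain of this vessel")

InitiateWarpSpeed("engage")
'''


def find_unused_imports(source):
    """Return (line, name) pairs for imported names never referenced again."""
    tree = ast.parse(source)  # parses the text only; nothing is executed
    imported = {}
    used = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # "import a.b" binds the name "a"; "import a as b" binds "b".
                bound_name = alias.asname or alias.name.split(".")[0]
                imported[bound_name] = node.lineno
        elif isinstance(node, ast.Name):
            used.add(node.id)
    return sorted(
        (line, name) for name, line in imported.items() if name not in used
    )


print(find_unused_imports(SOURCE))
```

Since the source is only parsed, the check works even when numpy and pandas are not installed.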

You may prefer one of these linters over the other, or you could be extra-diligent and opt to work with both linters for your project!

If you want Flake8 to ignore a particular line of code, you can just add a comment # noqa at the end. To ignore a particular error, you can use, for example, # noqa: F401 to ignore an unused import.
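As a small illustration, the comments might be used as follows. The imports here are arbitrary standard-library modules chosen only so the snippet runs.

```python
# A bare "# noqa" silences every Flake8 message reported on this line.
import os  # noqa

# Naming a code silences only that check; here F401 (imported but unused).
import sys  # noqa: F401

# Several codes can be listed, separated by commas.
import json, time  # noqa: E401,F401
```

Naming the code is usually the safer choice, since a bare # noqa would also hide any unrelated problem that later appears on the same line.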

You can also configure Flake8 so that it will only flag particular errors. One way to do this is by adding a setup.cfg file to your working directory. Let’s say you want to:

set the maximum line length to be 88;
ignore the E302 blank line flags;
ignore the F401 flag for my_script.py only.

The contents of setup.cfg would then be:

[flake8]
max-line-length = 88
extend-ignore =
    E302,
per-file-ignores =
    my_script.py:F401

Running Flake8 then gives a reduced output:

my_script.py:5:8: E225 missing whitespace around operator
my_script.py:8:13: E225 missing whitespace around operator
my_script.py:13:1: E305 expected 2 blank lines after class or function definition, found 1
my_script.py:13:28: W292 no newline at end of file
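The setup.cfg file uses the INI format, so its structure can be inspected with Python's built-in configparser module. The sketch below only illustrates how the three settings are laid out; it is not a description of how Flake8 reads its configuration internally.

```python
import configparser

SETUP_CFG = """\
[flake8]
max-line-length = 88
extend-ignore =
    E302,
per-file-ignores =
    my_script.py:F401
"""

parser = configparser.ConfigParser()
parser.read_string(SETUP_CFG)
section = parser["flake8"]

max_line_length = section.getint("max-line-length")

# Multi-line values: split on commas and newlines, dropping empty entries.
ignored_codes = [
    code.strip()
    for code in section["extend-ignore"].replace("\n", ",").split(",")
    if code.strip()
]

# Each per-file entry has the form "<filename>:<codes>".
per_file_ignores = dict(
    entry.strip().split(":", 1)
    for entry in section["per-file-ignores"].splitlines()
    if entry.strip()
)

print(max_line_length, ignored_codes, per_file_ignores)
```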

Linting in an editor

In the last example we showed how to lint Python scripts from the command line. However, we might want to see potential issues with our code as we are writing it, enabling us to correct things instantly. In order to do this we can configure a linter with a text editor. In this example we will go through how to do this for VSCode.

In VSCode we can set our linter preference by opening the command palette with Ctrl+Shift+P and clicking on Python: Select Linter.

We can then select which linter we want to use. If ‘Pylint’ is selected, for example, the setting

"python.linting.pylintEnabled": true

will then be added to the settings.json file in the .vscode config.

Potential issues will now be underlined upon saving our script, similar to a spell checker.

[Image: linted code]

If you hover over a line, the message associated with this problem will be displayed. The full list of issues can also be viewed in the “PROBLEMS” bar of the VSCode terminal window.

Linting a Jupyter Notebook

Jupyter notebooks can be a great tool for learning, running experiments and checking pieces of code. However, they do pose some difficulties when it comes to version control and running checks such as linting and formatting.

An easy way to apply linters to Jupyter notebooks is with the nbqa package. This can be installed with

pip install nbqa

or, if you use conda, with

conda install -c conda-forge nbqa

This enables you to then run a range of code styling tools on notebooks in a similar way to scripts. For example, to use Pylint on a notebook you simply have to run:

nbqa pylint my_notebook.ipynb

Note, you will need to separately install any tool you want to use with nbqa.
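To see why a helper like nbqa is needed, it helps to remember that a notebook file is a JSON document in which the code lives inside individual cells. The sketch below builds a minimal, hypothetical notebook and pulls out the source of its code cells using only the standard library. It is purely illustrative and does not reflect how nbqa itself is implemented.

```python
import json

# A minimal, hypothetical notebook; real .ipynb files carry more metadata.
NOTEBOOK_JSON = json.dumps(
    {
        "nbformat": 4,
        "nbformat_minor": 5,
        "metadata": {},
        "cells": [
            {"cell_type": "markdown", "metadata": {}, "source": ["# Warp drive"]},
            {
                "cell_type": "code",
                "metadata": {},
                "execution_count": None,
                "outputs": [],
                "source": ["Captain='Picard'\n", "print(Captain)"],
            },
        ],
    }
)


def extract_code_cells(notebook_text):
    """Return the source of each code cell as a single string."""
    notebook = json.loads(notebook_text)
    return [
        "".join(cell["source"])
        for cell in notebook["cells"]
        if cell["cell_type"] == "code"
    ]


for source in extract_code_cells(NOTEBOOK_JSON):
    print(source)
```

A linter pointed directly at the .ipynb file would see JSON rather than Python, which is why the code has to be extracted first.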

Auto-formatters in Python

Linters are perfectly fine for flagging imperfections for which there is a clear and simple fix, like renaming a variable from CamelCase to snake_case. But they will not, for example, split a long line of code into several shorter lines for you. This can instead be done with an auto-formatter, which changes your code to follow certain formatting guidelines. These guidelines dictate things such as where tabs, spaces and new lines are used in code.

We will consider a popular auto-formatter called Black. Black reformats entire files in place, applying its own PEP8-compliant coding style, which is detailed in the Black documentation.

Examples

Formatting a Python script

Black can be installed by running

pip install black

and run with

black my_script.py

Let’s say your script contains a long line of code, like:

long_list = ['this','list','contains','too','many','elements','for','one','line']

Black will change this to:

long_list = [
    "this",
    "list",
    "contains",
    "too",
    "many",
    "elements",
    "for",
    "one",
    "line",
]

We can see the list has been split up so each element is on a different line, making it easier to read. Furthermore, the single quotation marks have been changed to double quotation marks, which Black prefers.
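One way to see why the list cannot stay on a single line is to compare its length with Black's default limit of 88 characters. The short sketch below rebuilds the one-line form with double quotes and a space after each comma, then measures it. The variable names are our own, and this is only a back-of-the-envelope check rather than a description of Black's algorithm.

```python
BLACK_DEFAULT_LINE_LENGTH = 88

words = "this list contains too many elements for one line".split()

# The one-line form after normalising quotes and adding a space after each comma.
one_line = "long_list = [" + ", ".join(f'"{word}"' for word in words) + "]"

print(len(one_line))
print(len(one_line) > BLACK_DEFAULT_LINE_LENGTH)
```

The limit can be changed with Black's --line-length option if a project prefers a different width.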

It should be noted that Black will only change the appearance/formatting of your code. It will not, for example, flag possible errors or remind you to put in a docstring.

Auto-formatting a Jupyter notebook

To format a notebook using Black you can again use the nbqa package:

nbqa black my_notebook.ipynb

We can also integrate Black with Jupyter notebooks using the Black notebook extension, nb_black. You can install this with

pip install nb_black

and then use it in a Jupyter notebook by running the following magic command in a cell:

%load_ext nb_black

Now, whenever we run a code block it will be formatted with the Black style guide!

If you want to have Black formatting enabled in your notebooks automatically (i.e. without having to run the magic command) you can set this in the ipython config. You can create an initial template ipython config by running:

ipython profile create

By default, this should create some config files at the location ~/.ipython/profile_default/. In the ipython_config.py file you then need to add the lines:

c = get_config()
c.InteractiveShellApp.extensions = ["nb_black"]

Black formatting will now be enabled automatically whenever you use a Jupyter notebook.

Pre-commit hooks

So, we now know how to use linters and auto-formatters, and we have realised just how useful these are! The next step is to start enforcing their use in our projects. This can be done using pre-commit hooks. Pre-commit hooks enable us to check our code for style and formatting issues each time a change is committed, thus ensuring a uniform style is maintained throughout the entirety of a project.

The pre-commit package manager can be installed with

pip install pre-commit

In the root of our Git repository, we then need to create a file called .pre-commit-config.yaml. This file is where we will specify the checks we want to run before each commit. Below is an example which uses some hooks from the pre-commit-hooks repo as well as Black formatting.

repos:
-   repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v3.2.0
    hooks:
    -   id: trailing-whitespace
    -   id: end-of-file-fixer
-   repo: https://github.com/psf/black
    rev: 21.7b0
    hooks:
    -   id: black

Once we have created our .pre-commit-config.yaml file we can then run

pre-commit install

Now, whenever the command git commit is run, the pre-commit hooks will automatically be applied!
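To give a feel for what the two simpler hooks in this configuration check, here is a rough Python approximation of trailing-whitespace and end-of-file-fixer. The function names are hypothetical and the logic is deliberately simplified (for example, Windows line endings are not handled); the real hooks live in the pre-commit-hooks repo.

```python
def fix_trailing_whitespace(text):
    """Strip spaces and tabs from the end of every line."""
    return "\n".join(line.rstrip(" \t") for line in text.split("\n"))


def fix_end_of_file(text):
    """Ensure non-empty text ends with exactly one newline."""
    stripped = text.rstrip("\n")
    return stripped + "\n" if stripped else ""


messy = 'InitiateWarpSpeed("engage")   \n\n\n'
tidy = fix_end_of_file(fix_trailing_whitespace(messy))
print(repr(tidy))
```

Applied to our earlier script, a fix like this would also resolve the missing final newline that both linters reported.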

It is also possible to add pre-commit hooks for notebooks with nbqa, for example with the following .pre-commit-config.yaml:

repos:
- repo: https://github.com/nbQA-dev/nbQA
  rev: 1.3.1
  hooks:
   - id: nbqa-black
     additional_dependencies: [black==21.7b0]
   - id: nbqa-pylint
     additional_dependencies: [pylint==2.13.4]

If you want to use nbqa with a specific version of a tool then you can specify this in the additional_dependencies field (as above).
