First Steps in Python Testing

Posted on September 5, 2024 by The Jumping Rivers Blog in Data science | 0 Comments

This article was first published on The Jumping Rivers Blog, and kindly contributed to python-bloggers.

Programming is a craft, and in data science we often spend countless hours coding. There isn’t a
magic shortcut to improving your programming skills. But, like any craft, improvement comes from
practice: challenging yourself, exploring related skills, learning from others, and teaching.

Testing code using automated tools is common throughout the software development industry. This
technique can improve the quality of the code you write as a data scientist. Testing helps refine
your code, supports redesign, prevents errors, and makes it harder to write single-use code.

Here, we introduce the pytest framework and show how it can be used to test Python functions. If you
don’t use a testing framework as part of your daily workflow, try experimenting with the techniques
presented here the next time you write or extend a function.

About pytest

pytest is a software testing framework. It is a command-line tool that automatically finds the
tests you’ve written, runs them, and reports the results. pytest is known for its simplicity,
scalability, and powerful features such as fixture support and parametrization. Compared to the
testing tools in Python’s standard library, it has a concise syntax and a rich plugin ecosystem.

Getting started with pytest

Before we start writing tests, it’s important to set up a clean, isolated environment where we can
install and manage packages. This is done using a virtual environment.

From the project directory, we first create a virtual environment (first line of the code below),
then activate it (second line), and finally install pytest (third line).

$ python3 -m venv venv
$ source venv/bin/activate
(venv) $ pip install pytest

We have everything set up to use pytest in our project.
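To confirm that the interpreter you are running really is the one inside the virtual environment, a small check can help. The following is a minimal sketch and not part of the original setup: the helper names in_virtualenv and pytest_is_installed are hypothetical. It relies on the fact that, inside an environment created with venv, sys.prefix differs from sys.base_prefix.

```python
import importlib.util
import sys


def in_virtualenv():
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def pytest_is_installed():
    # find_spec returns None when the module cannot be found.
    return importlib.util.find_spec("pytest") is not None


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("pytest importable:", pytest_is_installed())
```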
When we are done working in the virtual environment, we can deactivate it by simply running:

$ deactivate

Now that your environment is set up, let’s explore the basics of pytest.

What is a test?

A test is a small piece of code (usually a function) that checks whether another piece of code is
working as expected.
For example, imagine you wrote a function to calculate the mean of a list of numbers. A test would
check if the function correctly computes the mean for different inputs.

Let’s create a simple function that calculates the mean of a list of numbers x and save it in my_functions.py:

# ./my_functions.py
def calculate_mean(x):
    return sum(x) / len(x)

A very useful property of pytest is test discovery: a set of naming conventions that tell pytest
where to look for tests. By default, the name of any file that contains tests should start with
test_ (names ending in _test.py are also recognised), and the names of the test functions inside
it should start with test as well.
pytest then finds and runs these functions automatically.
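As a rough illustration of these naming conventions, consider the sketch below. The file name and both function names are hypothetical and chosen only for this example. With pytest's default settings, only the function whose name starts with test would be collected; the helper is ignored by discovery but can still be called from a test.

```python
# ./test_discovery_example.py  (hypothetical file name, for illustration only)


def make_numbers():
    # Helper: the name does not start with "test", so pytest does not
    # collect it as a test.
    return [2, 4, 6]


def test_sum_of_numbers():
    # Collected: the file name and the function name both start with "test_".
    assert sum(make_numbers()) == 12
```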

Now, let us write a test for calculate_mean using pytest.
Create a file named test_my_functions.py:

# ./test_my_functions.py
from my_functions import calculate_mean


def test_calculate_mean():
    x = [1, 2, 3, 4, 5]
    result = calculate_mean(x)
    expected = 3.0
    assert result == expected

In this example, test_calculate_mean() is a test function.
It checks if calculate_mean([1, 2, 3, 4, 5]) returns 3.0.
When we run pytest, it will check if the assert statement holds true.

$ pytest test_my_functions.py

============================= test session starts ==============================
test_my_functions.py . [100%]

============================== 1 passed in 0.01s ===============================

We can see that the test has successfully passed. In the output, the dot (.) after
test_my_functions.py indicates that the test has passed.
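Beyond a single typical input, tests can also pin down how a function behaves on awkward inputs. The sketch below is an illustration rather than part of the original example. It repeats calculate_mean so that the block runs on its own, uses pytest.approx for a floating-point comparison, and uses pytest.raises to check that an empty list raises ZeroDivisionError (which is what the current implementation does, not necessarily what you would want it to do).

```python
import pytest


def calculate_mean(x):
    # Copy of the function from my_functions.py, repeated here so that
    # this sketch is self-contained.
    return sum(x) / len(x)


def test_calculate_mean_floats():
    # Floating-point sums are rarely exact, so compare within a tolerance.
    assert calculate_mean([0.1, 0.2, 0.3]) == pytest.approx(0.2)


def test_calculate_mean_empty_list():
    # len([]) is 0, so the division raises ZeroDivisionError.
    with pytest.raises(ZeroDivisionError):
        calculate_mean([])
```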

Now, let’s have a look at an example of a failing test.
Consider the following test function which is in the file test_failing.py.

# ./test_failing.py
def test_addition():
    result = 2 + 2
    expected = 5
    assert result == expected

We run pytest from the command-line and investigate the output.

$ pytest -v test_failing.py
============================= test session starts ==============================
test_failing.py::test_addition FAILED [100%]

=================================== FAILURES ===================================
________________________________ test_addition _________________________________

    def test_addition():
        result = 2 + 2
        expected = 5
>       assert result == expected
E       assert 4 == 5

test_failing.py:4: AssertionError
=========================== short test summary info ============================
FAILED test_failing.py::test_addition - assert 4 == 5
============================== 1 failed in 0.03s ===============================

This time pytest prints the source of the failing test, marks the line where the assertion failed
(>), and shows the values that were compared (E assert 4 == 5). The -v or --verbose command-line
flag makes the output more detailed, for example by listing each test by name.

The assert statement

The assert statement is used to verify that a given condition is True.
If the condition is False, the test fails.
In our first example, the statement assert result == expected
asserts that the result from calculate_mean(x)
should equal 3.0. If the condition does not hold, pytest reports a failure.
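Outside pytest, a failing assert simply raises an AssertionError, and that exception is what pytest catches and reports. The sketch below shows this in plain Python, along with the optional message that can follow the condition. The function names are hypothetical.

```python
def check_passes():
    # The condition is True, so nothing happens.
    assert 2 + 2 == 4


def check_fails():
    # The condition is False, so Python raises AssertionError.
    # The text after the comma becomes the error message.
    assert 2 + 2 == 5, "expected 2 + 2 to equal 5"


check_passes()

try:
    check_fails()
except AssertionError as error:
    print(f"AssertionError: {error}")
```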

Pytest fixtures

Suppose you have written several functions that all work on some non-trivial dataset,
and you want to write a test function for each. In each test function, you would
have to create a dataset of the required form, pass it into the function under test,
and then compare the output to some expected value. The code for creating the
test dataset would then be duplicated across the different test functions.

Fixtures in pytest are helper functions which are
used to set up conditions that we want to be available for multiple tests. This might involve
putting together some test data, or preparing some other state before a test runs (connecting to a
database, creating a temporary file). Fixtures are run before (and sometimes after) the actual test
functions. The @pytest.fixture decorator is used to tell pytest that a function is a fixture.
Fixtures can perform actions (like setting up a database connection), and can inject data into a
test function.

To illustrate, let us consider a fixture that provides a list of numbers
in our test file test_my_functions.py:

# ./test_my_functions.py
import pytest

from my_functions import calculate_mean


@pytest.fixture
def sample_numbers():
    return [1, 2, 3, 4, 5]


def test_calculate_mean(sample_numbers):
    result = calculate_mean(sample_numbers)
    expected = 3.0
    assert result == expected

By using @pytest.fixture, we have defined a sample_numbers fixture that returns the list
[1, 2, 3, 4, 5].
This fixture can be used in any test function by adding it as an argument. Fixtures are especially
useful when you need to set up more complex objects that multiple tests will use.

The test output would be:

$ pytest -vv test_my_functions.py

============================= test session starts ==============================
test_my_functions.py::test_calculate_mean PASSED [100%]

============================== 1 passed in 0.00s ===============================
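Fixtures can also run code after a test has finished, which is useful for cleaning up. The sketch below is one possible way to do this, not part of the original example. It uses yield to split a fixture into set-up and tear-down steps, and pytest's built-in tmp_path fixture to obtain a temporary directory. The names read_numbers, numbers_file and test_mean_from_file are hypothetical.

```python
from pathlib import Path

import pytest


def read_numbers(path):
    # Hypothetical helper: parse one number per line from a text file.
    return [float(line) for line in Path(path).read_text().splitlines()]


@pytest.fixture
def numbers_file(tmp_path):
    # Set-up: tmp_path is pytest's built-in temporary-directory fixture.
    path = tmp_path / "numbers.txt"
    path.write_text("1\n2\n3\n4\n5\n")
    yield path  # the test body runs while the fixture is paused here
    # Tear-down: runs after the test finishes, whether it passed or failed.
    path.unlink()


def test_mean_from_file(numbers_file):
    numbers = read_numbers(numbers_file)
    assert sum(numbers) / len(numbers) == 3.0
```

A fixture can request another fixture simply by naming it as an argument, which is how numbers_file receives tmp_path here.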

Parametrization

Parametrization is an
important feature of pytest which allows us to run a test with multiple sets of parameters.
This is helpful when we want to check the same logic under different conditions without writing
separate test functions.

Here is how we can test calculate_mean from the test_my_functions.py file, by considering
multiple inputs using parametrization:

# ./test_my_functions.py
import pytest

from my_functions import calculate_mean


@pytest.mark.parametrize("numbers, expected", [
    ([1, 2, 3, 4, 5], 3.0),
    ([10, 20, 30], 20.0),
    ([7, 14, 21], 14.0),
    ([5, 5, 5, 5], 5.0),
])
def test_calculate_mean_parametrized(numbers, expected):
    result = calculate_mean(numbers)
    assert result == expected

In this example, @pytest.mark.parametrize allows us to test calculate_mean with four different
lists.
Each tuple in the list passed to parametrize represents a different test case with its own numbers
and expected values.

Then to run the test we use:

$ pytest -v test_my_functions.py

========================================= test session starts =========================================
test_my_functions.py::test_calculate_mean_parametrized[numbers0-3.0] PASSED [ 25%]
test_my_functions.py::test_calculate_mean_parametrized[numbers1-20.0] PASSED [ 50%]
test_my_functions.py::test_calculate_mean_parametrized[numbers2-14.0] PASSED [ 75%]
test_my_functions.py::test_calculate_mean_parametrized[numbers3-5.0] PASSED [100%]

========================================== 4 passed in 0.01s ==========================================

The output differs from the earlier runs because pytest treats each parameter set as a separate
test: each one gets its own line, with an ID in square brackets built from the parameters, and its
own PASSED or FAILED result.
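Automatically generated IDs such as numbers0 are not very descriptive. One option, sketched below as an assumption about how you might want to label cases rather than as part of the original example, is to wrap each case in pytest.param and give it an explicit id. The function calculate_mean is repeated so that the block runs on its own, and the id strings are made up for illustration.

```python
import pytest


def calculate_mean(x):
    # Copy of the function from my_functions.py, repeated here so that
    # this sketch is self-contained.
    return sum(x) / len(x)


@pytest.mark.parametrize(
    "numbers, expected",
    [
        # The id strings are arbitrary labels chosen for this example.
        pytest.param([1, 2, 3, 4, 5], 3.0, id="one-to-five"),
        pytest.param([10, 20, 30], 20.0, id="tens"),
        pytest.param([5, 5, 5, 5], 5.0, id="all-equal"),
    ],
)
def test_calculate_mean_with_ids(numbers, expected):
    assert calculate_mean(numbers) == expected
```

With -v, the first case should then be reported as test_calculate_mean_with_ids[one-to-five] instead of using a generated ID.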

Test organization

In the examples above, our test scripts (test_my_functions.py and test_failing.py) and Python
module (my_functions.py) were all in the same directory. We used this approach for simplicity (as
our focus was on how to write and run tests). In a larger project you may have many test scripts
and Python modules, and this approach will quickly become difficult to manage.

To keep your project organised, it’s a good practice to place all tests in a tests/ directory.
This way, when we run pytest we receive a summary of all the project’s tests. On making this change,
the file structure for the above example is:

./intro-to-python/
├── my_functions.py
├── tests/
│ ├── test_failing.py
│ └── test_my_functions.py
└── venv/

However, there is a small problem here. The my_functions.py module must be imported by the
test_my_functions.py test script. But if we call pytest tests/ from the project root, the
directory containing my_functions.py isn’t automatically added to the Python search path (the
list of directories from which the running Python session can import packages and modules), so
the import in test_my_functions.py fails.

A simple solution for this is to use the following command instead of pytest tests/:

$ python -m pytest tests/

When we run pytest through python -m, the current directory is added to the Python search path,
so my_functions.py can be imported.
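The search path itself is available in Python as sys.path, so it can be inspected directly. The helper below is a minimal sketch for doing so; the name is_on_search_path is hypothetical and not part of pytest.

```python
import sys
from pathlib import Path


def is_on_search_path(directory):
    # An empty string in sys.path stands for the current working directory;
    # Path("") resolves to that directory as well.
    target = Path(directory).resolve()
    return any(Path(entry).resolve() == target for entry in sys.path)


if __name__ == "__main__":
    print("current directory importable:", is_on_search_path(Path.cwd()))
```

Calling this helper from inside a test, once in a run started with pytest tests/ and once in a run started with python -m pytest tests/, is one way to see the difference between the two commands.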

A more robust solution (and one we would recommend for larger projects) is to place your Python
modules in a package structure, though that is beyond the scope of this introduction to pytest.

Ready to start testing your code? Enjoy your journey into Python testing, and happy coding!

