Python pip Tips for Data Scientists

[This article was first published on Python – Predictive Hacks, and kindly contributed to python-bloggers].

Data scientists tend to work with Anaconda environments and usually install packages with "conda" commands. However, apart from conda there is the "pip" package manager, which remains the most popular. Although these two package managers are very similar, they are designed for different purposes and should be used accordingly. In this tutorial, we will show you some pip tips that you can apply to your daily tasks.

What is pip?

According to Wikipedia, pip is a package-management system written in Python that is used to install and manage software packages. It connects to an online repository of public packages called the Python Package Index (PyPI). pip can also be configured to connect to other package repositories, provided that they comply with Python Enhancement Proposal 503 (PEP 503).
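For example, pip can be pointed at an alternative index per command with the --index-url flag, or permanently through its configuration file. A minimal sketch (the URL below is a hypothetical internal mirror, not a real index):

```ini
; pip.conf (Linux/macOS: ~/.config/pip/pip.conf, Windows: %APPDATA%\pip\pip.ini)
[global]
; hypothetical internal PEP 503 "simple" index
index-url = https://pypi.internal.example.com/simple
```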

pip Tips

Install packages

Most of you probably already know how to install a package using pip; you simply run the command:

pip install some-package-name

If you would like to install a specific version you can run:

pip install 'some-package-name==1.2.2' --force-reinstall

Here, 1.2.2 is the version of the package. The --force-reinstall flag makes pip reinstall the package even if it is already installed. Moreover, you can specify a range of versions like:

pip install 'some-package-name>=1.3.0,<1.4.0' --force-reinstall

Finally, you can install packages for a specific Python version. For example, if we want to install a package for Python 3, we can run:

pip3 install some-package-name
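Note that on systems with several Python installations, pip and pip3 may be bound to different interpreters. A safer alternative (a sketch, not specific to any package) is to run pip as a module of the interpreter you intend to install into:

```shell
# Check which interpreter this pip is bound to
python3 -m pip --version

# Installing via "python3 -m pip" guarantees the package lands in
# python3's site-packages, e.g.:
#   python3 -m pip install some-package-name
```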

Uninstall packages

We can easily remove a package by running:

pip uninstall some-package-name

Install packages from the requirements

We have already explained how to create the requirements.txt file. Let's assume that requirements.txt contains the following:

pandas==1.2.5
numpy==1.21.1

We can install these libraries by running:

pip install -r requirements.txt
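A requirements file accepts the same version specifiers as pip install, plus comments. A small illustrative sketch (the pins below are examples, not recommendations):

```
# exact pins are reproducible
pandas==1.2.5
# version ranges work too
numpy>=1.20,<1.22
```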

Generate the requirements.txt file

Usually, we work with virtual environments, and once we have installed the required libraries, we can easily generate the requirements.txt file using pip:

pip freeze > requirements.txt
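pip freeze prints the installed packages in exactly the requirements format, one pinned name==version line per package. By default it omits pip itself (along with setuptools and wheel); the --all flag includes them:

```shell
# one pinned "name==version" line per installed package
pip freeze | head -n 5

# --all also lists pip, setuptools and wheel, which are skipped by default
pip freeze --all | grep -i "^pip=="
```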

Get the installed packages

Using pip, we can get a list of the installed packages in our environment by running:

pip list
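pip list also supports machine-readable output via --format=json, which is handy in scripts. For example, counting the installed packages:

```shell
# emit the package list as JSON and count the entries with Python
pip list --format=json | python3 -c "import json, sys; print(len(json.load(sys.stdin)))"
```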

You can filter for a specific package by piping the output to the grep command (on Windows, use findstr instead). Let's get the installed pandas version:

pip list | grep pandas
pandas                             1.2.5

Check for compatibility issues

When we install packages, it is common to run into compatibility issues between dependencies. We can check whether everything is OK by running:

pip check

If I run it in my base environment, I get the following:

streamlit 0.86.0 requires protobuf, which is not installed.
spyder 4.2.5 requires pyqt5, which is not installed.
spyder 4.2.5 requires pyqtwebengine, which is not installed.
qdarkstyle 2.8.1 requires helpdev, which is not installed.
conda-repo-cli 1.0.4 requires pathlib, which is not installed.
anaconda-project 0.10.1 requires ruamel-yaml, which is not installed.
awswrangler 2.9.0 has requirement numpy<1.21.0,>=1.18.0, but you have numpy 1.21.1.
awswrangler 2.9.0 has requirement pyarrow<4.1.0,>=2.0.0, but you have pyarrow 5.0.0.
awscli 1.20.12 has requirement botocore==1.21.12, but you have botocore 1.20.112.
awscli 1.20.12 has requirement colorama<0.4.4,>=0.2.5, but you have colorama 0.4.4.
awscli 1.20.12 has requirement docutils<0.16,>=0.10, but you have docutils 0.17.1.
awscli 1.20.12 has requirement s3transfer<0.6.0,>=0.5.0, but you have s3transfer 0.4.2.

Apparently, I have some work to do!
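Because pip check exits with a non-zero status when it finds problems, it can also serve as a gate in a script or CI step. A minimal sketch (the log path is an arbitrary choice):

```shell
# pip check returns 0 when all dependencies are compatible, non-zero otherwise
if pip check > /tmp/pip-check.log 2>&1; then
    echo "dependencies OK"
else
    echo "dependency conflicts found:" >&2
    cat /tmp/pip-check.log >&2
fi
```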

Show more info about packages

We can get more information about an installed package by running:

pip show some-package-name

For example, this is what I get for pandas.

Name: pandas
Version: 1.2.5
Summary: Powerful data structures for data analysis, time series, and statistics
Home-page: https://pandas.pydata.org
Author:
Author-email:
License: BSD
Location: c:\users\gpipis\anaconda3\lib\site-packages
Requires: pytz, numpy, python-dateutil
Required-by: streamlit, statsmodels, seaborn, mlxtend, awswrangler, altair
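pip show also accepts a -f (--files) flag that additionally lists every file the package installed, which helps when you want to locate a module on disk (pip itself is used below only so the example always has a target):

```shell
# -f / --files appends the list of installed files to the report
pip show -f pip | head -n 12
```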

The Takeaway

Data scientists and data engineers work with Python on a daily basis, so a basic knowledge of "pip" is really useful in their work. This was just an introduction; I encourage you to dive deeper into pip and unlock its power, and feel free to share your tips with our community.

