The post Burnout in Data Professionals – A Personal Take first appeared on Python-bloggers.
Data science and data engineering are incredibly cognitively demanding professions. As data professionals, we are required to leverage both our analytical/engineering skills and our interpersonal skills to be effective contributors within our organisations. Based on my personal experience, the field seems to concentrate humans who are detail-oriented, curious, impact-driven and tenacious to a fault. This Type A personality profile, while magical when applied to technical work, could reasonably also count as an occupational hazard.
We also have a skills shortage in our field, so many data professionals are taking on more than what is reasonable for one human to endure. It is therefore no surprise that the sexiest profession of the 21st century is also one of the professions with the highest rates of burnout.
The thing is – this field is great to work in. Extracting a clear narrative from data is one of the most satisfying things ever (I might be biassed). The minds that give us these valuable insights are the same minds that need careful tending to. Anti-burnout strategies are not only about ensuring productivity in the workplace – they are fundamental to the improved quality of life we should be striving for as a society.
I think of those in our field as not unlike high-performance athletes. Boxers wear boxing gloves. Hockey players wear shin guards. Ballet dancers warm up for hours before performances. How are we proactively protecting the minds of our data professionals?
Before we go on to discuss strategies for dealing with burnout, it’s important to consider whether you are experiencing some of the symptoms.
The NHS describes those who are burnt out as:
On a personal note, I have experienced many of these symptoms during my PhD and beyond. I had this overwhelming feeling that my to-do list was simultaneously too long and also not really worth doing at all. I felt disconnected from myself and everyone in my life. Therapy was the only thing that enabled me to get a handle on my life and career path. If you are experiencing most of these symptoms on a daily basis, I highly recommend starting a journey with a trained therapist. Dealing with burnout wasn’t a pleasant experience. I’m grateful that I had the support of my therapist, family and friends. If you are experiencing this, know that you deserve the care and attention. I encourage you to make the calls you need to make.
In recovering from burnout, it was very important to me to be intentional about the next workplace I would invest my time and energy in. Eight months in, I am really enjoying the level of care that I am seeing within Jumping Rivers. There is a very strong culture of colleagues trying to make sure that their peers don’t overwork themselves. People here are very eager to help one another out where they can. We have a generous amount of leave and flexible working hours. It’s not uncommon to look at calendars and see “MOT”, “School run”, “Going for a paddle”. With this visibility, we remind one another that we work to live and don’t live to work. There is also trust from upper management that we are all trying our very best, and don’t need micromanaging. The value of this trust cannot be overstated.
I’ll finish by saying that no person is perfect. This human-level imperfection scales to organisation-level imperfection. No group of humans is going to perfectly navigate the challenges presented by modern life. What’s important to me is to try my best to bring patience, sensitivity and empathy to all areas, including the workplace. To my colleagues and employers, of course, but most importantly to myself.
If you are practising self-care for a few weeks and are still feeling overwhelmed, it might be time to go and see a mental health care professional. Unfortunately one cannot eat-pray-love oneself out of all issues.
There's more than one way to Python square roots. Learn 5 approaches to square roots in Python + some bonus advanced tricks.
After learning about 4 ways to square a number in Python, now it's time to tackle the opposite operation – Python square roots. This article will teach you five distinct ways to take square roots in Python and will finish off with a bonus section on cube roots and square roots of Python lists.
Let's get started by introducing the topic and addressing the potential issues you have to consider when calculating square roots in Python.
Put simply, the square root of a number is a value that returns the same number when multiplied by itself. It's an inverse operation of squaring.
For example, 3 squared is 9, and a square root of 9 is 3 because 3 x 3 is 9. It's a concept that's somewhat difficult to explain in a sentence, but you get the idea as soon as you see it in action.
Before diving into different ways to take square roots in Python, let's go over the sets of numbers for which you can and can't take square roots.
Square roots only play well with positive numbers. For now, ignore the code that's responsible for calculations and just focus on the results.
The following code snippet prints the square root for numbers 1 and 25:
import math

a = 1
b = 25

# Square root of a positive number
a_sqrt = math.sqrt(a)
b_sqrt = math.sqrt(b)

# Print
print("Square root of a positive number")
print("--------------------------------------------------")
print(f"Square root of {a} = {a_sqrt}")
print(f"Square root of {b} = {b_sqrt}")
Here's the output:
So, 1×1 = 1, and 5×5 = 25 – that's essentially how square roots work. But what if you were to take a square root of zero?
Zero multiplied by itself is still zero, so the square root of zero is zero:
import math

a = 0

# Square root of a zero
a_sqrt = math.sqrt(a)

# Print
print("Square root of a zero")
print("--------------------------------------------------")
print(f"Square root of {a} = {a_sqrt}")
Here's the output:
Only one use case left, and that's negative numbers.
There's no way to calculate a square root of a negative number by using real numbers. Any real number multiplied by itself, whether positive or negative, gives a non-negative result.
Nevertheless, let's give it a shot:
import math

a = -10

# Square root of a negative number
a_sqrt = math.sqrt(a)

# Print
print("Square root of a negative number")
print("--------------------------------------------------")
print(f"Square root of {a} = {a_sqrt}")
It results in an error:
There are ways to calculate square roots of negative numbers, and that's by writing them as a multiple of -1. For instance, -9 can be written as -1 x 9. Since the square root of -1 is the imaginary unit i, the square root of -9 is 3i. Stepping into the realm of imaginary numbers is out of the scope for today, so I'll stop here.
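The rewrite of a negative number as -1 times a positive number can be sketched with the standard library alone. The helper name `sqrt_of_negative` is made up for this illustration and is not part of Python or the original article:

```python
import math


def sqrt_of_negative(x):
    """Square root of a negative real number, returned as a complex number.

    Hypothetical helper for illustration only.
    """
    if x >= 0:
        raise ValueError("expected a negative number")
    # sqrt(x) = sqrt(-1) * sqrt(-x) = i * sqrt(|x|)
    return complex(0, math.sqrt(-x))


print(sqrt_of_negative(-9))  # 3j
```

Python writes the imaginary unit as `j` rather than `i`, so `3j` here is the same value as 3i.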
Next, let's go over 5 ways to tackle Python square roots.
The first method is actually the one you've seen in the previous section. It relies on the math.sqrt() function to do the trick. The math module ships with the default Python installation, so there's no need to install any external libraries.
Below is a code snippet demonstrating how to take square roots in Python using this function:
import math

a = 1
b = 25
c = 30.82
d = 100
e = 400.40

# Method #1 - math.sqrt() function
a_sqrt = math.sqrt(a)
b_sqrt = math.sqrt(b)
c_sqrt = math.sqrt(c)
d_sqrt = math.sqrt(d)
e_sqrt = math.sqrt(e)

# Print
print("Method #1 - math.sqrt() function")
print("--------------------------------------------------")
print(f"Square root of {a} = {a_sqrt}")
print(f"Square root of {b} = {b_sqrt}")
print(f"Square root of {c} = {c_sqrt}")
print(f"Square root of {d} = {d_sqrt}")
print(f"Square root of {e} = {e_sqrt}")
And here are the results:
This is probably the one and only method you'll need, but let's take a look at some alternatives as well.
If squaring a number means raising it to the power of 2, then taking a square root is essentially raising it to the power of 0.5. That's exactly the behavior you can implement with the math.pow() function. It takes two arguments – the number and the exponent.
Let's take a look at a couple of examples:
import math

a = 1
b = 25
c = 30.82
d = 100
e = 400.40

# Method #2 - math.pow() function
a_sqrt = math.pow(a, 0.5)
b_sqrt = math.pow(b, 0.5)
c_sqrt = math.pow(c, 0.5)
d_sqrt = math.pow(d, 0.5)
e_sqrt = math.pow(e, 0.5)

# Print
print("Method #2 - math.pow() function")
print("--------------------------------------------------")
print(f"Square root of {a} = {a_sqrt}")
print(f"Square root of {b} = {b_sqrt}")
print(f"Square root of {c} = {c_sqrt}")
print(f"Square root of {d} = {d_sqrt}")
print(f"Square root of {e} = {e_sqrt}")
The output is identical to what we had before:
Neat, but can we eliminate library usage altogether? Sure, here's how.
The same logic from the previous function applies here. You can raise a number to the power of 0.5 with Python's exponent operator. For non-negative numbers it does the same as math.pow(x, 0.5), but the syntax is shorter and doesn't rely on any libraries.
Here's how to use it in Python:
a = 1
b = 25
c = 30.82
d = 100
e = 400.40

# Method #3 - Python exponent operator
a_sqrt = a**0.5
b_sqrt = b**0.5
c_sqrt = c**0.5
d_sqrt = d**0.5
e_sqrt = e**0.5

# Print
print("Method #3 - Python exponent operator")
print("--------------------------------------------------")
print(f"Square root of {a} = {a_sqrt}")
print(f"Square root of {b} = {b_sqrt}")
print(f"Square root of {c} = {c_sqrt}")
print(f"Square root of {d} = {d_sqrt}")
print(f"Square root of {e} = {e_sqrt}")
The results are once again identical, no surprises here:
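One caveat worth knowing, shown here as a small sketch rather than anything from the original examples: the exponent operator and math.pow() only agree for non-negative inputs. With a negative base, operator precedence and the return type can both surprise you:

```python
import math

# Parentheses matter: ** binds tighter than the unary minus
print(-9 ** 0.5)      # -3.0, evaluated as -(9 ** 0.5)

# With parentheses, Python 3 returns a complex number instead of raising
root = (-9) ** 0.5
print(type(root))     # <class 'complex'>

# math.pow() works only with real numbers and rejects the same input
try:
    math.pow(-9, 0.5)
except ValueError as err:
    print(f"math.pow failed: {err}")
```

The complex result carries a tiny floating-point real part, so compare it with a tolerance rather than with `==`.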
Next, let's take a look at taking square roots of numbers and arrays with Numpy.
Numpy is a go-to library for numerical computations in Python. It has a sqrt() function built-in, and you can use it to take square roots for both numbers and arrays.
Just keep in mind the return type – it will be numpy.float64 for a single number and numpy.ndarray for the array. Each array element will be of type numpy.float64, of course:
import numpy as np

a = 1
b = 25
c = 30.82
d = 100
e = 400.40
arr = [a, b, c]

# Method #4 - Numpy square roots
a_sqrt = np.sqrt(a)
b_sqrt = np.sqrt(b)
c_sqrt = np.sqrt(c)
d_sqrt = np.sqrt(d)
e_sqrt = np.sqrt(e)
arr_sqrt = np.sqrt(arr)

# Print
print("Method #4 - Numpy square roots")
print("--------------------------------------------------")
print(f"Square root of {a} = {a_sqrt}")
print(f"Square root of {b} = {b_sqrt}")
print(f"Square root of {c} = {c_sqrt}")
print(f"Square root of {d} = {d_sqrt}")
print(f"Square root of {e} = {e_sqrt}")
print(f"Square root of {arr} = {arr_sqrt}")
Here's the console output:
This is by far the most convenient method because it relies on a widely-used Python library, and the calculation procedure is the same regardless of the data type coming in.
Remember the story of square roots and negative numbers? Python's math module raised an error but cmath is here to save the day. This module is used to work with complex numbers.
In the code snippet below, you'll see square roots taken from positive integers, floats, complex numbers, and negative numbers:
import cmath

a = 1
b = 25.44
c = cmath.pi
d = 10+10j
e = -100

# Method #5 - Square roots of complex numbers
a_sqrt = cmath.sqrt(a)
b_sqrt = cmath.sqrt(b)
c_sqrt = cmath.sqrt(c)
d_sqrt = cmath.sqrt(d)
e_sqrt = cmath.sqrt(e)

# Print
print("Method #5 - Square roots of complex numbers")
print("--------------------------------------------------")
print(f"Square root of {a} = {a_sqrt}")
print(f"Square root of {b} = {b_sqrt}")
print(f"Square root of {c} = {c_sqrt}")
print(f"Square root of {d} = {d_sqrt}")
print(f"Square root of {e} = {e_sqrt}")
There are no errors this time:
I've never had a need to use this module, but it's good to know it exists.
Next, let's go over some more advanced usage examples of Python square roots.
We'll now shift gears and discuss a couple of more advanced topics. These include ways to calculate cube roots in Python, and take square roots of vanilla Python lists. Let's start with the cube roots.
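If taking a square root means raising a number to the power of 0.5, then the cube root must be represented by the power of 1/3 (roughly 0.333). Use the fraction itself in code, because a rounded 0.333 gives a noticeably less accurate result.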
Here's how to implement this logic in Python, without any external libraries:
a = 1
b = 27
c = 30.82
d = 1000
e = 400.40

# Bonus #1 - Cube roots
a_cbrt = a ** (1./3.)
b_cbrt = b ** (1./3.)
c_cbrt = c ** (1./3.)
d_cbrt = d ** (1./3.)
e_cbrt = e ** (1./3.)

# Print
print("Bonus #1 - Cube roots")
print("--------------------------------------------------")
print(f"Cube root of {a} = {a_cbrt}")
print(f"Cube root of {b} = {b_cbrt}")
print(f"Cube root of {c} = {c_cbrt}")
print(f"Cube root of {d} = {d_cbrt}")
print(f"Cube root of {e} = {e_cbrt}")
The results are printed below:
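The exponent approach has one gap that the examples above don't touch: negative inputs. A negative number has a real cube root (the cube root of -27 is -3), yet `(-27) ** (1/3)` returns a complex number in Python 3. A minimal workaround, with `real_cbrt` being a name invented for this sketch, is to take the root of the absolute value and restore the sign:

```python
import math


def real_cbrt(x):
    """Real cube root that also works for negative inputs (illustrative helper)."""
    # copysign carries the sign of x over to the root of its absolute value
    return math.copysign(abs(x) ** (1 / 3), x)


print(round(real_cbrt(27), 6))    # 3.0
print(round(real_cbrt(-27), 6))   # -3.0
print(type((-27) ** (1 / 3)))     # <class 'complex'>
```

The results are rounded because raising to 1/3 is a floating-point approximation and may be off in the last digits.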
Numpy provides an easier way to take cube roots in Python. It has a cbrt() function built in, which stands for cube root. You can use it both on numbers and arrays, just as with square roots:
import numpy as np

a = 1
b = 27
c = 30.82
d = 1000
e = 400.40
arr = [a, b, c]

# Bonus #1.2 - Cube roots with Numpy
a_cbrt = np.cbrt(a)
b_cbrt = np.cbrt(b)
c_cbrt = np.cbrt(c)
d_cbrt = np.cbrt(d)
e_cbrt = np.cbrt(e)
arr_cbrt = np.cbrt(arr)

# Print
print("Bonus #1.2 - Cube roots with Numpy")
print("--------------------------------------------------")
print(f"Cube root of {a} = {a_cbrt}")
print(f"Cube root of {b} = {b_cbrt}")
print(f"Cube root of {c} = {c_cbrt}")
print(f"Cube root of {d} = {d_cbrt}")
print(f"Cube root of {e} = {e_cbrt}")
print(f"Cube root of {arr} = {arr_cbrt}")
Let's take a look at the results:
Yes, it's that easy.
There's also an easy way to calculate the square root of Python lists, without Numpy. You can simply iterate over the list and take a square root of an individual list item:
import math

arr = [1, 25, 30.82]
arr_sqrt = []

# Bonus #2 - Square root of a Python list
for num in arr:
    arr_sqrt.append(math.sqrt(num))

# Print
print("Bonus #2 - Square root of a Python list")
print("--------------------------------------------------")
print(f"Square root of {arr} = {arr_sqrt}")
Here's the result:
Or, if you prefer a more Pythonic approach, there's no reason not to use a list comprehension and simplify the above calculation to a single line of code:
import math

arr = [1, 25, 30.82]

# Bonus #2.2 - Square root of a Python list using list comprehension
arr_sqrt = [math.sqrt(num) for num in arr]

# Print
print("Bonus #2.2 - Square root of a Python list using list comprehension")
print("--------------------------------------------------")
print(f"Square root of {arr} = {arr_sqrt}")
The output is identical:
And that's how easy it is to take square roots in Python – for integers, floats, lists, and even complex numbers. Let's make a short recap next.
You now know 5 different ways to calculate Python square roots. In practice, you only need one, but it can't hurt to know a couple of alternatives. You can use the built-in math module, opt for numpy, or use the exponent operator and avoid libraries altogether. All approaches work, and the choice is up to you.
Stay tuned to the blog if you want to learn more basic Python concepts. Thanks for reading!
In this tutorial we will explore how to extract images from PDF files using Python.
Extracting images from PDF files is a very common task that’s often performed when working with different reports.
It’s a tedious task if you do it manually for every file using the available software and online tools.
To continue following this tutorial we will need the following Python libraries: PyMuPDF and Pillow.
If you don’t have them installed, please open “Command Prompt” (on Windows) and install them using the following code:
pip install PyMuPDF
pip install Pillow
Here is the PDF file we will use in this tutorial:
This PDF file will reside in the same folder as the main.py with our code.
We will also need to create an empty folder images to save the extracted images, so the folder structure should look like this:
Let’s start with importing the required dependencies:
#Import required dependencies
import fitz
import os
from PIL import Image
Define the path to PDF file:
#Define path to PDF file
file_path = 'sample_file.pdf'
Open the file using fitz module and extract all images information:
#Open PDF file
pdf_file = fitz.open(file_path)

#Calculate number of pages in PDF file
page_nums = len(pdf_file)

#Create empty list to store images information
images_list = []

#Extract all images information from each page
for page_num in range(page_nums):
    page_content = pdf_file[page_num]
    images_list.extend(page_content.get_images())
Now, let’s take a look at the images information we extracted:
print(images_list)
And you should get:
[(9, 0, 640, 491, 8, 'DeviceRGB', '', 'Image9', 'DCTDecode'), (10, 0, 640, 427, 8, 'DeviceRGB', '', 'Image10', 'DCTDecode'), (13, 0, 640, 427, 8, 'DeviceRGB', '', 'Image13', 'DCTDecode')]
where each tuple represents the following:
(xref, smask, width, height, bpc, colorspace, alt. colorspace, name, filter)
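Indexing these tuples by position (for example `image[0]` for the xref) works, but named fields are easier to read. The sketch below wraps each tuple in a `namedtuple`; the `ImageInfo` name and its field spellings are my own labels based on the layout listed above, not something PyMuPDF provides:

```python
from collections import namedtuple

# Field names follow the tuple layout described above (labels are illustrative)
ImageInfo = namedtuple(
    "ImageInfo",
    ["xref", "smask", "width", "height", "bpc",
     "colorspace", "alt_colorspace", "name", "filter"],
)

# Two of the sample tuples from the output above
images_list = [
    (9, 0, 640, 491, 8, "DeviceRGB", "", "Image9", "DCTDecode"),
    (10, 0, 640, 427, 8, "DeviceRGB", "", "Image10", "DCTDecode"),
]

for raw in images_list:
    info = ImageInfo(*raw)
    print(f"xref={info.xref}: {info.width}x{info.height}, filter={info.filter}")
```

Because a `namedtuple` is still a tuple, `info[0]` keeps working, so the rest of the tutorial code would not need to change.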
Now let’s add some error handling code in case the PDF file we work with has no images:
#Raise error if PDF has no images
if len(images_list)==0:
    raise ValueError(f"No images found in {file_path}")
After we have extracted the images information from the PDF file, we can extract the actual images and save them on the computer:
#Define path for saved images (the empty folder created earlier)
images_path = 'images/'

#Save all the extracted images
for i, image in enumerate(images_list, start=1):
    #Extract the image object number
    xref = image[0]
    #Extract image
    base_image = pdf_file.extract_image(xref)
    #Store image bytes
    image_bytes = base_image["image"]
    #Store image extension
    image_ext = base_image['ext']
    #Generate image file name
    image_name = str(i) + '.' + image_ext
    #Save image (the with block closes the file automatically)
    with open(os.path.join(images_path, image_name), "wb") as image_file:
        image_file.write(image_bytes)
After running the code, you should see the extracted images appear in the images folder:
#Import required dependencies
import fitz
import os
from PIL import Image

#Define path to PDF file
file_path = 'sample_file.pdf'

#Define path for saved images
images_path = 'images/'

#Open PDF file
pdf_file = fitz.open(file_path)

#Get the number of pages in PDF file
page_nums = len(pdf_file)

#Create empty list to store images information
images_list = []

#Extract all images information from each page
for page_num in range(page_nums):
    page_content = pdf_file[page_num]
    images_list.extend(page_content.get_images())

#Raise error if PDF has no images
if len(images_list)==0:
    raise ValueError(f"No images found in {file_path}")

#Save all the extracted images
for i, img in enumerate(images_list, start=1):
    #Extract the image object number
    xref = img[0]
    #Extract image
    base_image = pdf_file.extract_image(xref)
    #Store image bytes
    image_bytes = base_image["image"]
    #Store image extension
    image_ext = base_image['ext']
    #Generate image file name
    image_name = str(i) + '.' + image_ext
    #Save image (the with block closes the file automatically)
    with open(os.path.join(images_path, image_name), "wb") as image_file:
        image_file.write(image_bytes)
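One refinement you may want, offered here as an optional sketch: if a PDF reuses the same image on several pages (a logo in a header, say), collecting image information page by page can list the same xref more than once, and the loop would then save duplicate files. That is an assumption about your input, not something the sample file necessarily does. The hypothetical helper `unique_by_xref` below keeps only the first tuple for each xref:

```python
def unique_by_xref(images_list):
    """Return image tuples with duplicate xrefs removed, keeping first-seen order."""
    seen = set()
    unique = []
    for image in images_list:
        xref = image[0]  # the xref is the first element of each tuple
        if xref not in seen:
            seen.add(xref)
            unique.append(image)
    return unique


# Made-up example: image 9 appears on two pages
sample = [
    (9, 0, 640, 491, 8, "DeviceRGB", "", "Image9", "DCTDecode"),
    (10, 0, 640, 427, 8, "DeviceRGB", "", "Image10", "DCTDecode"),
    (9, 0, 640, 491, 8, "DeviceRGB", "", "Image9", "DCTDecode"),
]
print([image[0] for image in unique_by_xref(sample)])  # [9, 10]
```

Calling it on `images_list` right before the saving loop would be enough. Similarly, `os.makedirs(images_path, exist_ok=True)` from the standard library can create the images folder if you would rather not create it by hand.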
In this article we explored how to extract images from PDF files using Python and PyMuPDF.
Feel free to leave comments below if you have any questions or have suggestions for some edits and check out more of my Python Programming tutorials.
The post Extract Images from PDF using Python appeared first on PyShark.
Want to check if a sentence is grammatically correct with Python? Try Gingerit – a free Python grammar checker.
If you want to implement a Python grammar checker and don't know where to start, you're at the right place. The Python Gingerit module allows you to check and correct grammar free of charge, without even registering for a free account.
The module is based on Ginger Software, a grammar checker that allows you to eliminate writing pain points if English isn't your first language:
Today you'll learn how to use it in Python, and what are some of the issues and limitations you need to be aware of.
The package is available on PyPi, which makes installation a breeze. Just run the following command from the terminal:
pip install gingerit
You can now open up a Python notebook or a script and start using the module. The code snippet below should correct the grammatical errors in a short sentence:
from gingerit.gingerit import GingerIt

text = 'This sentance contains a cuple of gramatical mistakes.'

parser = GingerIt()
parser.parse(text)
But instead, it results in a Gingerit JSONDecodeError with a message of Expecting value: line 1 column 1 (char 0):
Let's see how to fix this Gingerit JSON error next.
In addition to PyPi, the package is also available on GitHub. This means we can download the gingerit.py file and see what's causing the JSON error.
To fix the issue, we'll first need to install an additional Python package – CloudScraper. It's built on top of the Requests module and is used to bypass Cloudflare's anti-bot page:
pip install cloudscraper
Once installed, change the way the session is initialized on line 16:
# session = requests.Session()
session = cloudscraper.create_scraper()
The contents of the GingerIt class (with library imports) should now look as follows:
import requests
import cloudscraper

URL = "..."  # noqa
API_KEY = "..."


class GingerIt(object):
    def __init__(self):
        self.url = URL
        self.api_key = API_KEY
        self.api_version = "2.0"
        self.lang = "US"

    def parse(self, text, verify=True):
        session = cloudscraper.create_scraper()
        request = session.get(
            self.url,
            params={
                "lang": self.lang,
                "apiKey": self.api_key,
                "clientVersion": self.api_version,
                "text": text,
            },
            verify=verify,
        )
        data = request.json()
        return self._process_data(text, data)

    ...
That's all we need to solve the Gingerit JSONDecodeError. From now on, we'll use the GingerIt class instead of the previously installed library, so keep that in mind.
Let's run the same code snippet as before, but without the library import:
text = 'This sentance contains a cuple of gramatical mistakes.'

parser = GingerIt()
parser.parse(text)
There are no errors anymore. Instead, the following Python dictionary is returned:
The result dictionary contains the original text input, the corrected sentence, and a list of corrections. We can get a better grasp of the corrections by placing them in a Pandas DataFrame:
import pandas as pd

pd.set_option('display.max_colwidth', None)

result = parser.parse(text)
pd.DataFrame(result['corrections'])
Here are the results:
Looks like Gingerit caught every grammatical error, but will that always be the case? Let's find out.
Onto a slightly more complex example now. This time, the text will be longer and will contain more sophisticated grammatical mistakes:
text = "This paragraf will contain some grammatical misstakes. \
Theyre her to see how well does the Ginger gramar checking softwre work when acesed from the Python API. \
Fingrs crosed everything work."

parser = GingerIt()
parser.parse(text)
Here's the output:
It's quite lengthy, so let's style it to get better insights into the corrections:
pd.DataFrame(parser.parse(text)['corrections'])
Overall, Gingerit did an excellent job. There's only one objection – it failed to correct work into works in the last sentence. It's not a major issue, just something to consider.
So, how far can you take Gingerit grammar Python API before running into errors and limitations? There's only one way to find out.
The following code snippet tries to parse a 512-character-long string:
text = "This paragraf will be around 500 characters long. It's here to test the limits of the Ginger \
gramar softwar when accessed through the Python API. Do you think thre'll be any isues? Potentialy yes, \
but we'l have to test and see. And how cool is the fact you can check grammar and get detailed insigt into \
erors from Python, without registering for a free account or even buying a subscription? Its amazing, but \
there are limitations, of course. Lets see if we can break it by passing in this realy long parahraph."

parser = GingerIt()
pd.DataFrame(parser.parse(text)['corrections'])
Here are the results:
As you can see, there are no issues when trying to parse and correct a 500-character-long paragraph. But how far can we take it?
Let's try 2000 characters:
long_text = "This paragraf will be around 2000 characters long. It's here to test the limits of the Ginger \
gramar softwar when accessed through the Python API. Do you think thre'll be any isues? Potentialy yes, \
but we'l have to test and see. And how cool is the fact you can check grammar and get detailed insigt into \
erors from Python, without registering for a free account or even buying a subscription? Its amazing, but \
there are limitations, of course. Lets see if we can break it by passing in this realy long parahraph. \
Second iteraion. It's here to test the limits of the Ginger \
gramar softwar when accessed through the Python API. Do you think thre'll be any isues? Potentialy yes, \
but we'l have to test and see. And how cool is the fact you can check grammar and get detailed insigt into \
erors from Python, without registering for a free account or even buying a subscription? Its amazing, but \
there are limitations, of course. Lets see if we can break it by passing in this realy long parahraph. \
Thrd iteration. It's here to test the limits of the Ginger \
gramar softwar when accessed through the Python API. Do you think thre'll be any isues? Potentialy yes, \
but we'l have to test and see. And how cool is the fact you can check grammar and get detailed insigt into \
erors from Python, without registering for a free account or even buying a subscription? Its amazing, but \
there are limitations, of course. Lets see if we can break it by passing in this realy long parahraph. \
Forth iteration. It's here to test the limits of the Ginger \
gramar softwar when accessed through the Python API. Do you think thre'll be any isues? Potentialy yes, \
but we'l have to test and see. And how cool is the fact you can check grammar and get detailed insigt into \
erors from Python, without registering for a free account or even buying a subscription? Its amazing, but \
there are limitations, of course. \
Lets see if we can break it by passing in this realy long parahraph."

parser = GingerIt()
result = parser.parse(long_text)
pd.DataFrame(result['corrections'])
This time we get an error:
It's actually a well-known and discussed issue. The free API can't handle more than 600 characters at once, so you'll have to split your strings into smaller chunks. It's not that big of an issue, but the error message should be more informative.
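Splitting a long string into smaller chunks could look something like the sketch below. It is one possible approach, not part of Gingerit: the function name `split_into_chunks` is invented here, the 600-character default mirrors the limit mentioned above, and breaking at sentence ends is my own choice so the checker sees whole sentences:

```python
import re


def split_into_chunks(text, max_chars=600):
    """Split text into chunks of at most max_chars, breaking at sentence ends."""
    # Split after ., ! or ? when followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-split any single sentence that is longer than the limit
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


# Made-up input: one short sentence repeated until it is well over the limit
demo_text = "This sentance is here to make the text long. " * 40
demo_chunks = split_into_chunks(demo_text)
print(len(demo_text), [len(chunk) for chunk in demo_chunks])
```

Each chunk can then go through `parser.parse(chunk)` on its own. Keep in mind that anything the parser reports per chunk refers to that chunk, not to the original full text.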
We'll now go over some frequently asked questions regarding Python and Gingerit.
From Python and their website – yes. You can use Gingerit Python for free, with a limit of 600 characters per API request. The only way around this limit is to register for an account, which isn't free and doesn't come with a free trial.
There are many reasons why Gingerit Python might not work for you. We've gone over two reasons and solutions in this article:
You might also run into weekly rate limits, but this shouldn't happen right away.
It's a really simple fix – just use the modified Python script instead of the PyPi package. We've gone through how to modify the script to bypass Cloudflare's anti-bot page earlier in the article, so refer to that section.
And that's how easy it is to implement a Python grammar checker for free. We went through the process of configuring Python Gingerit, and a couple of examples of different lengths and complexities.
There's some manual work and tweaks involved in getting past errors, but these are well worth it if you need a free grammar checker with explanations.
What are your thoughts on Gingerit? Do you use it as a free way to check if a sentence is grammatically correct in Python? Do you use some other alternative? Please let me know in the comment section below.
Elastic net regression combines the penalties of ridge and lasso regression, which lets it keep the strengths of both while softening their individual weaknesses. As such, this is a great algorithm for regularized regression. The video below explains how to use this algorithm with Python.
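To make the idea of combining the two concrete, here is a small sketch of the penalty term alone, written in plain Python. It follows the parameterization used by scikit-learn's ElasticNet (an `alpha` strength and an `l1_ratio` mix); the function name is mine, and it computes only the penalty, not a fitted model:

```python
def elastic_net_penalty(weights, alpha=1.0, l1_ratio=0.5):
    """Elastic net penalty: a weighted mix of the lasso (L1) and ridge (L2) terms."""
    l1 = sum(abs(w) for w in weights)          # lasso part: sum of absolute values
    l2_squared = sum(w * w for w in weights)   # ridge part: sum of squares
    return alpha * (l1_ratio * l1 + 0.5 * (1 - l1_ratio) * l2_squared)


weights = [1.0, -2.0]
print(elastic_net_penalty(weights, l1_ratio=1.0))  # 3.0  (pure lasso)
print(elastic_net_penalty(weights, l1_ratio=0.0))  # 2.5  (pure ridge)
print(elastic_net_penalty(weights, l1_ratio=0.5))  # 2.75 (an even mix)
```

Setting `l1_ratio` to 1 or 0 recovers the lasso or ridge penalty, which is why elastic net can be seen as a blend of the two.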
A few weeks ago, I introduced a forecasting API (Application Programming Interface). The application can be found here:
https://techtonique2.herokuapp.com/
So far, as of 2022-11-23, this API contains four methods for univariate time series forecasting (with prediction intervals):
mean: a (not so naïve) benchmark method, whose prediction is the sample mean.
rw: a (not so naïve) benchmark method, whose prediction is the last value of the input time series.
theta: the forecasting method described in [1] and [2], which won the M3 competition.
prophet: a popular model described in [3].
In this post, I’ll present two packages, one implemented in R and one in Python, which are designed for smoothing users’ interaction with the API. You can create similar high-level packages in other programming languages, by using this tool and this page.
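For intuition about the two benchmark methods listed above, here is a minimal local sketch of what their point forecasts amount to. It does not call the API and leaves out the prediction intervals the API returns; the function names are invented for this illustration:

```python
def mean_forecast(series, h):
    """Forecast h steps ahead with the sample mean of the series."""
    average = sum(series) / len(series)
    return [average] * h


def rw_forecast(series, h):
    """Random-walk forecast: repeat the last observed value h times."""
    return [series[-1]] * h


history = [10.0, 12.0, 11.0, 15.0]
print(mean_forecast(history, h=3))  # [12.0, 12.0, 12.0]
print(rw_forecast(history, h=3))    # [15.0, 15.0, 15.0]
```

Both produce a flat line, which is exactly what makes them useful baselines: a more elaborate method should at least beat them.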
Content
create_account
get_token
get_forecast
pip install forecastingapi
library(devtools)
devtools::install_github("Techtonique/forecastingapi/R-package")
library(forecastingAPI)
create_account:
import forecastingapi as fapi

res_create_account = fapi.create_account(username="user1@example.com", password="pwd")  # choose a better password
print(res_create_account)
forecastingAPI::create_account(username = "user2@example.com", password = "pwd") # choose a better password
get_token
token = fapi.get_token(username = "user1@example.com", password = "pwd")
print(token)
token <- forecastingAPI::get_token(username = "user2@example.com", password = "pwd")
The token is valid for 5 minutes. After 5 minutes, it must be renewed, using get_token.
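Since the token expires after 5 minutes, longer scripts need to renew it along the way. The sketch below shows one way to do that, under stated assumptions: `TokenCache` is a hypothetical helper (not part of forecastingapi), the 300-second lifetime comes from the sentence above, and the 30-second safety margin is an arbitrary choice:

```python
import time


class TokenCache:
    """Cache a token and fetch a new one shortly before it expires."""

    def __init__(self, fetch_token, lifetime_seconds=300, margin_seconds=30,
                 clock=time.monotonic):
        self._fetch_token = fetch_token    # callable returning a fresh token
        self._lifetime = lifetime_seconds  # 5 minutes, as stated above
        self._margin = margin_seconds      # renew a little early to be safe
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        if self._token is None or now >= self._expires_at:
            self._token = self._fetch_token()
            self._expires_at = now + self._lifetime - self._margin
        return self._token


# Stand-in fetcher for demonstration; in practice you would pass something like
# lambda: fapi.get_token(username="user1@example.com", password="pwd")
counter = {"calls": 0}


def fake_fetch():
    counter["calls"] += 1
    return f"token-{counter['calls']}"


cache = TokenCache(fake_fetch)
print(cache.get())  # token-1
print(cache.get())  # token-1 (still valid, no second fetch)
```

Each call that needs a token would then use `cache.get()` instead of a stored `token` variable.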
get_forecast:
path_to_file = '/Users/t/Documents/datasets/time_series/univariate/USAccDeaths.csv'  # (examples: https://github.com/Techtonique/datasets/tree/main/time_series/univariate)

res_get_forecast = fapi.get_forecast(file=path_to_file, token=token)
print(res_get_forecast)

res_get_forecast2 = fapi.get_forecast(file=path_to_file, token=token, start_training = 2, n_training = 7, h = 4, level = 90)
print(res_get_forecast2)

res_get_forecast3 = fapi.get_forecast(file=path_to_file, token=token, date_formatting="ms", start_training = 2, n_training = 7, h = 4, level = 90)
print(res_get_forecast3)

res_get_forecast4 = fapi.get_forecast(file=path_to_file, token=token, method = "prophet")
print(res_get_forecast4)
path_to_file <- '/Users/t/Documents/datasets/time_series/univariate/USAccDeaths.csv' # (examples: https://github.com/Techtonique/datasets/tree/main/time_series/univariate)

f_theta <- forecastingAPI::get_forecast(file = path_to_file, token = token, method = "theta", h=10, level = 95)
f_mean <- forecastingAPI::get_forecast(file = path_to_file, token = token, method = "mean", h=10, level = 95)
f_rw <- forecastingAPI::get_forecast(file = path_to_file, token = token, method = "rw", h=10, level = 95)
f_prophet <- forecastingAPI::get_forecast(file = path_to_file, token = token, method = "prophet", h=10, level = 95)
[1] Assimakopoulos, V., & Nikolopoulos, K. (2000). The theta model: a decomposition approach to forecasting. International journal of forecasting, 16(4), 521-530.
[2] Hyndman, R. J., & Billah, B. (2003). Unmasking the Theta method. International Journal of Forecasting, 19(2), 287-290.
[3] Taylor, S. J., & Letham, B. (2018). Forecasting at scale. The American Statistician, 72(1), 37-45.
If you truly want to wrap your head around a deep learning model, visualizing it might be a good idea. These networks typically have dozens of layers, and figuring out what’s going on from the summary alone won’t get you far. That’s why today we’ll show you 3 ways to visualize PyTorch neural networks.
We’ll first build a simple feed-forward neural network model for the well-known Iris dataset. You’ll see that visualizing models/model architectures isn’t complicated at all, and will take you only a couple of lines of code.
Building a neural network model from scratch in PyTorch is easier than it sounds. Previous experience with the library is desirable, but not required – you’ll have no trouble following if you prefer some other deep learning package.
We’ll build a model around the Iris dataset for two reasons:
The code snippet below imports all Python libraries we’ll need for now and loads in the dataset:
import torch
import torch.nn as nn
import torch.nn.functional as F
import pandas as pd

iris = pd.read_csv("https://gist.githubusercontent.com/netj/8836201/raw/6f9306ad21398ea43cba4f7d537619d0e07d5ae3/iris.csv")
iris.head()
Now, PyTorch can’t understand Pandas DataFrames, so we’ll have to convert the dataset into a tensor format.
The features of the dataset can be passed straight into the torch.tensor() function, while the target variable requires some encoding (from string to integer):
X = torch.tensor(iris.drop("variety", axis=1).values, dtype=torch.float)
y = torch.tensor(
    [0 if vty == "Setosa" else 1 if vty == "Versicolor" else 2 for vty in iris["variety"]],
    dtype=torch.long
)

print(X[:3])
print()
print(y[:3])
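The chained conditional works for three classes, but it silently maps any unexpected label to 2. As a minimal sketch of an alternative, a dictionary lookup makes the encoding explicit and fails loudly on unknown labels. The short `varieties` list is a hypothetical stand-in for `iris["variety"]`, and the third label, "Virginica", is an assumption about the dataset's spelling:

```python
# Hypothetical stand-in for iris["variety"]; "Virginica" is an assumed label.
varieties = ["Setosa", "Versicolor", "Virginica", "Setosa"]

# Explicit mapping from string label to integer class index.
label_to_index = {"Setosa": 0, "Versicolor": 1, "Virginica": 2}

# A KeyError is raised here if a label is missing from the mapping.
encoded = [label_to_index[variety] for variety in varieties]

print(encoded)  # [0, 1, 2, 0]
```

The resulting list can be passed to torch.tensor() with dtype=torch.long in the same way as the list built by the conditional expression.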
And that’s it. The dataset is ready to be passed into a PyTorch neural network model. Let’s build one next. It will have an input layer going from 4 features to 16 nodes, one hidden layer, and an output layer going from 16 nodes to 3 class scores (raw logits, since no softmax is applied):
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.input = nn.Linear(in_features=4, out_features=16)
        self.hidden_1 = nn.Linear(in_features=16, out_features=16)
        self.output = nn.Linear(in_features=16, out_features=3)

    def forward(self, x):
        x = F.relu(self.input(x))
        x = F.relu(self.hidden_1(x))
        return self.output(x)


model = Net()
print(model)
It’s easy to look at the summary of this model since there are only a couple of layers, but imagine you had a deep network with dozens of layers – all of a sudden, the summary would be too large to fit on the screen.
In the following section, we’ll explore the first way to visualize PyTorch neural networks, and that is with the Torchviz library.
Torchviz is a Python package used to create visualizations of PyTorch execution graphs and traces. It depends on Graphviz, which is a dependency you’ll have to install system-wide (Mac example shown below). Once installed, you can install Torchviz with pip:
brew install graphviz
pip install torchviz
To use Torchviz in Python, you’ll have to import the make_dot() function, make an instance of your neural network class, and calculate predictions for the entire training set or a batch of samples. Since the Iris dataset is small, we’ll calculate predictions for all flower instances:
from torchviz import make_dot

model = Net()
y = model(X)
That’s all you need to visualize the network. Simply pass the mean of the output tensor alongside the model parameters to the make_dot() function:
make_dot(y.mean(), params=dict(model.named_parameters()))
You can also see what autograd saves for the backward pass by specifying two additional parameters, show_attrs=True and show_saved=True:
make_dot(y.mean(), params=dict(model.named_parameters()), show_attrs=True, show_saved=True)
The resulting graph is somewhat more detailed, but maybe that’s what you’re aiming for.
Next, we’ll explore a Desktop app used to visualize any ONNX model.
Netron is a Desktop and Web interface for visualizing neural network models from different libraries, including PyTorch. It works best if you export the model into an ONNX format (Open Neural Network Exchange), which is as simple as a function call in PyTorch.
You can download the Desktop standalone application, or you can use a web interface linked in the documentation. There are also Python server options, but we haven’t explored them.
To get started, specify names for inputs and outputs as a list of string(s). Feel free to name these however you want. Once done, call the torch.onnx.export() function to export the model to a file:
input_names = ["Iris"]
output_names = ["Iris Species Prediction"]

torch.onnx.export(model, X, "model.onnx", input_names=input_names, output_names=output_names)
The model is now saved to the model.onnx file, and you can easily load it into Netron. Here’s what it looks like:
Let’s explore another way to visualize PyTorch neural networks which TensorFlow users will find familiar.
TensorBoard is a visualization and tooling framework for machine learning experimentation. It has many features useful to deep learning researchers and practitioners, one of them being visualizing the model graph.
That’s exactly the feature we’ll explore today. But first, make sure to install TensorBoard through pip:
pip install tensorboard
So, how can you connect the PyTorch model with TensorBoard? You’ll need to take advantage of the SummaryWriter class from PyTorch, and add a network graph to a log directory. In our example, the logs will be saved to the torchlogs/ folder:
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter("torchlogs/")
model = Net()
writer.add_graph(model, X)
writer.close()
Once the network graph is saved, navigate to the log directory from the shell and launch TensorBoard:
cd <path-to-logs-dir>
tensorboard --logdir=./
You’ll be able to see the model graph on http://localhost:6006. You can click on any graph element and TensorBoard will expand it for you, as shown in the figure below:
And that’s it for the ways to visualize PyTorch neural networks. Let’s make a short recap next.
If you want to understand what’s going on in a neural network model, visualizing the network graph is the way to go. Sure, you need to actually understand why the network is constructed the way it is, but that’s fundamental deep learning knowledge we assume you have.
We’ve explored three ways to visualize neural network models from PyTorch – with Torchviz, Netron, and TensorBoard. All are excellent, and there’s no way to pick a winner. Let us know which one you prefer.
Do you use some other tool to visualize neural network model graphs? Please let us know in the comment section below. Also, don’t hesitate to move the discussion to Twitter – @appsilon. We’d love to hear from you.
The post How to Visualize PyTorch Neural Networks – 3 Examples in Python appeared first on Appsilon | Enterprise R Shiny Dashboards.
It has been 6 months since the launch of Diffify, our
website for comparing package releases. We are delighted to announce that, in
addition to CRAN’s 20,000 R packages, you can now track 1600 popular Python
packages!
The current criteria for a Python package to be included in Diffify are:
If your favourite package is not currently accessible, don’t worry! We are
actively working to expand the list to as many PyPI packages as possible, as
we’ll explain below.
The first change you’ll notice is to our homepage, where
we now have buttons for both R and Python.
Clicking on the Python button will take you through to the package search bar.
For this walkthrough, we will compare versions 3.3.0 and 3.5.0 of the Matplotlib
package. Diffify provides a breakdown of the changes to the package
dependencies, functions and classes.
We consider three kinds of dependencies:
In our example, we see that the Python version requirement has changed from
>=3.6 to >=3.7.
Here we provide a list of functions that have been added, removed or changed
between the two versions.
Clicking on the “Details” dropdown will bring up the function arguments,
including the argument name and default value. If type annotations are included
in the package source code, Diffify will also display the argument type and the
function return type.
For the pyplot.grid() function, the name of the first positional argument has
changed from b to visible.
Here we provide a list of classes that have been added, removed or changed.
Clicking on the “Methods” button for a class will bring up a pop-up that lists
the methods that belong to that class. The example below shows the methods
.__init__() and .from_dict(), which belong to the spines.Spines class.
Similar to functions, you can access the method arguments by clicking on
“Details”.
The functions and classes listed above have been detected by analysing the
package source code. We have taken various steps to filter out code that is
intended for internal use by the package developers, including:
functions whose names start test* and classes whose names start Test*
files whose names start test_* or end *_test.py
These criteria are intended to leave out internal code and unit tests.
Python has been around for quite a while, and consequently it has many
packages – 400,000 to be precise! Perhaps unsurprisingly, analysing so many
packages for Diffify has proven to be a bit of a challenge…
This is why we have initially chosen to focus on the 2000 most popular PyPI
packages. We will soon extend this to the top 5000, according to
Top PyPI Packages. And we won’t
be stopping there! It remains to be seen whether we will manage to add all
400,000, but we will certainly try our utmost.
Despite our best efforts to filter out clutter, you may still come across some
functions and classes that are clearly intended for internal use or unit
testing. We will continue to look at ways to improve our filters.
We hope you enjoy the new content! As always, if you spot any bugs or have any
suggestions please add an issue to our public
GitHub.
Stay tuned for more updates…
For updates and revisions to this article, see the original post
If you want to square a number in Python, well, you have options. There are numerous ways and approaches to Python squaring, and today we'll explore the four most common. You'll also learn how to square Python lists in three distinct ways, but more on that later.
Let's get started with the first Python squaring approach – by using the exponent operator (**).
The exponent operator in Python – ** – allows you to raise a number to any exponent. The same double-asterisk syntax is also used to unpack dictionaries, but that's a topic for another time.
On the left side of the operator, you have the number you want to raise to an exponent, and on the right side, you have the exponent itself. For example, if you want to square the number 10, you would write 10**2 – it's that easy.
Let's take a look at a couple of examples:
a = 5
b = 15
c = 8.65
d = -10

# Method #1 - The exponent operator (**)
a_squared = a**2
b_squared = b**2
c_squared = c**2
d_squared = d**2

# Print
print("Method #1 - The exponent operator (**)")
print("--------------------------------------------------")
print(f"{a} squared = {a_squared}")
print(f"{b} squared = {b_squared}")
print(f"{c} squared = {c_squared}")
print(f"{d} squared = {d_squared}")
Below you'll see the output of the code cell:
And that's how you can square, or raise a number to the second power by using the asterisk operator.
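One detail worth keeping in mind with negative numbers: ** binds more tightly than the unary minus sign. Squaring a variable that holds -10 gives 100, but writing the literal -10**2 is read as -(10**2). A short sketch of the difference:

```python
d = -10

# The variable already holds a negative value, so this is (-10) squared.
from_variable = d**2

# Without parentheses, the exponent is applied first and the minus sign after.
literal_without_parens = -10**2

# Parentheses restore the intended meaning.
literal_with_parens = (-10)**2

print(from_variable)           # 100
print(literal_without_parens)  # -100
print(literal_with_parens)     # 100
```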
But what if you want to change the exponent? Simply change the number on the right side of the operator:
print("Method #1 - The exponent operator (**) (2)")
print("--------------------------------------------------")
print(f"{a} to the power of 3 = {a**3}")
print(f"{d} to the power of 5 = {d**5}")
Code output:
One down, three to go.
The math module is built into Python and packs excellent support for mathematical functions. One of these functions is pow(), and it accepts two arguments:
x – The number you want to square or raise to an exponent.
y – The exponent.
Let's modify the code snippet from earlier to leverage the math module instead:
import math

a = 5
b = 15
c = 8.65
d = -10

# Method #2 - math.pow() function
a_squared = math.pow(a, 2)
b_squared = math.pow(b, 2)
c_squared = math.pow(c, 2)
d_squared = math.pow(d, 2)

# Print
print("Method #2 - math.pow() function")
print("--------------------------------------------------")
print(f"{a} squared = {a_squared}")
print(f"{b} squared = {b_squared}")
print(f"{c} squared = {c_squared}")
print(f"{d} squared = {d_squared}")
Here's the output:
The output is nearly identical to what we had before, but the math module converts everything to a floating point number, even if there's no need for it. Keep that in mind, as it's an additional casting step if you explicitly want integers.
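To illustrate the casting point, here is a small sketch comparing math.pow() with Python's built-in pow() function, which keeps integer inputs as integers. The built-in function isn't one of the methods covered in this article; it's shown only for contrast:

```python
import math

# math.pow() always returns a float, even for integer inputs.
float_result = math.pow(5, 2)

# The built-in pow() keeps integers as integers.
int_result = pow(5, 2)

# Casting the math.pow() result back to an integer is the extra step.
cast_back = int(math.pow(5, 2))

print(float_result, type(float_result).__name__)  # 25.0 float
print(int_result, type(int_result).__name__)      # 25 int
print(cast_back, type(cast_back).__name__)        # 25 int
```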
As you would imagine, raising a number to any other exponent is as easy as changing the second argument value:
print("Method #2 - math.pow() function (2)")
print("--------------------------------------------------")
print(f"{a} to the power of 3 = {math.pow(a, 3)}")
print(f"{d} to the power of 5 = {math.pow(d, 5)}")
Code output:
Let's take a look at another, more manual approach to Python squaring.
There's no one stopping you from implementing squaring in Python by multiplying the number with itself. However, this approach isn't scalable. It's fine if you want to simply square a number, but what if you want to raise the number to a power of ten?
Here's an example of how to square a number by multiplying it by itself:
a = 5
b = 15
c = 8.65
d = -10

# Method #3 - Multiplication
a_squared = a * a
b_squared = b * b
c_squared = c * c
d_squared = d * d

# Print
print("Method #3 - Multiplication")
print("--------------------------------------------------")
print(f"{a} squared = {a_squared}")
print(f"{b} squared = {b_squared}")
print(f"{c} squared = {c_squared}")
print(f"{d} squared = {d_squared}")
The results are identical to what we had in the first example:
If you want to raise a number to some other exponent, this approach quickly falls short. You need to repeat the multiplication operation many times, which isn't convenient:
print("Method #3 - Multiplication (2)")
print("--------------------------------------------------")
print(f"{a} to the power of 3 = {a * a * a}")
print(f"{d} to the power of 5 = {d * d * d * d * d}")
Code output:
The results are still correct, but the approach is prone to typing errors, such as miscounting the factors, that wouldn't happen if you were using any other approach.
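If you do want to stick with plain multiplication for larger exponents, one way around the repetition is to wrap it in a loop. The helper below, power_by_multiplication(), is a hypothetical name used only for illustration, and the sketch assumes a non-negative integer exponent:

```python
def power_by_multiplication(base, exponent):
    """Raise base to a non-negative integer exponent by repeated multiplication."""
    if not isinstance(exponent, int) or exponent < 0:
        raise ValueError("exponent must be a non-negative integer")

    result = 1
    # Multiply by the base once per unit of the exponent.
    for _ in range(exponent):
        result *= base
    return result


print(power_by_multiplication(5, 3))    # 125
print(power_by_multiplication(-10, 5))  # -100000
```

In practice the ** operator does the same job in a single expression, so a helper like this is mostly useful for understanding what the operator does.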
Python's NumPy library is a holy grail for data scientists. It allows for effortless work with N-dimensional arrays, but it can also handle scalars.
NumPy's square() function will raise any number or an array to the power of two. Let's see how to apply it to our previous code snippet:
import numpy as np

a = 5
b = 15
c = 8.65
d = -10

# Method #4 - Numpy
a_squared = np.square(a)
b_squared = np.square(b)
c_squared = np.square(c)
d_squared = np.square(d)

# Print
print("Method #4 - Numpy")
print("--------------------------------------------------")
print(f"{a} squared = {a_squared}")
print(f"{b} squared = {b_squared}")
print(f"{c} squared = {c_squared}")
print(f"{d} squared = {d_squared}")
The results are displayed below:
The one limitation of the square() function is that it only raises a number/array to the power of two. If you need a different exponent, use the power() function instead:
print("Method #4 - Numpy (2)")
print("--------------------------------------------------")
print(f"{a} to the power of 3 = {np.power(a, 3)}")
print(f"{d} to the power of 5 = {np.power(d, 5)}")
Code output:
And that does it for squaring Python numbers. Let's see how to do the same to Python lists next.
As a data scientist, you'll spend a lot of time working with N-dimensional arrays. Knowing how to apply different operations to them, such as squaring each array item, is both practical and time-saving. This section will show you three ways to square a Python list.
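It helps to see first why a dedicated approach is needed at all. A plain Python list doesn't support the ** operator, and the * operator repeats the list instead of multiplying its items. A quick sketch:

```python
arr = [5, 15, 8.65, -10]

# The exponent operator is not defined for lists, so this raises a TypeError.
try:
    arr ** 2
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# The multiplication operator repeats the list rather than scaling its items.
repeated = arr * 2

print(error_message)
print(repeated)  # [5, 15, 8.65, -10, 5, 15, 8.65, -10]
```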
The first, and the most inefficient one is looping. We have two Python lists, the first one stores the numbers, and the second will store the squared numbers. We then iterate over the first list, square each number, and append it to the second one.
Here's the code:
arr = [5, 15, 8.65, -10]
squared = []

# Method 1 - Looping
for num in arr:
    squared.append(num**2)

# Print
print("Method #1 - Looping")
print("--------------------------------------------------")
print(squared)
And here's the output:
Iterating over an array one item at a time isn't efficient. There are more convenient and practical approaches, such as list comprehension.
With list comprehensions, you declare a second list as a result of some operation applied element-wise on the first one. Here we want to square each item, but the possibilities are endless.
Take a look at the following code snippet:
arr = [5, 15, 8.65, -10]

# Method 2 - List comprehensions
squared = [num**2 for num in arr]

# Print
print("Method #2 - List comprehensions")
print("--------------------------------------------------")
print(squared)
The results are identical, but now take one line of code less:
Let's switch gears and discuss how you can square an array in NumPy.
Remember the square() function from the previous section? You can also use it to square individual array items. NumPy automatically infers if a single number or an array has been passed as an argument:
arr = np.array([5, 15, 8.65, -10])

# Method 3 - Numpy
squared = np.square(arr)

# Print
print("Method #3 - Numpy")
print("--------------------------------------------------")
print(squared)
Here are the results:
The NumPy array elements now have specific types – numpy.float64 – so that's why you see somewhat different formatting when the array is printed.
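The float64 type comes from the 8.65 in the list: NumPy picks one type that can hold every element. As a small sketch, using a hypothetical integer-only array for comparison, you can inspect the dtype attribute to see which type was inferred and confirm that square() preserves it:

```python
import numpy as np

# Mixed integers and a float: every element is stored as float64.
float_arr = np.array([5, 15, 8.65, -10])

# Hypothetical integer-only array: elements are stored as an integer type.
int_arr = np.array([5, 15, -10])

float_squared = np.square(float_arr)
int_squared = np.square(int_arr)

print(float_squared.dtype)  # float64
print(int_squared.dtype)    # a platform-dependent integer type
print(int_squared)          # [ 25 225 100]
```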
And that's how easy it is to square a number or a list of numbers in Python. Let's make a short recap next.
It's almost impossible to take a beginner's programming challenge without being asked to write a program that squares an integer and prints the result.
Now you know multiple approaches to squaring any type of number, and even arrays in Python. You've also learned how to raise a number to any exponent, and why some approaches work better than others.
Stay tuned to the blog if you want to learn the opposite operation – square roots – and what options you have in the Python programming language.
Matplotlib is one of the longest standing and most comprehensive plotting libraries for Python.
It is mostly used for creating static plots and its flexible customisation options
make it a great choice for creating publication quality graphs.
In this blog post we will look at formatting and colourmap customisation in Matplotlib,
and how to set a consistent plotting style throughout a project.
Note: If you wish to run the code snippets in this blog yourself you will need:
In Matplotlib it is possible to change styling settings globally with
runtime configuration (rc) parameters.
The default Matplotlib styling configuration is set with matplotlib.rcParams.
This is a dictionary containing formatting settings and their values.
By changing these values we can change default settings throughout an
entire script or notebook.
For example, if you wanted to set the tick label size to 12pt you would use:
import matplotlib as mpl

mpl.rcParams["xtick.labelsize"] = 12
mpl.rcParams["ytick.labelsize"] = 12
If you are making a plot for publication, a useful thing to do is enable LaTeX and set the font to be consistent with your LaTeX document. This can be done with:
mpl.rcParams["text.usetex"] = True
mpl.rcParams["font.family"] = "Computer Modern Serif"
The full list of rcParams which you can configure can be found here.
If you want to later revert to the default settings for Matplotlib you can do this with:
mpl.rcdefaults()
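If you only need different settings for part of a script, Matplotlib also provides the rc_context() context manager, which applies rcParams temporarily and restores the previous values on exit. A minimal sketch, reusing the tick label settings from above:

```python
import matplotlib as mpl

# Remember the current value so we can compare afterwards.
size_before = mpl.rcParams["xtick.labelsize"]

# Settings passed to rc_context() only apply inside the with block.
with mpl.rc_context({"xtick.labelsize": 12, "ytick.labelsize": 12}):
    size_inside = mpl.rcParams["xtick.labelsize"]

# On exit, the previous value is restored automatically.
size_after = mpl.rcParams["xtick.labelsize"]

print(size_inside)                # 12.0
print(size_after == size_before)  # True
```

This avoids having to call mpl.rcdefaults() when you only want to style a single figure differently.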
Matplotlib comes with a selection of available style sheets. These define a range of plotting parameters and can be used to apply those parameters to your plots.
After importing Matplotlib, you can see a list of available style sheets with:
import matplotlib.pyplot as plt

plt.style.available
We will use the plot below as an example.
This was created with the default Matplotlib theme.
We can change the style of the figure,
as well as the rest of the figures throughout a script/notebook, with:
plt.style.use("dark_background")
If we now look at our example again, we can see that the formatting has changed.
If you like the default look of plots in the {ggplot2} R package, there is also a style sheet for that,
plt.style.use("ggplot")
If you are writing a paper or a report you may want to define your own set of plotting parameters to be used throughout. You may also want to be able to use these parameters in several scripts and be able to share them with collaborators to ensure a consistent aesthetic. You can do this by creating your own style sheet.
The inbuilt style sheets are defined in .mplstyle files. You can find out where these are located by running
import os

os.path.join(mpl.get_data_path(), "stylelib")
If you are using miniconda the path returned will look something like ~/miniconda3/lib/python3.8/site-packages/mpl-data/stylelib.
In the stylelib/ folder you will find all the inbuilt .mplstyle files.
Taking a look at the ggplot.mplstyle file,
# from https://everyhue.me/posts/sane-color-scheme-for-matplotlib/

patch.linewidth: 0.5
patch.facecolor: 348ABD  # blue
patch.edgecolor: EEEEEE
patch.antialiased: True

font.size: 10.0

axes.facecolor: E5E5E5
axes.edgecolor: white
axes.linewidth: 1
axes.grid: True
axes.titlesize: x-large
axes.labelsize: large
axes.labelcolor: 555555
axes.axisbelow: True  # grid/ticks are below elements (e.g., lines, text)

axes.prop_cycle: cycler('color', ['E24A33', '348ABD', '988ED5', '777777', 'FBC15E', '8EBA42', 'FFB5B8'])
                   # E24A33 : red
                   # 348ABD : blue
                   # 988ED5 : purple
                   # 777777 : gray
                   # FBC15E : yellow
                   # 8EBA42 : green
                   # FFB5B8 : pink

xtick.color: 555555
xtick.direction: out

ytick.color: 555555
ytick.direction: out

grid.color: white
grid.linestyle: -  # solid line

figure.facecolor: white
figure.edgecolor: 0.50
we can see that it contains a collection of rcParam settings.
Creating your own style sheet is very straightforward. Simply create a file with a .mplstyle extension, then put all your rcParam settings in here with the same format as the file shown above. If you save this file in stylelib/ you will be able to use your style sheet in your Python script with:
plt.style.use("style_sheet_name")
If you save your style sheet elsewhere you will need to specify the full or relative path,
plt.style.use("path_to_style_sheet/style_sheet_name.mplstyle")
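As a rough end-to-end sketch of the path-based option, the snippet below writes a small style sheet to a temporary folder and applies it with plt.style.context(), which behaves like plt.style.use() but only inside the with block. The file name house_style.mplstyle and the settings in it are hypothetical, chosen purely for illustration:

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib as mpl
import matplotlib.pyplot as plt

# Hypothetical style sheet contents, in the same format as ggplot.mplstyle.
style_text = """\
font.size: 11.0
axes.grid: True
xtick.labelsize: 12
ytick.labelsize: 12
"""

# Save the style sheet outside stylelib/, so it must be referenced by path.
style_path = Path(tempfile.mkdtemp()) / "house_style.mplstyle"
style_path.write_text(style_text)

# Apply the style sheet temporarily and read back two of its settings.
with plt.style.context(str(style_path)):
    font_size_inside = mpl.rcParams["font.size"]
    grid_inside = mpl.rcParams["axes.grid"]

print(font_size_inside)  # 11.0
print(grid_inside)       # True
```

Keeping the file alongside your project code, rather than in stylelib/, makes it easier to share the same style sheet with collaborators.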
Matplotlib has a variety of inbuilt colourmaps
to choose from.
If those colours don’t take your fancy then there are also external libraries that provide additional colourmaps.
A popular one is palettable, which includes
a series of colourmaps generated from Wes Anderson movies.
If you are feeling creative, or if you want the colours of your plots to match a particular theme or company branding,
then you can also create your own colourmap.
A colourmap object takes a number between 0 and 1 and maps this to a colour.
In Matplotlib, there are two colourmap classes: ListedColormap and LinearSegmentedColormap.
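To make the idea of "a number between 0 and 1 maps to a colour" concrete, here is a minimal sketch that builds a two-colour ListedColormap (black and white, chosen only for illustration) and calls it like a function. The return value is an RGBA tuple:

```python
from matplotlib.colors import ListedColormap

# A deliberately tiny colourmap: black at the low end, white at the high end.
two_tone = ListedColormap(["#000000", "#ffffff"])

# Calling the colourmap with a float in [0, 1] returns an RGBA tuple.
low = two_tone(0.0)
high = two_tone(1.0)

print(low)   # black: red, green, blue = 0 and alpha = 1
print(high)  # white: red, green, blue = 1 and alpha = 1
```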
The colours for a ListedColormap are stored in a .colors attribute. We can take a look at the .colors
attribute of the inbuilt “viridis” colourmap with:
# Sample 5 values from map
viridis = mpl.colormaps["viridis"].resampled(5)
print(viridis.colors)
## [[0.267004 0.004874 0.329415 1.      ]
##  [0.229739 0.322361 0.545706 1.      ]
##  [0.127568 0.566949 0.550556 1.      ]
##  [0.369214 0.788888 0.382914 1.      ]
##  [0.993248 0.906157 0.143936 1.      ]]
This is a 5 x 4 array of RGBA values (as we sampled 5 values from the full map).
To create a discrete colourmap we can simply pass a list of
colours to ListedColormap. These can be given as
named Matplotlib colours, or as hex values.
from matplotlib.colors import ListedColormap

discrete_cmap = ListedColormap(["#12a79d", "#293d9b", "#4898a8", "#40b93c"])
To look at this colourmap we will use the following code.
This plots a colourbar on its own.
def plot_cmap(cmap):
    fig, cax = plt.subplots(figsize=(8, 1))
    cb1 = mpl.colorbar.Colorbar(cax, cmap=cmap, orientation="horizontal")
    plt.tight_layout()
    plt.show()
plot_cmap(discrete_cmap)
We can also specify the number of colours we want in the colourmap with the argument N.
If N is greater than the length of the list provided then the colours are repeated,
otherwise the map is truncated at N.
discrete_cmap = ListedColormap(
    ["#12a79d", "#293d9b", "#4898a8", "#40b93c"],
    N=8
)
plot_cmap(discrete_cmap)
As well as using named/hex colours, we can also create a colourmap by passing an N x 3 or
N x 4 array of RGB or RGBA values to ListedColormap.
To create a similar colourmap to above this would be:
import numpy as np

carray = np.array([
    [18, 167, 157],
    [41, 61, 155],
    [72, 152, 168],
    [64, 185, 60]
]) / 255

discrete_cmap = ListedColormap(carray)
plot_cmap(discrete_cmap)
Note that here the RGB values were originally on a scale of 0–255.
However, Matplotlib expects a scale of 0–1. Hence the division of our array by 255.
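The RGB rows in the array correspond to the hex colours used earlier. As a small sketch of how the two notations relate, the hypothetical helper below (hex_to_unit_rgb() is not a Matplotlib function) converts a hex string to RGB values on the 0–1 scale using only plain Python:

```python
def hex_to_unit_rgb(hex_colour):
    """Convert a '#rrggbb' string to an (r, g, b) tuple on a 0-1 scale."""
    hex_digits = hex_colour.lstrip("#")
    # Each pair of hex digits is one channel on the 0-255 scale.
    channels = [int(hex_digits[i:i + 2], 16) for i in (0, 2, 4)]
    # Divide by 255 to get the 0-1 scale that Matplotlib expects.
    return tuple(channel / 255 for channel in channels)


rgb = hex_to_unit_rgb("#12a79d")
print([round(value * 255) for value in rgb])  # [18, 167, 157]
```

Matplotlib performs this kind of conversion internally when you pass hex strings, so a helper like this is only needed if you want to build the array by hand.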
To create a continuous colourmap we need an array of gradually changing colours.
This can be achieved using np.linspace(start, stop, num). For example, to generate a fading colourmap,
we can use an RGB value from above as the start point, and white (1) as the endpoint.
N = 100  # No. of colours (large enough to appear continuous)

# Create N x 3 array of ones
carray = np.ones((N, 3))

# Assign columns of array
carray[:, 0] = np.linspace(72 / 255, 1, N)
carray[:, 1] = np.linspace(152 / 255, 1, N)
carray[:, 2] = np.linspace(168 / 255, 1, N)

# Create colourmap
cont_cmap = ListedColormap(carray)
plot_cmap(cont_cmap)
LinearSegmentedColormaps do not have a .colors attribute. However, we can access the values in the colourmap by calling it with an array of integers.
cool = mpl.colormaps["cool"].resampled(8)
cool(range(8))
## array([[0.        , 1.        , 1.        , 1.        ],
##        [0.14285714, 0.85714286, 1.        , 1.        ],
##        [0.28571429, 0.71428571, 1.        , 1.        ],
##        [0.42857143, 0.57142857, 1.        , 1.        ],
##        [0.57142857, 0.42857143, 1.        , 1.        ],
##        [0.71428571, 0.28571429, 1.        , 1.        ],
##        [0.85714286, 0.14285714, 1.        , 1.        ],
##        [1.        , 0.        , 1.        , 1.        ]])
Rather than taking a list of colours that make up the map, LinearSegmentedColormaps take
an argument called segmentdata. This argument is a dictionary with the keys “red”, “green”
and “blue”. Each value in the dictionary is a list of tuples.
These tuples specify colour values before and after points in the colourmap as
(i, y[i-1], y[i+1]). Here i is a point on the map, y[i-1] is the colour value of the point
before i, and y[i+1] is the colour value after i. The other colour values on the map are
obtained by performing linear interpolation between these specified anchor points.
For example, we could have the following segmentdata dictionary:
cdict = {
    "red": [
        (0, 0, 0),     # start off with r=0
        (0.25, 1, 0),  # r increases from 0-1 between 0-0.25, then drops to 0
        (1, 0, 0),     # end with r=0
    ],
    "green": [
        (0, 0, 0),     # start off with g=0
        (0.25, 0, 0),  # at 0.25, g is still 0
        (0.75, 1, 0),  # g increases from 0-1 between 0.25-0.75, then drops to 0
        (1, 0, 0),     # g is 0 between 0.75 and 1
    ],
    "blue": [
        (0, 0, 0),     # start off with b=0
        (0.75, 0, 0),  # b is 0 between 0 and 0.75
        (1, 1, 1),     # b increases from 0 to 1 between points 0.75 and 1
    ],
}
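In this map, red rises from 0 to 1 between points 0 and 0.25, green rises from 0 to 1 between
0.25 and 0.75, and blue rises from 0 to 1 between 0.75 and 1. Red and green each drop back to 0
immediately after reaching their peak.
LinearSegmentedColormap also takes a name argument. We can create a map from the dictionary
above with: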
from matplotlib.colors import LinearSegmentedColormap

seg_cmap = LinearSegmentedColormap("seg_cmap", cdict)
plot_cmap(seg_cmap)
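To see how the anchor points turn into colour values, the sketch below applies the interpolation rule to a single channel in plain Python. The channel_value() function is a hypothetical helper written for illustration, not part of Matplotlib, and it evaluates the idealised rule; Matplotlib itself precomputes a lookup table with a finite number of entries, so its values can differ slightly. The red and blue lists are copied from the cdict above:

```python
def channel_value(anchors, x):
    """Interpolate one colour channel at position x from segmentdata tuples."""
    # Walk through consecutive pairs of anchor points.
    for (x0, _, value_after_x0), (x1, value_before_x1, _) in zip(anchors, anchors[1:]):
        if x0 <= x <= x1:
            # Linear interpolation from the value just after x0
            # to the value just before x1.
            fraction = (x - x0) / (x1 - x0)
            return value_after_x0 + fraction * (value_before_x1 - value_after_x0)
    raise ValueError("x must lie between the first and last anchor points")


red = [(0, 0, 0), (0.25, 1, 0), (1, 0, 0)]
blue = [(0, 0, 0), (0.75, 0, 0), (1, 1, 1)]

print(channel_value(red, 0.125))   # 0.5, halfway up the rise from 0 to 1
print(channel_value(red, 0.5))     # 0.0, red stays at 0 after the drop
print(channel_value(blue, 0.875))  # 0.5, halfway up blue's final rise
```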
This way of creating a colourmap is a bit longwinded. Luckily, there is an easier
way to create a LinearSegmentedColormap using the .from_list() method. This takes a
list of colours to be used as equally spaced anchor points.
color_list = ["#12a79d", "#293d9b", "#4898a8", "#40b93c"]
seg_cmap = LinearSegmentedColormap.from_list("mymap", color_list)
plot_cmap(seg_cmap)
In this blog we have covered the basics of how you can format plots and create colourmaps in
Matplotlib. Once we can do this there is a lot more to be said on how to choose these
settings to create clear and accessible plots, but we will leave that for a future post.
For updates and revisions to this article, see the original post