Python-bloggers

Historical Weather Data

This article was first published on Python - datawookie, and kindly contributed to python-bloggers.

I’m building a model which requires historical weather data from a selection of locations in South Africa. In this post I demonstrate the process of acquiring the data and doing some simple processing.

I need data for three locations: Brookes and Goje (in KwaZulu-Natal) and Hlangalane (in the Eastern Cape).

# A tibble: 3 × 4
  name       region          lat   lon
  <chr>      <chr>         <dbl> <dbl>
1 Brookes    KwaZulu-Natal -29.6  29.8
2 Goje       KwaZulu-Natal -28.3  31.2
3 Hlangalane Eastern Cape  -31.0  28.6

Here are those locations on a map. They are sufficiently far apart that we would expect them to have different weather histories.

Data Acquisition

I’m getting the data using Weather API. The business plan gives me access to data going back to the beginning of 2010. I like to mix things up, so I’ll hit the API from Python and then use R to do the processing.

The API key is stored in an environment variable.

import os

API_KEY = os.getenv("WEATHER_API_KEY")

Define the date range.

import pandas as pd

DATE_MIN = "2020-08-01"
DATE_MAX = "2022-08-01"

DATES = pd.date_range(start=DATE_MIN, end=DATE_MAX)

Create a function for retrieving the data and writing it to a file. There will be one JSON file per location and date.

import re
import time

import requests

def weather_history(name, region):
    location = name + ", " + region
    # Build a file-name slug, e.g. "goje-kwazulu-natal".
    slug = re.sub("[, ]+", "-", location.lower())

    for date in DATES:
        date = date.date()

        url = f"http://api.weatherapi.com/v1/history.json?key={API_KEY}&q={location}&dt={date}"

        response = requests.get(url)

        # One file per location and date.
        with open(f"{date}-{slug}.json", "wt") as fid:
            fid.write(response.text)

        # Pause between requests to go easy on the API.
        time.sleep(5)
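The function above downloads every date again on each run and saves whatever comes back, including error responses. The following is a minimal sketch of a more defensive variant, not the code used for this post. The helper names `build_url`, `output_path` and `fetch_day` are hypothetical, and the sketch uses the standard library's `urllib` in place of `requests` so that it runs without extra packages. It assumes the same `history.json` endpoint and query parameters as above.

```python
import re
from pathlib import Path
from urllib.parse import urlencode
from urllib.request import urlopen

# Same endpoint as in the function above.
HISTORY_URL = "http://api.weatherapi.com/v1/history.json"


def build_url(api_key, location, date):
    """Build the request URL, percent-encoding the comma and space in the location."""
    query = urlencode({"key": api_key, "q": location, "dt": str(date)})
    return f"{HISTORY_URL}?{query}"


def output_path(location, date, folder="."):
    """File naming convention from this post: <date>-<slug>.json."""
    slug = re.sub("[, ]+", "-", location.lower())
    return Path(folder) / f"{date}-{slug}.json"


def fetch_day(api_key, location, date, folder="."):
    """Download one day for one location unless the file is already on disk."""
    path = output_path(location, date, folder)
    if path.exists():
        return path  # already downloaded, so no API call is spent

    # urlopen raises HTTPError on 4xx/5xx, so error pages are never saved.
    with urlopen(build_url(api_key, location, date), timeout=30) as response:
        body = response.read().decode("utf-8")

    path.write_text(body, encoding="utf-8")
    return path


print(build_url("secret", "Goje, KwaZulu-Natal", "2021-08-01"))
print(output_path("Goje, KwaZulu-Natal", "2021-08-01").name)
```

Skipping files that already exist means an interrupted run can be restarted without repeating earlier requests.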

Now retrieve the data.

weather_history("Goje", "KwaZulu-Natal")

Repeat for the other locations.
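Rather than calling the function by hand for each place, the three locations can be kept in a list and looped over. This is a small sketch: `LOCATIONS` and `download_all` are hypothetical names, and the downloader is passed in as an argument so the loop can be tried out without touching the API.

```python
# The three locations used in this post, as (name, region) pairs.
LOCATIONS = [
    ("Brookes", "KwaZulu-Natal"),
    ("Goje", "KwaZulu-Natal"),
    ("Hlangalane", "Eastern Cape"),
]


def download_all(download, locations=LOCATIONS):
    """Call download(name, region) once per location, in list order."""
    for name, region in locations:
        download(name, region)


# Dry run with a stand-in downloader that only prints what it would fetch.
download_all(lambda name, region: print(f"would fetch {name}, {region}"))
```

In practice the call would be `download_all(weather_history)`. With the date range defined earlier (731 daily requests per location) and a five second pause between requests, expect roughly an hour of waiting per location.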

Data Processing

We’ll need a function for loading the JSON data into R. The data are nested, so we’ll include some code to unwrap and rectangle the data.

library(jsonlite)
library(dplyr)
library(purrr)
library(tidyr)  # unnest(), used further down

prepare_weather <- function(path) {
  weather <- read_json(path)
  
  weather$location %>%
    as_tibble() %>%
    # Drop time fields that relate to data acquisition (download) time.
    select(-starts_with("localtime")) %>%
    mutate(
      hours = weather$forecast$forecastday %>%
        map_dfr(function(day) {
          map_dfr(day$hour, function(hour) {
            hour$condition <- NULL
            hour
          })
        }) %>%
        select(-ends_with("epoch")) %>%
        select(-matches("_(mph|f|in|miles)$")) %>%
        select(-matches("^(will_it|chance_of)_")) %>%
        list()
    )
}
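For readers who would rather stay in Python, the same unwrapping can be sketched with the standard library alone. This is an illustration, not a port used in this post: `hourly_records` is a hypothetical helper, and the `sample` payload is a tiny hand-written stand-in that only mimics the nesting (`forecast`, `forecastday`, `hour`), not real API output. The three patterns mirror the column filters in the R function.

```python
import json
import re

# Same column filters as the R function above.
DROP_PATTERNS = [
    re.compile(r"epoch$"),
    re.compile(r"_(mph|f|in|miles)$"),
    re.compile(r"^(will_it|chance_of)_"),
]


def hourly_records(payload):
    """Flatten forecast.forecastday[].hour[] into a list of flat dicts."""
    records = []
    for day in payload["forecast"]["forecastday"]:
        for hour in day["hour"]:
            records.append({
                key: value
                for key, value in hour.items()
                # Drop the nested condition object and the unwanted columns.
                if key != "condition"
                and not any(pattern.search(key) for pattern in DROP_PATTERNS)
            })
    return records


# Hand-written stand-in payload (values are illustrative only).
sample = json.loads("""
{
  "location": {"name": "Goje", "region": "KwaZulu-Natal",
               "tz_id": "Africa/Johannesburg"},
  "forecast": {"forecastday": [{"hour": [
    {"time_epoch": 1627768800, "time": "2021-08-01 00:00",
     "temp_c": 16.6, "temp_f": 61.9,
     "wind_kph": 17.3, "wind_mph": 10.7,
     "condition": {"text": "Clear"},
     "will_it_rain": 0, "chance_of_rain": 0}
  ]}]}
}
""")

print(hourly_records(sample))
```

Only the metric fields survive the filters, which is the same outcome as the three `select()` calls in the R version.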

Let’s read the data for Goje on 1 August 2021.

(goje <- prepare_weather("2021-08-01-goje-kwazulu-natal.json"))
# A tibble: 1 × 7
  name  region        country        lat   lon tz_id               hours   
  <chr> <chr>         <chr>        <dbl> <dbl> <chr>               <list>  
1 Goje  KwaZulu-Natal South Africa -28.3  31.2 Africa/Johannesburg <tibble>

The hours list column contains the hourly weather data, one row per hour, with fields such as temp_c, wind_kph, wind_dir, pressure_mb, precip_mm, humidity and cloud.

Let’s take a quick look. We’ll only pull out a few columns that are relevant to the model.

goje %>%
  unnest(cols = hours) %>%
  # Use appropriate time zone when converting to date/time type.
  mutate(time = as.POSIXct(time, format = "%Y-%m-%d %H:%M", tz = unique(tz_id))) %>%
  select(time, temp_c, wind_kph, wind_dir, pressure_mb, precip_mm, humidity, cloud)
# A tibble: 24 × 8
   time                temp_c wind_kph wind_dir pressure…¹ preci…² humid…³ cloud
   <dttm>               <dbl>    <dbl> <chr>         <dbl>   <dbl>   <int> <int>
 1 2021-08-01 00:00:00   16.6     17.3 NNE            1026       0      78     0
 2 2021-08-01 01:00:00   16.2     16.7 NNE            1025       0      76     0
 3 2021-08-01 02:00:00   15.9     16.1 NNE            1025       0      74     0
 4 2021-08-01 03:00:00   15.5     15.5 N              1024       0      71     0
 5 2021-08-01 04:00:00   15.6     15   N              1024       0      68     1
 6 2021-08-01 05:00:00   15.6     14.5 N              1024       0      64     2
 7 2021-08-01 06:00:00   15.7     14   N              1023       0      61     2
 8 2021-08-01 07:00:00   16.9     13.3 N              1023       0      55     5
 9 2021-08-01 08:00:00   18.2     12.6 N              1023       0      50     7
10 2021-08-01 09:00:00   19.4     11.9 N              1023       0      45     9
11 2021-08-01 10:00:00   21.4     11.5 NNE            1023       0      42     9
12 2021-08-01 11:00:00   23.5     11.2 NNE            1022       0      38     9
13 2021-08-01 12:00:00   25.5     10.8 NE             1021       0      35     8
14 2021-08-01 13:00:00   25.6     11.8 NE             1020       0      38     6
15 2021-08-01 14:00:00   25.6     12.7 NE             1019       0      40     3
16 2021-08-01 15:00:00   25.7     13.7 ENE            1018       0      43     0
17 2021-08-01 16:00:00   24.5     13.8 ENE            1018       0      48     0
18 2021-08-01 17:00:00   23.3     13.9 NE             1018       0      53     0
19 2021-08-01 18:00:00   22.1     14   NE             1018       0      58     0
20 2021-08-01 19:00:00   21.1     12   ENE            1019       0      60     0
21 2021-08-01 20:00:00   20.1     10   E              1019       0      61     0
22 2021-08-01 21:00:00   19.1      7.9 ESE            1020       0      63     0
23 2021-08-01 22:00:00   19.2      8.8 SSE            1021       0      64     0
24 2021-08-01 23:00:00   19.2      9.6 SSW            1021       0      65     0
# … with abbreviated variable names ¹​pressure_mb, ²​precip_mm, ³​humidity
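The time strings in the files carry no UTC offset, so they only become unambiguous once they are paired with the `tz_id` of the location. A rough Python equivalent of the conversion above, using only the standard library (Python 3.9 or later, with time zone data available on the system), might look like this. `parse_local_time` is a hypothetical helper name.

```python
from datetime import datetime
from zoneinfo import ZoneInfo


def parse_local_time(text, tz_id):
    """Parse 'YYYY-MM-DD HH:MM' as wall-clock time in the zone named by tz_id."""
    naive = datetime.strptime(text, "%Y-%m-%d %H:%M")
    # Attach the zone without shifting the clock time.
    return naive.replace(tzinfo=ZoneInfo(tz_id))


stamp = parse_local_time("2021-08-01 00:00", "Africa/Johannesburg")
print(stamp.isoformat())  # 2021-08-01T00:00:00+02:00
```

Attaching the zone rather than assuming UTC matters as soon as locations in different time zones are combined in one model.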

We’ll wrap up with a few plots of daily aggregated data. First the total daily precipitation.

Looks like a wet year followed by a dry year. Finally the daily temperature (average is solid line and ribbon gives range).
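The daily aggregates behind these plots are simple: a sum of the hourly precipitation, plus the minimum, mean and maximum of the hourly temperature. The sketch below shows that aggregation in plain Python. `daily_summary` is a hypothetical helper, and the input is just three of the hourly rows from the Goje table above, so the resulting numbers are not the true daily statistics.

```python
from collections import defaultdict
from statistics import mean


def daily_summary(records):
    """Aggregate hourly records into one summary per calendar day."""
    by_day = defaultdict(list)
    for record in records:
        day = record["time"][:10]  # 'YYYY-MM-DD HH:MM' -> 'YYYY-MM-DD'
        by_day[day].append(record)

    summary = {}
    for day, hours in sorted(by_day.items()):
        temps = [hour["temp_c"] for hour in hours]
        summary[day] = {
            "precip_mm": sum(hour["precip_mm"] for hour in hours),  # daily total
            "temp_min": min(temps),    # lower edge of the ribbon
            "temp_mean": mean(temps),  # solid line
            "temp_max": max(temps),    # upper edge of the ribbon
        }
    return summary


# Three rows copied from the hourly table for Goje (a subset, for illustration).
rows = [
    {"time": "2021-08-01 00:00", "temp_c": 16.6, "precip_mm": 0},
    {"time": "2021-08-01 12:00", "temp_c": 25.5, "precip_mm": 0},
    {"time": "2021-08-01 23:00", "temp_c": 19.2, "precip_mm": 0},
]

print(daily_summary(rows))
```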

These data are going to be particularly useful for our models.

