Determining Distance to Coastline for Policy Locations Using GeoPandas

This article was first published on The Pleasure of Finding Things Out: A blog by James Triveri , and kindly contributed to python-bloggers. (You can report issue about the content on this page here)
Want to share your content on python-bloggers? click here.

Knowing the distance to coastline for an exposure is crucial for insurance rating applications because it helps insurers assess the risk of hazards like hurricanes, storm surges, and flooding, which are much more prevalent in coastal areas. This information allows insurers to make informed decisions about pricing, underwriting and reinsurance. Properties closer to the coast are generally at higher risk, leading to higher premiums for these properties. Insurance rating plans may use distance to coastline directly as an explanatory variable, with factors inversely proportional to distance to coastline.

This article walks through how GeoPandas can be used to calculate distance to coastline for a collection of simulated latitude-longitude pairs in the Florida region, and how these exposure locations can be assigned to different risk levels based on the distance calculation.

Coastal Shapefiles

The United States Census Bureau provides shapefiles for state, county and ZCTA boundaries as well as roads, rails an coastlines (see full list here). Shapefiles are a widely-used geospatial vector data format that store the geometric location and attribute information of geographic features, which can be represented as points, lines, or polygons.

We being by downloading the COASTLINE zip archive available on the Census Bureau’s FTP site. The COASTLINE shapefile is loaded into GeoPandas (the STATE shapefile is also loaded for later use). We limit our analysis to the continental United States and filter out the Great Lakes. Inspecting the first few records:

import numpy as np
import pandas as pd
import geopandas as gpd

np.set_printoptions(suppress=True, precision=5)
pd.options.mode.chained_assignment = None
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)


coastline_shp = "tl_2024_us_coastline.zip"
us_shp = "tl_2024_us_state.zip"


# Bounding box of lower 48 states. Remove Great Lakes.
xmin, ymin, xmax, ymax = -125, 24.6, -65, 50
coast = gpd.read_file(coastline_shp)
coast = coast.cx[xmin:xmax, ymin:ymax]
coast = coast[coast.NAME!="Great Lakes"].reset_index(drop=True)

# State boundaries.
states = gpd.read_file(us_shp)[["NAME", "geometry"]]
states = states.cx[xmin:xmax, ymin:ymax].reset_index(drop=True)

print(f"coast.shape : {coast.shape}")
print(f"states.shape: {states.shape}")

coast.head(10)
coast.shape : (2916, 3)
states.shape: (49, 2)
NAMEMTFCCgeometry
0AtlanticL4150LINESTRING (-80.88368 32.03912, -80.88365 32.0…
1AtlanticL4150LINESTRING (-70.66800 41.51199, -70.65663 41.5…
2AtlanticL4150LINESTRING (-76.58108 38.09572, -76.58184 38.0…
3AtlanticL4150LINESTRING (-73.75518 40.58565, -73.75517 40.5…
4AtlanticL4150LINESTRING (-76.15615 38.63324, -76.15070 38.6…
5AtlanticL4150LINESTRING (-76.53289 39.20776, -76.53298 39.2…
6AtlanticL4150LINESTRING (-73.93653 40.56644, -73.93594 40.5…
7AtlanticL4150LINESTRING (-81.10208 29.42706, -81.10215 29.4…
8AtlanticL4150LINESTRING (-71.89236 41.32922, -71.89293 41.3…
9AtlanticL4150LINESTRING (-75.31239 38.94595, -75.31239 38.9…

The coastline shapefile is comprised of ~3,000 LINESTRING objects. Let’s get a count of geometries by NAME:

coast["NAME"].value_counts().sort_index()
NAME
Atlantic     941
Gulf         647
Pacific     1328
Name: count, dtype: int64

We can visualize the coastline by calling the coast GeoDataFrame’s plot method:

import matplotlib.pyplot as plt

fig, ax = plt.subplots(1, 1, figsize=(8, 6), tight_layout=True)
ax.set_title("Lower 48 Coastline", fontsize=9)
coast.plot(ax=ax, edgecolor="red", linewidth=1.0)
ax.axis("off")
plt.show()

To overlay the coastline along with state boundaries, download the STATE shapefile from the Census Bureau’s FTP site and plot them together:

fig, ax = plt.subplots(1, 1, figsize=(8, 6), tight_layout=True)
ax.set_title("Lower 48 States with Coastline", fontsize=9)
coast.plot(ax=ax, edgecolor="red", linewidth=1.50, linestyle="--")
states.boundary.plot(ax=ax, edgecolor="black", linewidth=0.50)
ax.axis("off")
plt.show()

Let’s next generate synthetic latitude-longitude pairs from within the Florida bounding envelope. The envelope bounds can be obtained from Florida’s geometry as follows:


# Get bounding box for each state.
states["bbox"] = states.geometry.map(lambda gg: gg.envelope.bounds)

# Put coordinates in separate columns.
states[["lon0", "lat0", "lon1", "lat1"]] = pd.DataFrame(states.bbox.tolist(), index=states.index)

states.head()
NAMEgeometrybboxlon0lat0lon1lat1
0West VirginiaPOLYGON ((-77.75438 39.33346, -77.75422 39.333…(-82.644591, 37.20154, -77.719519, 40.638801)-82.64459137.201540-77.71951940.638801
1FloridaMULTIPOLYGON (((-83.10874 24.62949, -83.10711 …(-87.634896, 24.396308, -79.974306, 31.000968)-87.63489624.396308-79.97430631.000968
2IllinoisPOLYGON ((-87.89243 38.28285, -87.89334 38.282…(-91.513079, 36.970298, -87.019935, 42.508481)-91.51307936.970298-87.01993542.508481
3MinnesotaPOLYGON ((-95.31991 48.99892, -95.31778 48.998…(-97.239093, 43.499361, -89.483385, 49.384479)-97.23909343.499361-89.48338549.384479
4MarylandPOLYGON ((-75.75600 39.24607, -75.75579 39.243…(-79.487651, 37.886605, -74.986282, 39.723037)-79.48765137.886605-74.98628239.723037

Let’s draw the bounding region using folium:

import folium 

# Florida bounding box. 
lon0, lat0, lon1, lat1 = states[states.NAME=="Florida"].bbox.item()

mlat, mlon = (lat0 + lat1) / 2, (lon0 + lon1) / 2

m = folium.Map(
    location=[mlat, mlon], 
    zoom_start=6, 
    zoom_control=True, 
    no_touch=True,
    tiles="OpenStreetMap"
    )

folium.Rectangle(
    [(lat0, lon0), (lat1, lon1)], 
    fill_color="blue", fill_opacity=.05
    ).add_to(m)

m
Make this Notebook Trusted to load map: File -> Trust Notebook

Sampling from the bounding region highlighted above will result in many points in the Gulf. Let’s narrow the sampling space:

lon0, lat0, lon1, lat1 = (-80.5, 26, -81.75, 28)

mlat, mlon = (lat0 + lat1) / 2, (lon0 + lon1) / 2

m = folium.Map(
    location=[mlat, mlon], 
    zoom_start=7, 
    zoom_control=True, 
    no_touch=True,
    tiles="OpenStreetMap"
    )

folium.Rectangle(
    [(lat0, lon0), (lat1, lon1)], 
    fill_color="blue", fill_opacity=.05
    ).add_to(m)

m
Make this Notebook Trusted to load map: File -> Trust Notebook

Next the points list needs to be represented as a GeoDataFrame, using the generated points as the geometry. We set "crs=EPSG:4326" representing longitude-latitude pairs. A policy_id is included as an identifier for each point.

dfpoints = pd.DataFrame({
    "policy_id": [str(ii).zfill(7) for ii in range(len(points))],
    "lat": rlats, 
    "lon": rlons, 
})

# Create GeoDataFrame.
points = gpd.GeoDataFrame(
    dfpoints,
    geometry=gpd.points_from_xy(dfpoints.lon, dfpoints.lat),
    crs="EPSG:4326"
)

points.head(10)
policy_idlatlongeometry
0000000026.158425-81.697845POINT (-81.69785 26.15842)
1000000127.308367-80.743510POINT (-80.74351 27.30837)
2000000226.050160-81.226294POINT (-81.22629 26.05016)
3000000326.896432-80.682348POINT (-80.68235 26.89643)
4000000427.407736-81.556209POINT (-81.55621 27.40774)
5000000526.551877-80.902803POINT (-80.90280 26.55188)
6000000626.301071-81.521000POINT (-81.52100 26.30107)
7000000727.435718-81.499145POINT (-81.49914 27.43572)
8000000827.784378-80.590765POINT (-80.59076 27.78438)
9000000927.635399-81.389836POINT (-81.38984 27.63540)

With both the coastline shapefile and point data represented as GeoDataFrames, we execute the sjoin_nearest spatial join to get the distance from each point to the nearest coastline. First we need to set the crs to a projected coordinate system so the distances are returned in units of meters instead of degrees. Projected coordinate systems use linear units like meters or feet, which makes it easier to perform precise spatial measurements. Here we opt for the Conus Albers equal area conic projection (EPSG:5069).

In the call to sjoin_nearest, we specify “meters” for the distance_col argument. This column will hold the distance to the coastline for each point in points in units of meters. A miles column is added after the join.

# Convert from GPS to  Conus Albers. 
points = points.to_crs("EPSG:5069")
coast = coast.to_crs("EPSG:5069")

# Perform spatial join. Covert meters to miles. 
gdf = gpd.sjoin_nearest(points, coast, how="left", distance_col="meters")
gdf["miles"] = gdf["meters"] * 0.000621371

# Get min, max and average distance to coast line.
min_dist = gdf.miles.min()
max_dist = gdf.miles.max()
avg_dist = gdf.miles.mean()

print(f"min. distance to coastline: {min_dist}")
print(f"max. distance to coastline: {max_dist}")
print(f"avg. distance to coastline: {avg_dist}")
min. distance to coastline: 0.20197535859716287
max. distance to coastline: 65.35183791451661
avg. distance to coastline: 35.26215239057489

Rate Group Based on Distance to Coastline

Let’s imagine a hypothetical rating plan that uses the following distances from the coastline to determine rates:

  • 0 – 5 miles: very high risk
  • 5 – 25 miles: high risk
  • 25 – 50 miles: medium risk
  • greater than 50 miles: low risk

A rolling join via merge_asof can be used to select the last row in the right DataFrame (group thresholds) whose on key is less than or equal to gdf’s key, which will be “miles” in both DataFrames.

# Create dfgroups DataFrame. 
dfgroups = pd.DataFrame({
    "risk": ["very high", "high", "medium", "low"],
    "miles": [0., 5., 25., 50.]
})

# Assign risk group to each policy location.
gdf = gdf.sort_values("miles", ascending=True)
gdf = pd.merge_asof(gdf, dfgroups, on="miles", direction="backward")

gdf.head(10)
policy_idlatlongeometryindex_rightNAMEMTFCCmetersmilesrisk
0000001627.972984-80.506574POINT (1523126.311 669355.565)176AtlanticL4150325.0479320.201975very high
1000000026.158425-81.697845POINT (1436901.833 453402.683)2813GulfL415011201.7678816.960454high
2000001526.383000-81.709033POINT (1432089.414 477631.030)2817GulfL415014922.8944319.272654high
3000004526.189330-81.661985POINT (1439966.962 457301.849)2813GulfL415015292.7891259.502496high
4000000827.784378-80.590765POINT (1518283.776 647462.840)211AtlanticL415016412.30785610.198132high
5000003726.366454-81.662935POINT (1436947.116 476528.844)2813GulfL415018665.87113011.598431high
6000003027.473270-80.505108POINT (1532219.823 614970.878)10AtlanticL415020055.34430812.461809high
7000001726.086354-81.489777POINT (1458848.059 448748.018)2813GulfL415024280.86275815.087424high
8000004427.226453-80.508471POINT (1536305.431 588071.282)10AtlanticL415029966.03203418.620023high
9000000626.301071-81.521000POINT (1452153.395 471583.300)2813GulfL415030913.53189619.208772high

Counting the number of policies per risk group:

gdf.risk.value_counts().sort_index()
risk
high         12
low          10
medium       27
very high     1
Name: count, dtype: int64

Finally, we can assign each risk a different color and visualize the resulting risk groups with folium:

# Colors for each group.
dcolors = {
    "very high": "#f40002",
    "high": "#ff721f",
    "medium": "#fafc15",
    "low": "#268a6d"
}

gdf["color"] = gdf["risk"].map(dcolors)

m = folium.Map(location=[mlat, mlon], zoom_start=8)

for tt in gdf.itertuples():
    lat, lon, color = tt.lat, tt.lon, tt.color
    folium.CircleMarker(
        location=[lat, lon],
        radius=6, 
        color=color, 
        fill_color=color, 
        fill=True,
        fill_opacity=1
        ).add_to(m)

m
Make this Notebook Trusted to load map: File -> Trust Notebook `); // facebook page document_write_mobile_aware(`

Sponsors

Archives