Determining Distance to Coastline for Policy Locations Using GeoPandas

Posted on October 14, 2024 by The Pleasure of Finding Things Out: A blog by James Triveri in Data science | 0 Comments

This article was first published on The Pleasure of Finding Things Out: A blog by James Triveri , and kindly contributed to python-bloggers. (You can report issue about the content on this page here)
Want to share your content on python-bloggers? click here.

Knowing the distance to coastline for an exposure is crucial for insurance rating applications because it helps insurers assess the risk of hazards like hurricanes, storm surges, and flooding, which are much more prevalent in coastal areas. This information allows insurers to make informed decisions about pricing, underwriting and reinsurance. Properties closer to the coast are generally at higher risk, leading to higher premiums for these properties. Insurance rating plans may use distance to coastline directly as an explanatory variable, with factors inversely proportional to distance to coastline.

This article walks through how GeoPandas can be used to calculate distance to coastline for a collection of simulated latitude-longitude pairs in the Florida region, and how these exposure locations can be assigned to different risk levels based on the distance calculation.

Coastal Shapefiles

The United States Census Bureau provides shapefiles for state, county and ZCTA boundaries as well as roads, rails an coastlines (see full list here). Shapefiles are a widely-used geospatial vector data format that store the geometric location and attribute information of geographic features, which can be represented as points, lines, or polygons.

We being by downloading the COASTLINE zip archive available on the Census Bureau’s FTP site. The COASTLINE shapefile is loaded into GeoPandas (the STATE shapefile is also loaded for later use). We limit our analysis to the continental United States and filter out the Great Lakes. Inspecting the first few records:

import numpy as np
import pandas as pd
import geopandas as gpd

np.set_printoptions(suppress=True, precision=5)
pd.options.mode.chained_assignment = None
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)


coastline_shp = "tl_2024_us_coastline.zip"
us_shp = "tl_2024_us_state.zip"


# Bounding box of lower 48 states. Remove Great Lakes.
xmin, ymin, xmax, ymax = -125, 24.6, -65, 50
coast = gpd.read_file(coastline_shp)
coast = coast.cx[xmin:xmax, ymin:ymax]
coast = coast[coast.NAME!="Great Lakes"].reset_index(drop=True)

# State boundaries.
states = gpd.read_file(us_shp)[["NAME", "geometry"]]
states = states.cx[xmin:xmax, ymin:ymax].reset_index(drop=True)

print(f"coast.shape : {coast.shape}")
print(f"states.shape: {states.shape}")

coast.head(10)

coast.shape : (2916, 3)
states.shape: (49, 2)

	NAME	MTFCC	geometry
0	Atlantic	L4150	LINESTRING (-80.88368 32.03912, -80.88365 32.0…
1	Atlantic	L4150	LINESTRING (-70.66800 41.51199, -70.65663 41.5…
2	Atlantic	L4150	LINESTRING (-76.58108 38.09572, -76.58184 38.0…
3	Atlantic	L4150	LINESTRING (-73.75518 40.58565, -73.75517 40.5…
4	Atlantic	L4150	LINESTRING (-76.15615 38.63324, -76.15070 38.6…
5	Atlantic	L4150	LINESTRING (-76.53289 39.20776, -76.53298 39.2…
6	Atlantic	L4150	LINESTRING (-73.93653 40.56644, -73.93594 40.5…
7	Atlantic	L4150	LINESTRING (-81.10208 29.42706, -81.10215 29.4…
8	Atlantic	L4150	LINESTRING (-71.89236 41.32922, -71.89293 41.3…
9	Atlantic	L4150	LINESTRING (-75.31239 38.94595, -75.31239 38.9…

The coastline shapefile is comprised of ~3,000 LINESTRING objects. Let’s get a count of geometries by NAME:

coast["NAME"].value_counts().sort_index()

NAME
Atlantic     941
Gulf         647
Pacific     1328
Name: count, dtype: int64

We can visualize the coastline by calling the coast GeoDataFrame’s plot method:

import matplotlib.pyplot as plt

fig, ax = plt.subplots(1, 1, figsize=(8, 6), tight_layout=True)
ax.set_title("Lower 48 Coastline", fontsize=9)
coast.plot(ax=ax, edgecolor="red", linewidth=1.0)
ax.axis("off")
plt.show()

To overlay the coastline along with state boundaries, download the STATE shapefile from the Census Bureau’s FTP site and plot them together:

fig, ax = plt.subplots(1, 1, figsize=(8, 6), tight_layout=True)
ax.set_title("Lower 48 States with Coastline", fontsize=9)
coast.plot(ax=ax, edgecolor="red", linewidth=1.50, linestyle="--")
states.boundary.plot(ax=ax, edgecolor="black", linewidth=0.50)
ax.axis("off")
plt.show()

Let’s next generate synthetic latitude-longitude pairs from within the Florida bounding envelope. The envelope bounds can be obtained from Florida’s geometry as follows:


# Get bounding box for each state.
states["bbox"] = states.geometry.map(lambda gg: gg.envelope.bounds)

# Put coordinates in separate columns.
states[["lon0", "lat0", "lon1", "lat1"]] = pd.DataFrame(states.bbox.tolist(), index=states.index)

states.head()

	NAME	geometry	bbox	lon0	lat0	lon1	lat1
0	West Virginia	POLYGON ((-77.75438 39.33346, -77.75422 39.333…	(-82.644591, 37.20154, -77.719519, 40.638801)	-82.644591	37.201540	-77.719519	40.638801
1	Florida	MULTIPOLYGON (((-83.10874 24.62949, -83.10711 …	(-87.634896, 24.396308, -79.974306, 31.000968)	-87.634896	24.396308	-79.974306	31.000968
2	Illinois	POLYGON ((-87.89243 38.28285, -87.89334 38.282…	(-91.513079, 36.970298, -87.019935, 42.508481)	-91.513079	36.970298	-87.019935	42.508481
3	Minnesota	POLYGON ((-95.31991 48.99892, -95.31778 48.998…	(-97.239093, 43.499361, -89.483385, 49.384479)	-97.239093	43.499361	-89.483385	49.384479
4	Maryland	POLYGON ((-75.75600 39.24607, -75.75579 39.243…	(-79.487651, 37.886605, -74.986282, 39.723037)	-79.487651	37.886605	-74.986282	39.723037

Let’s draw the bounding region using folium:

import folium 

# Florida bounding box. 
lon0, lat0, lon1, lat1 = states[states.NAME=="Florida"].bbox.item()

mlat, mlon = (lat0 + lat1) / 2, (lon0 + lon1) / 2

m = folium.Map(
    location=[mlat, mlon], 
    zoom_start=6, 
    zoom_control=True, 
    no_touch=True,
    tiles="OpenStreetMap"
    )

folium.Rectangle(
    [(lat0, lon0), (lat1, lon1)], 
    fill_color="blue", fill_opacity=.05
    ).add_to(m)

m

Make this Notebook Trusted to load map: File -> Trust Notebook

Sampling from the bounding region highlighted above will result in many points in the Gulf. Let’s narrow the sampling space:

lon0, lat0, lon1, lat1 = (-80.5, 26, -81.75, 28)

mlat, mlon = (lat0 + lat1) / 2, (lon0 + lon1) / 2

m = folium.Map(
    location=[mlat, mlon], 
    zoom_start=7, 
    zoom_control=True, 
    no_touch=True,
    tiles="OpenStreetMap"
    )

folium.Rectangle(
    [(lat0, lon0), (lat1, lon1)], 
    fill_color="blue", fill_opacity=.05
    ).add_to(m)

m

Make this Notebook Trusted to load map: File -> Trust Notebook

Next the points list needs to be represented as a GeoDataFrame, using the generated points as the geometry. We set "crs=EPSG:4326" representing longitude-latitude pairs. A policy_id is included as an identifier for each point.

dfpoints = pd.DataFrame({
    "policy_id": [str(ii).zfill(7) for ii in range(len(points))],
    "lat": rlats, 
    "lon": rlons, 
})

# Create GeoDataFrame.
points = gpd.GeoDataFrame(
    dfpoints,
    geometry=gpd.points_from_xy(dfpoints.lon, dfpoints.lat),
    crs="EPSG:4326"
)

points.head(10)

	policy_id	lat	lon	geometry
0	0000000	26.158425	-81.697845	POINT (-81.69785 26.15842)
1	0000001	27.308367	-80.743510	POINT (-80.74351 27.30837)
2	0000002	26.050160	-81.226294	POINT (-81.22629 26.05016)
3	0000003	26.896432	-80.682348	POINT (-80.68235 26.89643)
4	0000004	27.407736	-81.556209	POINT (-81.55621 27.40774)
5	0000005	26.551877	-80.902803	POINT (-80.90280 26.55188)
6	0000006	26.301071	-81.521000	POINT (-81.52100 26.30107)
7	0000007	27.435718	-81.499145	POINT (-81.49914 27.43572)
8	0000008	27.784378	-80.590765	POINT (-80.59076 27.78438)
9	0000009	27.635399	-81.389836	POINT (-81.38984 27.63540)

With both the coastline shapefile and point data represented as GeoDataFrames, we execute the sjoin_nearest spatial join to get the distance from each point to the nearest coastline. First we need to set the crs to a projected coordinate system so the distances are returned in units of meters instead of degrees. Projected coordinate systems use linear units like meters or feet, which makes it easier to perform precise spatial measurements. Here we opt for the Conus Albers equal area conic projection (EPSG:5069).

In the call to sjoin_nearest, we specify “meters” for the distance_col argument. This column will hold the distance to the coastline for each point in points in units of meters. A miles column is added after the join.

# Convert from GPS to  Conus Albers. 
points = points.to_crs("EPSG:5069")
coast = coast.to_crs("EPSG:5069")

# Perform spatial join. Covert meters to miles. 
gdf = gpd.sjoin_nearest(points, coast, how="left", distance_col="meters")
gdf["miles"] = gdf["meters"] * 0.000621371

# Get min, max and average distance to coast line.
min_dist = gdf.miles.min()
max_dist = gdf.miles.max()
avg_dist = gdf.miles.mean()

print(f"min. distance to coastline: {min_dist}")
print(f"max. distance to coastline: {max_dist}")
print(f"avg. distance to coastline: {avg_dist}")

min. distance to coastline: 0.20197535859716287
max. distance to coastline: 65.35183791451661
avg. distance to coastline: 35.26215239057489

Rate Group Based on Distance to Coastline

Let’s imagine a hypothetical rating plan that uses the following distances from the coastline to determine rates:

0 – 5 miles: very high risk
5 – 25 miles: high risk
25 – 50 miles: medium risk
greater than 50 miles: low risk

A rolling join via merge_asof can be used to select the last row in the right DataFrame (group thresholds) whose on key is less than or equal to gdf’s key, which will be “miles” in both DataFrames.

# Create dfgroups DataFrame. 
dfgroups = pd.DataFrame({
    "risk": ["very high", "high", "medium", "low"],
    "miles": [0., 5., 25., 50.]
})

# Assign risk group to each policy location.
gdf = gdf.sort_values("miles", ascending=True)
gdf = pd.merge_asof(gdf, dfgroups, on="miles", direction="backward")

gdf.head(10)

	policy_id	lat	lon	geometry	index_right	NAME	MTFCC	meters	miles	risk
0	0000016	27.972984	-80.506574	POINT (1523126.311 669355.565)	176	Atlantic	L4150	325.047932	0.201975	very high
1	0000000	26.158425	-81.697845	POINT (1436901.833 453402.683)	2813	Gulf	L4150	11201.767881	6.960454	high
2	0000015	26.383000	-81.709033	POINT (1432089.414 477631.030)	2817	Gulf	L4150	14922.894431	9.272654	high
3	0000045	26.189330	-81.661985	POINT (1439966.962 457301.849)	2813	Gulf	L4150	15292.789125	9.502496	high
4	0000008	27.784378	-80.590765	POINT (1518283.776 647462.840)	211	Atlantic	L4150	16412.307856	10.198132	high
5	0000037	26.366454	-81.662935	POINT (1436947.116 476528.844)	2813	Gulf	L4150	18665.871130	11.598431	high
6	0000030	27.473270	-80.505108	POINT (1532219.823 614970.878)	10	Atlantic	L4150	20055.344308	12.461809	high
7	0000017	26.086354	-81.489777	POINT (1458848.059 448748.018)	2813	Gulf	L4150	24280.862758	15.087424	high
8	0000044	27.226453	-80.508471	POINT (1536305.431 588071.282)	10	Atlantic	L4150	29966.032034	18.620023	high
9	0000006	26.301071	-81.521000	POINT (1452153.395 471583.300)	2813	Gulf	L4150	30913.531896	19.208772	high

Counting the number of policies per risk group:

gdf.risk.value_counts().sort_index()

risk
high         12
low          10
medium       27
very high     1
Name: count, dtype: int64

Finally, we can assign each risk a different color and visualize the resulting risk groups with folium:

# Colors for each group.
dcolors = {
    "very high": "#f40002",
    "high": "#ff721f",
    "medium": "#fafc15",
    "low": "#268a6d"
}

gdf["color"] = gdf["risk"].map(dcolors)

m = folium.Map(location=[mlat, mlon], zoom_start=8)

for tt in gdf.itertuples():
    lat, lon, color = tt.lat, tt.lon, tt.color
    folium.CircleMarker(
        location=[lat, lon],
        radius=6, 
        color=color, 
        fill_color=color, 
        fill=True,
        fill_opacity=1
        ).add_to(m)

m

Make this Notebook Trusted to load map: File -> Trust Notebook `); // facebook page document_write_mobile_aware(`

Python-bloggers

Data science news and tutorials - contributed by Python bloggers

Determining Distance to Coastline for Policy Locations Using GeoPandas

Coastal Shapefiles

Rate Group Based on Distance to Coastline

Recent Posts

Sponsors

Archives