Want to share your content on python-bloggers? click here.
I recently became interested in GeoHashing, and wanted to develop an understanding of the algorithm with the goal of implementing it myself. I was surprised to find it to be quite simple and intuitive. In what follows, I’ll demonstrate how to generate GeoHashes for a given latitude and longitude and compare results against pygeohash
, a Python library for all things GeoHash-related. I’ll also walkthrough an approach that can be used to find the neighboring cells of a given GeoHash, and render them on a map.
A Geohash is a unique identifier of a specific region on the Earth. The basic idea is the Earth gets divided into rectangular regions of user-defined size, and each region is assigned a unique id which is called its Geohash. For a given location on earth, the algorithm converts an arbitrary precision latitude and longitude into a string, the idea being that regions with a similar string prefix will be closer together. Conceptually, GeoHashing reduces proximity search to string prefix matching. As each character encodes additional precision, shared prefixes denote geographic proximity. But the converse is not true: Two points may be close in terms of relative proximity, but have no common GeoHash characters (think of two points on opposite sides of the prime meridian).
Geohashes also provide a degree of anonymity since it isn’t necessary to expose exact GPS coordinates. The location of an entity up to a bounding box cell at a given precision is all that is known.
Before we begin, it is important to note that GeoHash libraries in the wild use a much more efficient generating mechanism than what is presented here. My goal is to demonstrate the concept with maximum clarity as opposed to maximum efficiency. If you decide to use GeoHashing in a real-world application, use an existing library.
Algorithm
A GeoHash is a hierarchical spatial index: In order to represent a point, the world is recursively divided into smaller and smaller grid cells with each additional bit until the desired precision is reached. Functionally, the algorithm works by storing the intermediate results of two binary searches. In the resulting bit string, even-indexed bits represent longitude, while odd-indexed bits represent latitude. The user specifies a level of precision, usually between 1 and 12, and a GeoHash of that length is returned. The table below gives the dimensions of GeoHash cells at each level of precision (taken from here):
Precision Dimension 1: 5,000km x 5,000km 2: 1,250km x 625km 3: 156km x 156km 4: 31.9km x 19.5km 5: 4.89km x 4.89km 6: 1.22km x 0.61km 7: 153m x 153m 8: 38.2m x 19.1m 9: 4.77m x 4.77m 10: 1.19m x 0.596m 11: 149mm x 149mm 12: 37.2mm x 18.6mm
The values in the table represent the maximum dimension at each level of precision: As locations move away from the equator, the dimensions of each cell get smaller.
For a GeoHash of length 12, the coordinate can be found in a cell having dimension 37.2mm x 18.6mm, which is an impressive level of precision for a 12 character string!
The GeoHash symbol map consists of 32 characters:
base32 = "0123456789bcdefghjkmnpqrstuvwxyz"
base32
consists of digits 0 thru 9 plus all lowercase letters excluding a, i, l, o
. As the search space is recursively partitioned, each group of 5 bits, when converted to a decimal integer, maps to one of the values in base32
above.
I’ll present the algorithm first, then walkthrough it step-by-step. What follows is a basic implementation of GeoHashing, which takes as input the latitude and longitude for a point of interest, and returns the GeoHash and bounding box to the specified level of precision:
""" Very inefficient GeoHashing subroutine. For demonstration purposes only. """ def get_geohash(lat, lon, precision=12): """ Generate GeoHash of lat/lon at specified precision. Parameters ---------- lat: float Latitude of point of interest. lon: float Longitude of point of interest precision: int Precision of GeoHash. Higher values result in smaller bounding regions. Returns ------- geohash, bbox as list """ base32 = "0123456789bcdefghjkmnpqrstuvwxyz" bits = [] min_lat, max_lat = -90, 90 min_lon, max_lon = -180, 180 for ii in range(5 * precision): if ii % 2 == 0: # Bisect longitude (E-W). mid_lon = (min_lon + max_lon) / 2 if lon >= mid_lon: bits.append(1) min_lon = mid_lon else: bits.append(0) max_lon = mid_lon else: # Bisect latitude (N-S). mid_lat = (min_lat + max_lat) / 2 if lat >= mid_lat: bits.append(1) min_lat = mid_lat else: bits.append(0) max_lat = mid_lat # Create single bit string from list of 0/1s. bitstr = "".join([str(ii) for ii in bits]) # Partition bitstr into groups of 5 bits. quints = [bitstr[(5 * ii):(5 * (ii + 1))] for ii in range(precision)] # Convert binary digits to decimal digits to get indices into base32. indices = [int(ii, base=2) for ii in quints] # Lookup characters associated with each index and concatenate. geohash = "".join([base32[ii] for ii in indices]) # GeoHash bounding box is just the final, min_lat, min_lon, max_lat, max_lon. bbox = [min_lat, min_lon, max_lat, max_lon] return(geohash, bbox)
We define base32
and initialize bits
to hold our 0s and 1s. The initial minimum and maximum latitudes and longitudes define the bounds of the search space.
The range of iteration is specified as range(5 * precision)
. For a GeoHash having precision=6, the bits
list will contain 5 * 6 = 30 values. We multiply by 5 since the base32
character map has length 32, and 2^5 bit arrangements cover each of the 32 indices.
If ii
is even, we bisect longitude. If lon >= mid_lon
we append 1 to bits
and update min_lat
. Otherwise we append 0 and update max_lon
. If ii
is odd, we execute the same logic but for latitude. In this way, the final bit string represents interleaved longitude and latitude bits.
Once iteration completes, the individual elements of bits
are concatenated into a single bit string bitstr
. Next, we partition bitstr
into groups of 5 bits (quints
), which are converted to decimal integers, which are then used as indices to lookup elements from base32
. These characters are concatenated into a single string representing the GeoHash.
To demonstrate, lets find the GeoHash for the Merchandise Mart (41.88876, -87.63516) at precision = 6. Once iteration completes, bits
will consist of 30 1s and 0s:
[0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1]
Which gets combined into a single bit string:
'011001010100011111001001101011'
Partitioned into groups of 5 bits:
['01100', '10101', '00011', '11100', '10011', '01011']
Converted to decimal integers:
[12, 21, 3, 28, 19, 11]
Which serve as indices into base32
, whose corresponding elements and are concatenated into the GeoHash:
'dp3wmc'
Our function also returns the bounding box of the precision = 6 GeoHash:
[41.8853759765625, -87.637939453125, 41.890869140625, -87.626953125]
Let’s compare the results of our implementation against pygeohash to ensure consistency:
import pygeohash lat, lon, precision = 41.88876, -87.63516, 6 pygeohash.encode(latitude=lat, longitude=lon, precision=precision)
'dp3wmc'
Let’s visualize the precision = 6 GeoHash using folium:
import folium from folium.features import DivIcon lat, lon, precision = 41.88876, -87.63516, 6 # Use our subroutine to get geohash and bounding box. geohash, bbox = get_geohash(lat, lon, precision=precision) # Unpack bbox. min_lat, min_lon, max_lat, max_lon = bbox # Get mid_lat and mid_lon for GeoHash id placement. mid_lat = (min_lat + max_lat) / 2 mid_lon = (min_lon + max_lon) / 2 m = folium.Map( location=[lat, lon], #width=900, #height=600, zoom_start=16, zoom_control=True, no_touch=True, tiles="OpenStreetMap" ) # precision = 6 GeoHash bounding box. folium.Rectangle( [(min_lat, min_lon), (max_lat, max_lon)], fill_color="blue", fill_opacity=.15 ).add_to(m) # Red dot at Merchandise Mart. folium.CircleMarker( location=[lat, lon], radius=5, color="red", fill_color="red", fill_opacity=1 ).add_to(m) # precision = 6 GeoHash id. folium.map.Marker( [mid_lat, mid_lon], icon=DivIcon( icon_size=(250,36), icon_anchor=(100,50), html=f'<div style="font-size: 40pt">{geohash}</div>', ) ).add_to(m) m