BigQuery Geography Clustering

Improve your geospatial analytics performance in minutes.

Jan 30, 2020 2 min read Data Engineering, Geospatial

The BigQuery team rolled out support for geography type a while ago and they have never stopped improving performances and Geographic Information System functions (GIS). This allows users to run complex geo-spatial analytics directly in BigQuery harnessing all its power, simplicity, and reliability.

Hold on your keyboard (or your screen if you are reading this on a mobile device).

Now you can cluster tables using a geography column. Say what!!!!

This is game changing for users working heavily with geodata. By clustering your table on a geography column, BigQuery can reduce the amount of data that needs to read to serve the query. This makes queries cheaper and run faster when filtering on clustering column.

Let’s see the benefits of clustering table using geography column with an example. We will use one of the great public datasets curated by Felipe Hoffa. We will use the weather_gsod specifically the two tables all and all_geoclustered.

Let’s say that we want the all-time minimum and maximum temperature within the Greater London area for each station. Our query will look like this:

Results:

name                min_temp  max_temp
ST JAMES PARK       26.3      83.2
KENLEY AIRFIELD     18.0      82.8
HEATHROW            18.6      83.4
CITY                22.3      92.2
BLACKWALL           37.0      62.2
NORTHOLT            18.4      84.1
BIGGIN HILL         15.5      90.3
PURLEY OAKS         23.0      81.7
LEAVESDEN           22.3      89.1
LONDON WEA CENTER   20.0      85.2
KEW-IN-LONDON       23.3      82.4

This query reads 9.02GB of data. Now let’s see how the same query perform on the clustered table:

The result is obviously the same but, this time, BigQuery reads just 98.97MB of the data!!! So switching to the geo clustered table made this query almost 100 times cheaper. Do you want to try other locations and test the differences yourself? You can use Huq Industries GisMap tool to quickly draw and export polygons in various formats.

BigQuery

Principal Data Platform Engineer

My interests include data engineering at scale, geospatial analytics, machine learning, windsurf, and cooking.

BigQuery Geography Clustering

Principal Data Platform Engineer

Related