March 2015


Quadgrid Generation Using Recursive SQL Function

For something a little different, here is a PostGIS recursive SQL quadgrid function which has been in my toolbox for some time now. The inspiration came in 2010 when reading “Open Source GIS: A Grass GIS Approach” by Markus Neteler and Helena Mitasova (3rd edition, p.244). The quadgrid function works by recursively subdividing tiles into smaller tiles (quadcells) up to a maximum depth if say the number of intersecting points (or some other feature criteria) exceeds a certain threshold.

Quadgrids have many applications, but I’ve found them useful for mapping urban phenomena (such as population density), both in 2D and 3D. Since large cities tend to be packed with more people, the result is more quadcells packed into a given land area. Quadgrids are also computationally efficient, as a finer grid cell resolution is only used in the more populous areas.

quadgrid

Below is an example of how I’ve used a population density quadgrid to create a 3D city skyline of Sydney. The taller the quadcells, the greater the number of people per square meter – no different to the actual built form of our cities where, if land is scarce, we build skyward.
sydney_skyline

The recursive quadgrid function is available on github. It requires PostgreSQL9.3 or above.
https://github.com/dimensionaledge/cf_public/blob/master/lattices/DE_RegularQuadGrid.sql

The function is called in the same way as any other pl/pgsql function. For instance:

CREATE TABLE quadgrid AS
SELECT depth::integer, the_geom::geometry(Polygon, 3577) as wkb_geometry, cell_value::float8
FROM DE_RegularQuadGrid((SELECT wkb_geometry FROM abs_aus11_tiles_32k WHERE tid IN (17864)), ‘tutorials.abs_mb11_points’,’wkb_geometry’, 10, 1000);

The arguments of the quadgrid function are: the parent tile geometry, the name of the points feature table along with its geometry column name, the maximum depth of recursion, and the threshold number of points per cell above which a cell will sub-divide.

The function uses two helper functions, namely DE_MakeRegularQuadCells() and DE_MakeSquare(). They work in tandem by taking a square geometry and subdividing it into four child geometries.
https://github.com/dimensionaledge/cf_public/tree/master/shapes

I’ve also just refactored the quadgrid function to incorporate a new, highly efficient LATERAL ‘Point in Polygon’ code pattern which avoids the use of GROUP BY when counting points in polygons. The result is a 75% reduction in query times compared to the conventional CROSS JOIN and GROUP BY approach.

SELECT r.the_geom, r.pcount FROM
(SELECT DE_MakeRegularQuadCells(wkb_geometry) as the_geom FROM abs_aus11_tiles_32k WHERE tid IN (17864)) l,
LATERAL
(SELECT l.the_geom, count(*) as pcount FROM tutorials.abs_mb11_points WHERE ST_Intersects(l.the_geom, wkb_geometry) AND l.the_geom && wkb_geometry) r;

Quadgrid run times depend on the number of underlying point features, and the depth of recursion. In the above GIF animation, there are 1.86 million points alone that intersect with the parent tile, making it one of the most populous areas in Australia. For this exercise, the time to ‘quadgrid’ this one tile took 19.9 seconds to a depth of 10. Less populated tiles tend to take only fractions of a second.

DE_MakeRegularQuadGrid.sql
  1. ---------------------------------------------------------------------------
  2. -- Code Desciption:
  3. ---------------------------------------------------------------------------
  4. -- PostgreSQL/PostGIS custom function for generating quadcells recursively from a given starting geometry and intersecting reference table, to a maximum number of iteration levels or threshold value per cell
  5. -- Dependencies: DE_MakeSquare(), DE_MakeRegularQuadCells()
  6. -- Developed by: mark[a]dimensionaledge[dot]com
  7. -- Licence: GNU GPL version 3.0
  8. ---------------------------------------------------------------------------
  9.  
  10. DROP FUNCTION IF EXISTS DE_RegularQuadGrid(geometry, text, text, integer, double precision);
  11. CREATE OR REPLACE FUNCTION DE_RegularQuadGrid(parent_geom geometry, reference_table text, reference_geom_col text, max_depth integer, threshold_value double precision)
  12. RETURNS TABLE (depth integer, the_geom GEOMETRY, cell_value double precision)
  13. AS $$
  14. DECLARE
  15. reference_geom_type text;
  16.  
  17. BEGIN
  18. EXECUTE 'SELECT GeometryType('|| reference_geom_col ||') FROM '|| reference_table ||' LIMIT 1' INTO reference_geom_type ;
  19.  
  20. IF reference_geom_type NOT IN ('POINT')
  21. THEN
  22. RAISE EXCEPTION 'Reference table is not a valid geometry type';
  23. ELSE
  24. END IF;
  25.  
  26. RETURN QUERY EXECUTE
  27. 'WITH RECURSIVE quadcells (depth, the_geom, cell_value) AS (
  28. --SEED THE PARENT GEOMETRY AND CELL VALUE
  29. SELECT 1, l.the_geom, r.pcount
  30. FROM (SELECT ST_GeomFromEWKT(ST_AsEWKT('|| quote_literal(CAST(parent_geom as text)) ||')) as the_geom) l,
  31. LATERAL
  32. (SELECT count(*) as pcount, l.the_geom FROM '|| reference_table ||' WHERE ST_Intersects(l.the_geom, '|| reference_geom_col ||') AND l.the_geom && '|| reference_geom_col ||') r
  33. --RECURSIVE PART
  34. UNION ALL
  35. SELECT t.depth, t.the_geom, t.pcount
  36. FROM
  37. --TERMINAL CONDITION SUBQUERY LOOPS UNTIL THE CONDITIONS ARE NO LONGER MET - NOTE THE RECURSIVE ELEMENT CAN ONLY BE EXPLICITYLY REFERRED TO ONCE, HENCE THE USE OF CTE
  38. (
  39. WITH a AS (SELECT * FROM quadcells WHERE the_geom IS NOT NULL AND depth < '|| max_depth ||' AND cell_value > '|| threshold_value ||'),
  40. b AS (SELECT max(depth) as previous FROM a),
  41. c AS (SELECT a.* FROM a,b WHERE a.depth = b.previous),
  42. d AS (SELECT r.the_geom, r.pcount FROM (SELECT DE_MakeRegularQuadCells(the_geom) as the_geom FROM c) l, LATERAL (SELECT count(*) as pcount, l.the_geom FROM '|| reference_table ||' WHERE ST_Intersects(l.the_geom, '|| reference_geom_col ||') AND l.the_geom && '|| reference_geom_col ||') r)
  43. SELECT b.previous+1 as depth, d.the_geom, d.pcount FROM b, d
  44. ) t
  45. )
  46. SELECT depth, the_geom, cell_value::float8 FROM quadcells WHERE ST_IsEmpty(the_geom)=false AND (cell_value <= '|| threshold_value ||' OR (cell_value > '|| threshold_value ||' AND depth = '|| max_depth||'))' ;
  47.  
  48. END;
  49. $$ LANGUAGE 'plpgsql' VOLATILE;
  50.  
DE_MakeRegularQuadCells.sql
  1. ---------------------------------------------------------------------------
  2. -- Code Desciption:
  3. ---------------------------------------------------------------------------
  4. -- PostgreSQL/PostGIS custom function for subdividing a square polygon into four child polygons
  5. -- Dependencies: DE_MakeSquare()
  6. -- Developed by: mark[at]dimensionaledge[dot]com
  7. -- Licence: GNU GPL version 3.0
  8. ---------------------------------------------------------------------------
  9. -- Usage Example:
  10. ---------------------------------------------------------------------------
  11. -- SELECT DE_MakeRegularQuadCells(DE_MakeSquare(ST_MakePoint(0,0),1));
  12. ---------------------------------------------------------------------------
  13.  
  14. CREATE OR REPLACE FUNCTION DE_MakeRegularQuadCells(parent GEOMETRY)
  15. RETURNS SETOF GEOMETRY
  16. AS $$
  17. DECLARE
  18. halfside float8;
  19. i INTEGER DEFAULT 1;
  20. srid INTEGER;
  21. centerpoint GEOMETRY;
  22. centersquare GEOMETRY;
  23. quadcell GEOMETRY;
  24.  
  25. BEGIN
  26. srid := ST_SRID(parent);
  27. centerpoint := ST_Centroid(parent);
  28. halfside := abs(ST_Xmax(parent) - ST_Xmin(parent))/2;
  29. centersquare := ST_ExteriorRing(DE_MakeSquare(centerpoint, halfside));
  30.  
  31. WHILE i < 5 LOOP
  32. quadcell := DE_MakeSquare(ST_PointN(centersquare, i), halfside);
  33. RETURN NEXT quadcell;
  34. i := i + 1;
  35. END LOOP;
  36.  
  37. RETURN;
  38. END
  39. $$ LANGUAGE 'plpgsql' IMMUTABLE;
  40.  
DE_MakeSquare.sql
  1. ---------------------------------------------------------------------------
  2. -- Code Desciption:
  3. ---------------------------------------------------------------------------
  4. -- PostgreSQL/PostGIS custom function for generating a square polygon of a specified size
  5. -- Dependencies: nil
  6. -- Developed by: mark[at]dimensionaledge[dot]com
  7. -- Licence: GNU GPL version 3.0
  8. ---------------------------------------------------------------------------
  9. -- Usage Example:
  10. ---------------------------------------------------------------------------
  11. -- SELECT DE_MakeSquare(ST_MakePoint(0,0),1);
  12. ---------------------------------------------------------------------------
  13.  
  14. CREATE OR REPLACE FUNCTION DE_MakeSquare(centerpoint GEOMETRY, side FLOAT8)
  15. RETURNS GEOMETRY
  16. AS $$
  17. SELECT ST_SetSRID(ST_MakePolygon(ST_MakeLine(
  18. ARRAY[
  19. st_makepoint(ST_X(centerpoint)-0.5*side, ST_Y(centerpoint)+0.5*side),
  20. st_makepoint(ST_X(centerpoint)+0.5*side, ST_Y(centerpoint)+0.5*side),
  21. st_makepoint(ST_X(centerpoint)+0.5*side, ST_Y(centerpoint)-0.5*side),
  22. st_makepoint(ST_X(centerpoint)-0.5*side, ST_Y(centerpoint)-0.5*side),
  23. st_makepoint(ST_X(centerpoint)-0.5*side, ST_Y(centerpoint)+0.5*side)
  24. ]
  25. )),ST_SRID(centerpoint));
  26. $$ LANGUAGE 'sql' IMMUTABLE STRICT;
  27.  

Bezier Curves – A More Flexible Alternative to Great Circles

This tutorial introduces a flexible method for generating flight paths in PostGIS using a custom bezier curve function. Unlike Great Circles, bezier curves provide the user the ability to set the amount of ‘flex’ or ‘bend’ exhibited by the curve, the number of vertices (useful for timeseries animations), and a break value at a predefined meridian to ensure the visual integrity of the curve is maintained when plotting in your favourite projection. I also have a 3D version of the bezier curve function which I’ll share at a later date.

The full code for this tutorial, including the QGIS technique for centering a world map in the Pacific Ocean, is available on github here.

flightpaths

I won’t repeat everything which is already documented in the github tutorial code and custom bezier curve function, except make mention of the five arguments used by the bezier curve function itself.

DE_BezierCurve2D(origin geometry, destination geometry, theta numeric, num_points integer, breakval numeric)

  1. origin geometry
  2. destination geometry
  3. theta, which is the maximum bend in the East-West plane
  4. the number of Linestring points or vertices
  5. the break value which splits the Linestring into a Multilinestring at a specified meridian

Example: DE_BezierCurve2D(o_geom, d_geom, 25, 1000, 30)

Origin and destination geometries should be self-explanatory.

Theta defines the maximum degree of ‘flex’ or ‘bend’ in the East-West plane, to which a SIN function is applied to produce the effect of straighter lines in the North-South plane. If you prefer that all curves exhibit the same degree of bend irrespective of the azimuth between their origins and destinations, then modify the bezier function by replacing SIN(ST_Azimuth(ST_Point(o_x, o_y), ST_Point(d_x, d_y)) with the value ‘1’. Or if you want to stipulate a minimum bend in the North-South plane, then enclose the associated formula within a PostgreSQL ‘greatest()’ function.

The number of Linestring points or vertices is a really handy feature for animating flight paths where you need to synchronise movements along flight paths using a clock. To do this, set the number of points equal to the flight duration between each origin destination pair. So a 60 minute flight gets 60 points or vertices. A 120 minute flight gets 120 points or vertices. Then dump the flight path vertices into a points table which you can then index to a flight timetable.

The last argument is the break value, which is the meridian at which the flight path will be broken into two Linestrings (thus becoming a Multilinestring). The break value should equal the PROJ4 ‘+lon_0’ value associated with the CRS which you intend to view the data in, plus or minus 180 degrees. An example is given in the github tutorial file.

At some point I’ll release my 3D version of this function which is very cool for making 3D flight path animations.

Enjoy!

From Days to Minutes – Geoprocessing of Alberta Land Use Data

When writing my first blog post “Eating the Elephant In Small Bites”, a question on the PostGIS mailing list caught my eye because the challenge posed was not dissimilar to something I once encountered when working with Australian land use data. This time the dataset related to land cover in Alberta, Canada, and the PostGIS query using ST_Interection() was reportedly taking many, many days to process one particular class of land cover. Fortunately the Alberta dataset is freely available, so I thought to test my PostGIS vector tiling and Map Reduce code pattern – and with great effect! This post describes how I addressed the Alberta land class problem, building upon the concepts and practices introduced in my first blog post to deliver query times of just minutes. As with the “Eating the Elephant in Small Bites” tutorial, our machine setup for this exercise is a CentOS6.5 Linux server instance running on AWS, with 16vCPUs and 30GB of memory.

The full code for this solution can be downloaded from github here

Data Understanding

Upon downloading and ingesting the Alberta land cover dataset into PostGIS, the problem becomes self-evident. The land class ‘lc_class 34’ comprises the least number of geometries, but with an average of 119 rings per polygon.

SELECT lc_class, SUM(ST_NumGeometries(wkb_geometry)) as num_geoms, SUM(ST_NPoints(wkb_geometry)) as num_points, round(SUM(ST_NRings(wkb_geometry))::numeric/SUM(ST_NumGeometries(wkb_geometry))::numeric,3) as rings_per_geom FROM lancover_polygons_2010 GROUP BY lc_class ORDER BY 1;

 lc_class | num_geoms | num_points | rings_per_geom 
----------+-----------+------------+----------------
       20 |     83793 |    1306686 |          1.029
       31 |      1262 |      33366 |          1.115
       32 |      5563 |     291666 |          1.560
       33 |     12231 |     198385 |          1.023
       34 |       366 |    2646625 |        119.046
       50 |    196681 |    4750590 |          1.165
      110 |    154816 |    3137196 |          1.078
      120 |     83069 |    2833293 |          1.282
      210 |    150550 |    6260788 |          1.522
      220 |    197666 |    4793592 |          1.150
      230 |    112034 |    2419651 |          1.045
(11 rows)

Visually, this is how ‘lc_class 34’ appears in QGIS with green fill.

lc_class34

lc_class34_zoom

Upon closer inspection, we see ‘lc_class 34’ has a trellis-like structure where each contiguous block of white space (i.e. the “negative space”) is a polygon inner ring. And therein lies the problem. Like with multipolygons, complex polygons with inner rings are stored as arrays, meaning queries are often slowed by the nature of array operations, plus indexes are less effective than if each array element is dumped and stored as its own row. But luckily for us, there are standard PostGIS functions we can use to address this.

Solution

The solution to the Alberta ‘lc_class 34’ problem I will describe sequentially, starting with how to prepare the data by dumping the multipolygons and polyon rings, followed by a description of the fn_worker SQL code pattern. Finally we parallelise the code across all available cores.

Data preparation involves a two step “dumping” process, starting with ST_Dump() of the multipolygons, followed by ST_DumpRings() of the polygons dumped by step 1. Note that we record the path number of each polygon ring that is dumped. Exterior rings by default have a path value = 0, whilst interior rings have a path value > 0 signifying the interior ring number.

  1. # Create PostGIS tables (with indexes) for the dumped polygons and their rings
  2. SQL=$(cat<<EOF      
  3. -------------------------        
  4. ------- SQL BLOCK -------        
  5. -------------------------        
  6. DROP TABLE IF EXISTS landcover_dumped_34;        
  7. CREATE TABLE landcover_dumped_34 (
  8. fid serial,
  9. ogc_fid integer,
  10. wkb_geometry geometry(Polygon, 3400),
  11. lc_class integer,
  12. mod_ty text,
  13. shape_length float8,
  14. shape_area float8
  15. );
  16.  
  17. INSERT INTO landcover_dumped_34
  18. SELECT
  19. NEXTVAL('landcover_dumped_34_fid_seq'),
  20. ogc_fid,
  21. (ST_Dump(wkb_geometry)).geom,
  22. lc_class,
  23. mod_ty,
  24. shape_length,
  25. shape_area
  26. FROM lancover_polygons_2010 WHERE lc_class = 34;
  27.  
  28. DROP TABLE IF EXISTS landcover_dumped_34_rings;      
  29. CREATE TABLE landcover_dumped_34_rings (
  30. fid serial,
  31. ogc_fid integer,
  32. path integer,
  33. wkb_geometry geometry(Polygon, 3400)
  34. );
  35.  
  36. INSERT INTO landcover_dumped_34_rings
  37. WITH s AS (SELECT ogc_fid, (ST_DumpRings(wkb_geometry)).path as path, ST_Buffer(ST_SnapToGrid((ST_DumpRings(wkb_geometry)).geom,0.1)) as the_geom FROM  landcover_dumped_34)  --SNAP and BUFFER TO MAKEVALID
  38. SELECT
  39. NEXTVAL('landcover_dumped_34_rings_fid_seq'),
  40. ogc_fid,
  41. path[1],
  42. the_geom
  43. FROM s;
  44.  
  45. CREATE INDEX landcover_dumped_34_rings_wkb_geometry_idx ON landcover_dumped_34_rings USING GIST(wkb_geometry);
  46. ANALYZE landcover_dumped_34_rings;
  47.  
  48. -------------------------        
  49. EOF
  50. )
  51. echo "$SQL"  # comment to suppress printing
  52. # execute SQL STATEMENT or comment # to skip             
  53. psql -U $username -d $dbname -c "$SQL"  ### alternatively, comment out line with single '#' to skip this step
# Create PostGIS tables (with indexes) for the dumped polygons and their rings
SQL=$(cat<<EOF 	 	
-------------------------	 	 
------- SQL BLOCK -------	 	 
-------------------------	 	 
DROP TABLE IF EXISTS landcover_dumped_34;	 	 
CREATE TABLE landcover_dumped_34 (
fid serial,
ogc_fid integer,
wkb_geometry geometry(Polygon, 3400),
lc_class integer,
mod_ty text,
shape_length float8,
shape_area float8
);

INSERT INTO landcover_dumped_34
SELECT
NEXTVAL('landcover_dumped_34_fid_seq'),
ogc_fid,
(ST_Dump(wkb_geometry)).geom,
lc_class,
mod_ty,
shape_length,
shape_area
FROM lancover_polygons_2010 WHERE lc_class = 34;

DROP TABLE IF EXISTS landcover_dumped_34_rings;	 	 
CREATE TABLE landcover_dumped_34_rings (
fid serial,
ogc_fid integer,
path integer,
wkb_geometry geometry(Polygon, 3400)
);

INSERT INTO landcover_dumped_34_rings
WITH s AS (SELECT ogc_fid, (ST_DumpRings(wkb_geometry)).path as path, ST_Buffer(ST_SnapToGrid((ST_DumpRings(wkb_geometry)).geom,0.1)) as the_geom FROM  landcover_dumped_34)  --SNAP and BUFFER TO MAKEVALID
SELECT
NEXTVAL('landcover_dumped_34_rings_fid_seq'),
ogc_fid,
path[1],
the_geom
FROM s;

CREATE INDEX landcover_dumped_34_rings_wkb_geometry_idx ON landcover_dumped_34_rings USING GIST(wkb_geometry);
ANALYZE landcover_dumped_34_rings;

-------------------------	 	 
EOF
)
echo "$SQL"  # comment to suppress printing
# execute SQL STATEMENT or comment # to skip		 	 
psql -U $username -d $dbname -c "$SQL"  ### alternatively, comment out line with single '#' to skip this step

Next we generate a grid layout to serve as the basis for our tiles. For this exercise, we will use a 2km x 2km tile size as we wish to keep our "chopped" up geometries fairly small to enable fast queries of the 'lc_class 34' tiled geometries at a later date.

  1. #######################################
  2. #########  GRID CREATION  ##########
  3. #######################################
  4. # bash function for creating a regular vector grid in PostGIS
  5. fn_generategrid() {
  6. # get sql custom functions
  7. wget https://raw.githubusercontent.com/dimensionaledge/cf_public/master/lattices/DE_RegularGrid.sql -O DE_RegularGrid.sql
  8. wget https://raw.githubusercontent.com/dimensionaledge/cf_public/master/shapes/DE_MakeSquare.sql -O DE_MakeSquare.sql
  9. # load sql custom functions
  10. for i in *.sql; do
  11. psql -U $username -d $dbname -f $i
  12. done
  13.  
  14. SQL=$(cat<<EOF      
  15. -------------------------        
  16. ------- SQL BLOCK -------        
  17. -------------------------
  18. DROP TABLE IF EXISTS regular_grid_2k;
  19. CREATE TABLE regular_grid_2k AS (
  20. WITH s AS (SELECT DE_RegularGrid(ST_Envelope(ST_Collect(wkb_geometry)),2000) as wkb_geometry FROM abmiw2wlcv_48tiles)
  21. SELECT row_number() over() as tid, wkb_geometry::geometry(Polygon, 3400) FROM s);
  22.  
  23. -------------------------        
  24. EOF
  25. )
  26. echo "$SQL"  # comment to suppress printing
  27. # execute SQL STATEMENT or comment # to skip       
  28. psql -U $username -d $dbname -c "$SQL"
  29. }
  30. # end of bash function
  31.  
  32. # call the bash function or comment # to skip
  33. fn_generategrid
  34.  
  35. #######################################
#######################################
#########  GRID CREATION  ##########
#######################################
# bash function for creating a regular vector grid in PostGIS
fn_generategrid() {
# get sql custom functions
wget https://raw.githubusercontent.com/dimensionaledge/cf_public/master/lattices/DE_RegularGrid.sql -O DE_RegularGrid.sql
wget https://raw.githubusercontent.com/dimensionaledge/cf_public/master/shapes/DE_MakeSquare.sql -O DE_MakeSquare.sql
# load sql custom functions
for i in *.sql; do
psql -U $username -d $dbname -f $i
done

SQL=$(cat<<EOF 	 	
-------------------------	 	 
------- SQL BLOCK -------	 	 
-------------------------
DROP TABLE IF EXISTS regular_grid_2k;
CREATE TABLE regular_grid_2k AS (
WITH s AS (SELECT DE_RegularGrid(ST_Envelope(ST_Collect(wkb_geometry)),2000) as wkb_geometry FROM abmiw2wlcv_48tiles)
SELECT row_number() over() as tid, wkb_geometry::geometry(Polygon, 3400) FROM s);

-------------------------	 	 
EOF
)
echo "$SQL"  # comment to suppress printing
# execute SQL STATEMENT or comment # to skip	 	
psql -U $username -d $dbname -c "$SQL"
}
# end of bash function

# call the bash function or comment # to skip
fn_generategrid 

#######################################

We also create a vector tile table to hold the outputs of our worker function. Note that the worker function does the tiling of the exterior rings separately to the interior rings. The unions of each are taken before applying ST_Difference to remove the "negative space" of the interior rings from each tile.

  1. #######################################
  2. #####  DEFINE WORKER FUNCTION  ######
  3. #######################################
  4. # define the worker function to be executed across all cores
  5. fn_worker (){
  6. source $dbsettings
  7. SQL=$(cat<<EOF
  8. -------------------------
  9. ----- SQL STATEMENT -----
  10. -------------------------
  11. INSERT INTO vector_tiles
  12. WITH
  13. f0 AS (
  14.         SELECT
  15.         tid,
  16.         wkb_geometry as the_geom
  17.         FROM regular_grid_2k
  18.         WHERE tid >= $1 AND tid < $2
  19.         ),
  20. f1_p0 AS (
  21.         SELECT
  22.         f0.tid,
  23.         CASE WHEN ST_Within(f0.the_geom,rt.wkb_geometry) THEN f0.the_geom
  24.         ELSE ST_Intersection(f0.the_geom,rt.wkb_geometry) END as the_geom
  25.         FROM f0, $3 as rt
  26.         WHERE ST_Intersects(f0.the_geom, rt.wkb_geometry) AND f0.the_geom && rt.wkb_geometry AND rt.path = 0
  27.         ),
  28. f1_p0u AS (
  29.         SELECT
  30.         tid,
  31.         ST_Union(the_geom) as the_geom
  32.         FROM f1_p0
  33.         GROUP BY tid
  34.         ),
  35. f1_p1 AS (
  36.         SELECT
  37.         f0.tid,
  38.         CASE WHEN ST_Within(f0.the_geom,rt.wkb_geometry) THEN f0.the_geom
  39.         ELSE ST_Intersection(f0.the_geom,rt.wkb_geometry)
  40.         END as the_geom
  41.         FROM f0, $3 as rt
  42.         WHERE ST_Intersects(f0.the_geom, rt.wkb_geometry) AND f0.the_geom && rt.wkb_geometry AND rt.path > 0
  43.         ),
  44. f1_p1u AS (
  45.         SELECT
  46.         tid,
  47.         ST_Union(the_geom) as the_geom
  48.         FROM f1_p1
  49.         GROUP BY tid
  50.         ),
  51. f2 AS (
  52.         SELECT
  53.         f1_p0u.tid,
  54.         CASE WHEN  f1_p1u.tid IS NULL THEN f1_p0u.the_geom
  55.         WHEN ST_IsEmpty(ST_Difference(f1_p0u.the_geom,f1_p1u.the_geom)) THEN NULL
  56.         ELSE ST_Difference(f1_p0u.the_geom,f1_p1u.the_geom)
  57.         END as the_geom
  58.         FROM f1_p0u LEFT JOIN  f1_p1u
  59.         ON f1_p0u.tid = f1_p1u.tid
  60.         )
  61. ------------------------
  62. ---- result ----
  63. ------------------------
  64. SELECT
  65. NEXTVAL('vector_tiles_fid_seq'),
  66. tid,
  67. ST_Multi(the_geom),
  68. 1
  69. FROM f2
  70. WHERE the_geom IS NOT NULL;
  71. -------------------------
  72. EOF
  73. )
  74. echo "$SQL"  # comment to suppress printing
  75. # execute SQL STATEMENT
  76. psql -U $username -d $dbname -c "$SQL"
  77. }
  78. # end of worker function
  79.  
  80. # make worker function visible to GNU Parallel across all cores
  81. export -f fn_worker
  82.  
  83. #######################################
#######################################
#####  DEFINE WORKER FUNCTION  ######
#######################################
# define the worker function to be executed across all cores
fn_worker (){
source $dbsettings
SQL=$(cat<<EOF
-------------------------
----- SQL STATEMENT -----
-------------------------
INSERT INTO vector_tiles
WITH
f0 AS (
		SELECT
		tid,
		wkb_geometry as the_geom
		FROM regular_grid_2k
		WHERE tid >= $1 AND tid < $2
		),
f1_p0 AS (
		SELECT
		f0.tid,
		CASE WHEN ST_Within(f0.the_geom,rt.wkb_geometry) THEN f0.the_geom
		ELSE ST_Intersection(f0.the_geom,rt.wkb_geometry) END as the_geom
		FROM f0, $3 as rt
		WHERE ST_Intersects(f0.the_geom, rt.wkb_geometry) AND f0.the_geom && rt.wkb_geometry AND rt.path = 0
		),
f1_p0u AS (
		SELECT
		tid,
		ST_Union(the_geom) as the_geom
		FROM f1_p0
		GROUP BY tid
		),
f1_p1 AS (
		SELECT
		f0.tid,
		CASE WHEN ST_Within(f0.the_geom,rt.wkb_geometry) THEN f0.the_geom
		ELSE ST_Intersection(f0.the_geom,rt.wkb_geometry)
		END as the_geom
		FROM f0, $3 as rt
		WHERE ST_Intersects(f0.the_geom, rt.wkb_geometry) AND f0.the_geom && rt.wkb_geometry AND rt.path > 0
		),
f1_p1u AS (
		SELECT
		tid,
		ST_Union(the_geom) as the_geom
		FROM f1_p1
		GROUP BY tid
		),
f2 AS (
		SELECT
		f1_p0u.tid,
		CASE WHEN  f1_p1u.tid IS NULL THEN f1_p0u.the_geom
		WHEN ST_IsEmpty(ST_Difference(f1_p0u.the_geom,f1_p1u.the_geom)) THEN NULL
		ELSE ST_Difference(f1_p0u.the_geom,f1_p1u.the_geom)
		END as the_geom
		FROM f1_p0u LEFT JOIN  f1_p1u
		ON f1_p0u.tid = f1_p1u.tid
		)
------------------------
---- result ----
------------------------
SELECT
NEXTVAL('vector_tiles_fid_seq'),
tid,
ST_Multi(the_geom),
1
FROM f2
WHERE the_geom IS NOT NULL;
-------------------------
EOF
)
echo "$SQL"  # comment to suppress printing
# execute SQL STATEMENT
psql -U $username -d $dbname -c "$SQL"
}
# end of worker function

# make worker function visible to GNU Parallel across all cores
export -f fn_worker

#######################################

Next we create a job list which we proceed to execute in parallel. And voilà!

  1. #######################################
  2. ##########  CREATE JOB LIST  ###########
  3. #######################################
  4. # create job list to feed GNU Parallel.
  5. SQL=$(cat<<EOF      
  6. -------------------------        
  7. ------- SQL BLOCK -------        
  8. -------------------------
  9. -- create joblist where block size = 1000 tiles (i.e. tiles processed in batches of 1000)
  10. COPY (SELECT i as lower, i+1000 as upper FROM generate_series(1,250000,1000) i) TO STDOUT WITH CSV;      
  11. -------------------------
  12. EOF
  13. )        
  14. echo "$SQL"  # comment to suppress printing
  15. # execute SQL STATEMENT    
  16. psql -U $username -d $dbname -c "$SQL" > joblist.csv
  17.  
  18. #######################################
  19.  
  20. #######################################
  21. ##########  EXECUTE JOBS  ###########
  22. #######################################
  23. cat joblist.csv | parallel --colsep ',' fn_worker {1} {2} landcover_dumped_34_rings
  24. wait
  25. #######################################
#######################################
##########  CREATE JOB LIST  ###########
#######################################
# create job list to feed GNU Parallel.
SQL=$(cat<<EOF 	 	
-------------------------	 	 
------- SQL BLOCK -------	 	 
-------------------------
-- create joblist where block size = 1000 tiles (i.e. tiles processed in batches of 1000)
COPY (SELECT i as lower, i+1000 as upper FROM generate_series(1,250000,1000) i) TO STDOUT WITH CSV; 	 
-------------------------
EOF
)	 	 
echo "$SQL"  # comment to suppress printing
# execute SQL STATEMENT 	
psql -U $username -d $dbname -c "$SQL" > joblist.csv

#######################################

#######################################
##########  EXECUTE JOBS  ###########
#######################################
cat joblist.csv | parallel --colsep ',' fn_worker {1} {2} landcover_dumped_34_rings
wait
#######################################

Results of Query Times

The results show the impact of a larger tile size on query run times, and the resultant complexity of the tiled geometries produced (measured by the rings_per_geom). Whilst quadrupling the area of the tile size (from 2x2 to 4x4) reduces the run time from 779 seconds to 282 seconds, the rings_per_geom increases - which is understandable given the trellis-like structure of the source data. Both scenarios offer a vast improvement on the many, many days it reportedly took to process the data using a standard ST_Intersection query. As to the acceptability of the tile-size trade-off, I think it really depends on the criticality of query times with respect to the downstream processes that will ultimately consume the tiled geometries. It thus becomes a matter of judgement.

2k x 2k tiles
TOTAL SCRIPT TIME: 779 (batch size =1000 tiles)
 num_geoms | num_points | rings_per_geom   
-----------+------------+----------------
    144799 |    3471051 |          1.055

4k x 4k tiles
TOTAL SCRIPT TIME: 282 (batch size =250 tiles)
 num_geoms | num_points | rings_per_geom   
-----------+------------+----------------
     57775 |    2966397 |          1.263

lc_class34_tiled

Conclusions

The Alberta land cover problem highlights a few of the challenges that analysts typically encounter when working with large, complex datasets. The solution presented here is a practical example of how vector tiling and Map Reduce concepts can deliver real geo-processing efficiency improvements. However the real business value of reducing query times - from days to minutes - is to accelerate the 'business time to insight' and to amplify the speed or volume of iterations the analyst can consider as part of the geospatial 'value discovery' process. As this exercise attests, the business impact that Spatial IT can make over traditional desktop GIS approaches in the context of ever-shortening decision windows is nothing short of 'game changing'. Welcome to the emerging world of Big Data.