The answers to spatial problems tend to converge towards certain techniques that everyone uses. Here are a few.
If you’ve used PostGIS, it’s likely you’ve run into the all-important index problem. Adding a spatial index to a table means the difference between multi- and sub-section queries on a table. In software, the same principle goes: indexes are typically required for performance if you’re asking anything about spatial intersections.
Typically spatial indexes are R-trees: the data structure provides a nice balance of space and performance, and is fairly simple to implement.
R-Trees contain rectangles and points, not polygons or lines. What if you need a more specific answer? This leads us to an important technique.
A useful and simple technique for asking questions like ‘is this point in any of these 2 million polygons?’ is that you can break it down into steps based on assumptions:
This takes advantage of the fact that the first question can be extremely fast to answer, thanks to R-Tree, whereas the second question, whether a point is in a specific polygon, usually cannot be fast. The canonical algorithm for point-in-polygon intersections looks at every point in the polygon, even in highly optimized implementations like GEOS.
There are many ways to calculate the area of a polygon on the earth. The most complex and most accurate is rarely used - taking into account the minute surface variations and elevation differences of the geoid.
A much more common approach is to calculate the area of a ring using some simple trigonometry popularized by Robert Chamberlain at NASA. There are many implementations of that approach, like geojson-area and geojson-utils.
One neat side-effect of the algorithm to calculate polygon area is that it also answers the question of the polygon’s winding number: a clockwise loop will yield area greater than or equal to zero, while a counterclockwise loop has negative area.
What about finding the closest points to a certain point? The old-fashioned
answer from my MySQL days would be something like a bounding box query
<: you can do better if you’re writing an application.
Many questions distill into these: for instance, which shapes interact with a bounding box, a common question in tiled maps, is some combination of indexing and point in polygon checking.
An interesting new approach that we’ve been exploring at Mapbox and that has been implemented in Google’s S2 Geometry Library and Morgan Herlocker’s tile-cover library, is to use cell indexes rather than trees to answer these questions.
It’s a promising approach because it flattens the index structure, making it possible to distribute the index across a network and express spatial queries as range queries. Since the derivation of cells from a shape is very similar to rasterization, we can steal tricks from graphics programming and use ray-tracing techniques to make it super-fast.