Conceptual Data Models
Tilesets: Introduction and Background
Many of the existing OGC standards, such as WMTS, have assumed or defined two-dimensional tilesets. It was agreed at the OGC Tech Conference in Boulder, June 2015, that the Web Coverage Tile Service SWG should define a conceptual model of an n-dimensional tileset, where n=2,3,4,5,... The case where n=3 (x,y,z) is straightforward to envisage, and n=4 (x,y,z,t) is not too difficult either. If one is considering multi-wavelength remote sensing or imaging, n=5 (x,y,z,t,λ) becomes reasonable. Meteorologists may routinely forecast not specific values but a distribution of likely values, a Probability Distribution Function, so we could then envisage n=6 (x,y,z,t,λ,π). Of course, a more recent weather forecast is probably more accurate than an earlier one, so now we could have n=7 (x,y,z,t1,λ,π,t2), with t2 being the time the forecast was issued. And we still have not considered all the possible variables or parameters that are of interest: n=8 (p,x,y,z,t1,λ,π,t2), with p being the parameter!
Of course, a time series, or a vertical sounding in the atmosphere or the ocean, could be considered a 1-D tileset.
A tileset is not essential to the basic geospatial concepts of features and coverages, but it is a very practical way of dividing what could be an expensive response to a request into manageable chunks: tiles.
If the content of these tiles is not volatile, they can be cached and re-used by other requests. This suggests that each reasonably persistent tile should have an equally persistent identifier, such as a URI, to allow such retrieval.
Discussion Points
- We probably need to distinguish between dimensions that are, in some sense, continuous (x,y,z,t), where interpolation to intermediate values is reasonable, and 'dimensions' that really are discrete layers, such as successive parameters (wind speed, wind direction, temperature, humidity, ...) that may occur in a NetCDF file, where interpolation is meaningless or at least unreasonable.
- We probably need a discussion here of range and domain sets in WCS, of the problem space of the various ancillary services associated with WCS, such as WPS, WCPS, etc., and of interfaces to other standards and services, e.g. a likely processing chain to support a WCTileService.
- We need some UML diagrams.
- We need to agree a modern, appropriate terminology ('pixel' for a value in an (x,y) tile may be misleading).
Existing Concepts
Here is a summary of some relevant tileset/datacube/data grid concepts.
TestBed 11 Engineering Report: Referenceable Grid Harmonization, OGC15-065
This recent report by Peter Baumann and Eric Hirschorn has some useful terminology rationalization. I think Peter will probably present a version of this to the OGC TC in Nottingham in 2015-09, and then take it on to ISO as part of the harmonisation of ISO 19123, GMLCov, etc.
In particular, they clarify: Grid->Rectified Grid->Irregular Referenceable Grid->Distorted Referenceable Grid->Other grids
I propose that we initially limit ourselves to Rectified Grids and Irregular Referenceable Grids
WMTS Web Map Tile Service
2D to be done
WMTS Simple Profile
2D but only on certain map projections, to be done
GeoPackage
2D map images only?? to be done
DGGS Digital Global Grid System
?? to be done
Meteorology and Oceanography Modelling Grids
Meteorological and oceanographic forecast models usually assume a rectilinear quasi-horizontal grid covering either the complete earth or a 'rectangular' domain of interest. The earth is usually assumed to be a sphere, or occasionally an oblate spheroid (i.e. an N-S cross-section including the earth's axis is an ellipse). Remote sensing information, such as from satellites, is usually re-projected from a highly detailed, geoid-based location and time to the model grids to allow consistent data usage.
The grid is usually regularly spaced in some map projection. These projections are often conformal (equal angle), such as Mercator or North Polar Stereographic, rather than equal-area or other projections, as navigation is a primary use case.
In the vertical, there is usually an irregularly spaced grid, typically with extra resolution in the boundary layer (the bottom kilometre or so of the atmosphere) or near the tropopause, where there are particularly strong winds. In the oceans, the grids have higher resolution just below the surface, though some continental shelf models also have increased vertical resolution near the bottom.
The grids are usually regular over time, say every few minutes or every hour, though the data may only be stored on a more irregular pattern, such as hourly for the first 24 hours of the forecast (24 steps), then every 3 hours for two more days to day three (16 steps), then every 12 hours until the ten-day forecast is reached (14 steps).
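The irregular storage pattern described above can be written compactly as a list of (number of steps, step length) segments. A minimal Python sketch, assuming lead times in hours counted from the analysis time and no stored output at T+0 (the text does not say whether the analysis itself is kept); `SCHEDULE` and `forecast_steps` are hypothetical names.

```python
# Hypothetical encoding of the output schedule: (number of steps, hours per step).
SCHEDULE = [(24, 1), (16, 3), (14, 12)]

def forecast_steps(schedule):
    """Return the forecast lead times (in hours) at which data are stored."""
    steps = []
    lead = 0
    for count, step_hours in schedule:
        for _ in range(count):
            lead += step_hours
            steps.append(lead)
    return steps

steps = forecast_steps(SCHEDULE)
print(len(steps), steps[23], steps[39], steps[-1])  # 54 24 72 240
```

The three segments end at T+24 h, T+72 h (day three) and T+240 h (day ten), which is a quick consistency check on the counts quoted in the text.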
These grids define 'boxes' or cubes, and the data is usually a value representative of that volume and is considered to be at the centre of the 'box'. I.e. the temperature is an 'average temperature' over the volume. Some parameters may be accumulations over time, or integrations over the full depth of the vertical grid.
Any such grid also defines a dual, alternative grid consisting of the centres of the boxes, or of the planes/lines/points of contact between the boxes. In practice, a grid may combine such views: e.g. pressure and temperature are at the centre of a 'box', whereas wind components are on the 'edges' of the 'boxes'. Fortunately, values are nearly always interpolated to consistent 'central' positions for external consumption. There is a taxonomy of such patterns of parameter 'grid reference positions' created by the Japanese scientist Arakawa, and the choice between them is determined by the efficient numerical solution of the model equations. Generally, reference points like 'top left' or 'bottom right' are not used.
So, for meteorology and oceanography, a tileset is a rectangular, multi-dimensional array of values, where the 3-D or 4-D location can be determined by counting, using some scanning pattern in x, y, z and perhaps t, and solving a relatively simple functional equation of the map projection. Some tilesets are 1-D, such as 'soundings' (vertical trajectories/ascents) or time series. The traditional meteorological data formats do this. Also, values are often stored as application-specific, scaled, positive short integers to save space and bandwidth. For example, surface temperature is only measured to the nearest 0.1°C, so more precision is spurious. Locations do not need to be highly accurate either: quantised to the nearest 10 m, 100 m or even 1 km is usually good enough.
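The scaled-integer storage mentioned above amounts to a linear quantisation with a scale and an offset. A minimal Python sketch, assuming a resolution of 0.1°C, an offset of -100°C and unsigned 16-bit little-endian storage; these parameters and the function names are illustrative assumptions, not taken from any particular meteorological format.

```python
import struct

# Assumed packing parameters (illustrative only).
SCALE = 0.1      # degC per integer count
OFFSET = -100.0  # degC represented by count 0

def pack_temperatures(values):
    """Quantise temperatures to unsigned 16-bit counts and serialise them."""
    counts = [round((v - OFFSET) / SCALE) for v in values]
    if any(c < 0 or c > 0xFFFF for c in counts):
        raise ValueError("value outside the packable range")
    return struct.pack(f"<{len(counts)}H", *counts)

def unpack_temperatures(data):
    """Recover temperatures, to the nearest 0.1 degC, from packed bytes."""
    counts = struct.unpack(f"<{len(data) // 2}H", data)
    return [OFFSET + c * SCALE for c in counts]

packed = pack_temperatures([12.34, -3.96])
print(len(packed))  # 4 bytes for two values
```

Each value costs two bytes instead of eight for a double, and the round trip returns about 12.3 and -4.0: the lost digits are exactly the spurious precision the text refers to.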
Some Meteorological and Oceanographic Complications (been there, done that!)
Meteorologists often rotate the poles and the equator to more convenient locations such as the north Pacific or Paris, and also 'stretch' the lat/long spacing.
Having values at half of a grid length north of the north pole, or south of the south pole, is not unusual.
We know how to handle the many repeated copies of a single vector value at the pole, as well as scalars and tensors. E.g., consider a one-degree lat-long grid of points from -90 to +90 and 0 to 359, size 181x360 (or 180x360 boxes). In the +90 row, repeat the scalar value 360 times for consistent behaviour and processing. A vector value like wind or current velocity can also be handled, but needs some mathematical knowledge, because the same vector has different local east/north components at each longitude.
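For the vector case at the pole, one wind vector has to be expressed as a different pair of local (u, v) components in each column of the +90 row. A minimal Python sketch, assuming the pole wind is given in a pole-centred Cartesian frame with x pointing towards longitude 0 and y towards 90°E; that frame and the function name are assumptions for illustration.

```python
import math

def pole_wind_components(vx, vy, lon_deg):
    """Local (u, v) components of one North Pole wind vector at a given longitude.

    (vx, vy) is the wind in an assumed pole-centred Cartesian frame:
    x towards longitude 0, y towards longitude 90E.
    """
    lam = math.radians(lon_deg)
    # Local eastward unit vector near the pole at longitude lam: (-sin, cos).
    u = -vx * math.sin(lam) + vy * math.cos(lam)
    # Local northward unit vector points towards the pole: (-cos, -sin).
    v = -vx * math.cos(lam) - vy * math.sin(lam)
    return u, v

# The +90 row of a one-degree grid: one vector, 360 different (u, v) pairs.
row = [pole_wind_components(10.0, 0.0, lon) for lon in range(360)]
```

A wind blowing from the pole towards longitude 0 appears as a pure southward wind (v = -10) in the 0° column, a pure westward wind (u = -10) in the 90°E column and a northward wind in the 180° column, while the speed is the same everywhere.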
Oceanographers often assume that there are three poles, to get convenient projections.
Some forecast models use spherical harmonic functions to represent values, giving rise to grids in physical space with slightly irregular latitude spacing.
Some operational models have grids based on an icosahedral partitioning of the earth's surface. There have been many other experiments, such as spiral grids with Fibonacci-number-based spacing.
Let us just ignore all these complications.
Some other sources of concepts
TimeseriesML and its predecessor, WaterML 2.0 Part 1, generalise the 'interpolation type' of a value.
The CF-NetCDF convention for unambiguously annotating NetCDF datasets of environmental data calls the 'interpolation type' of a parameter of interest the 'cell method' (the cell_methods attribute).
Bare Bones Conceptual Model
This is a brain dump of my thoughts and questions after the 2015-06-23 telco. I think that any conceptual model probably needs:
Rectilinear grids of boxes
Let's ignore TINs and non-rectilinear.
Whole earth edge cases:
- one grid box for whole earth (WMTS/Google Level 0 tileset?)
- very small grid boxes, each containing one point (WMTS/Google Level 18/17/16/.. tileset?)
Limited area edge cases:
- one grid box for the whole area. Does it cross any Poles? Meridians?
- very small grid boxes, each containing one point (WMTS/Google Level 18/17/16/.. tileset?)
Grid points/data values/pixels within each box.
Assume each is 'regular' with respect to the enclosing box, so that they all align; let's not handle staggered points, or points that are not fully aligned and in phase.
Shape of the earth
Need to support spherical earths, oblate spheroids and more complex geoids.
Should we have a default? Yes: it makes 'schema free' JSON and CSV easier, but do not forbid explicit declarations of a geoid. WGS84?
Separable dimensions
Dimensions are 'separable', in that each dimension can always be treated consistently and independently of the other dimensions. The n-dimensional tileset/matrix/grid is the Cartesian product of the individual grids in each dimension.
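Because the dimensions are separable, each axis can be described on its own and the n-D grid obtained by combining them, with a flat position found simply by counting. A minimal Python sketch, assuming regular axes and a scanning pattern in which the last axis varies fastest (the text does not fix a scanning order); `RegularAxis`, `grid_points` and `flat_to_nd` are hypothetical names.

```python
from dataclasses import dataclass
from itertools import product
from typing import Sequence

@dataclass(frozen=True)
class RegularAxis:
    """One separable dimension: coordinate = origin + i * step, 0 <= i < count."""
    name: str
    origin: float
    step: float
    count: int

    def coord(self, i: int) -> float:
        if not 0 <= i < self.count:
            raise IndexError(f"{self.name} index {i} out of range")
        return self.origin + i * self.step

def grid_points(axes: Sequence[RegularAxis]):
    """Yield every n-D grid point as the Cartesian product of the axes."""
    for idx in product(*(range(a.count) for a in axes)):
        yield tuple(a.coord(i) for a, i in zip(axes, idx))

def flat_to_nd(flat: int, axes: Sequence[RegularAxis]) -> tuple:
    """Convert a flat (scan-order) index to per-axis indices, last axis fastest."""
    idx = []
    for a in reversed(axes):
        flat, i = divmod(flat, a.count)
        idx.append(i)
    if flat:
        raise IndexError("flat index out of range")
    return tuple(reversed(idx))

axes = [RegularAxis("t", 0.0, 1.0, 2),
        RegularAxis("z", 0.0, 500.0, 3),
        RegularAxis("x", 0.0, 10.0, 4)]
print(flat_to_nd(23, axes))  # (1, 2, 3)
```

The same per-axis description works whatever n is, which is the practical payoff of separability; an irregular axis could replace `RegularAxis` with a look-up table without touching the other dimensions.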
Time:
Data at regular intervals are easy. Specify Origin/Epoch in ISO8601, interval duration with UoM (seconds or hours or millions of years etc) and count.
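A regular time axis needs only an origin, an interval and a count, and the reverse lookup is a single division. A minimal Python sketch, assuming ISO 8601 strings with an explicit UTC offset and intervals expressible as a `datetime.timedelta`; the function names are hypothetical.

```python
from datetime import datetime, timedelta

def regular_times(origin_iso, interval, count):
    """Return origin + k * interval for k = 0 .. count-1."""
    origin = datetime.fromisoformat(origin_iso)
    return [origin + k * interval for k in range(count)]

def time_index(t_iso, origin_iso, interval, count):
    """Return k such that t == origin + k * interval, or raise ValueError."""
    delta = datetime.fromisoformat(t_iso) - datetime.fromisoformat(origin_iso)
    k, remainder = divmod(delta, interval)
    if remainder or not 0 <= k < count:
        raise ValueError(f"{t_iso} is not on the time axis")
    return k

times = regular_times("2015-07-16T00:00:00+00:00", timedelta(hours=3), 8)
print(times[-1].isoformat())  # 2015-07-16T21:00:00+00:00
```

`timedelta` cannot express calendar months or millions of years, which is one concrete reason the question of a temporal CRS versus calendar-oriented notation matters.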
Irregular intervals require a look-up table or a sequence of specified times. Do we specify a temporal CRS, e.g. in seconds, hours or millions of years? Or do we just use ISO 8601 calendar-oriented notation?
Do we assume times are centred, so that time T(n) is representative of the box from (T(n-1)+T(n))/2 to (T(n)+T(n+1))/2? Or do we assume T(n) is representative of T(n) to T(n+1), or of T(n-1) to T(n)? TimeseriesML calls these options the "InterpolationType", AKA Pixel Ref Point. As a rule of thumb, imagery (aerial, satellite) uses the centred 'pixel-is-area', and measurement data like elevation use 'pixel-is-point', usually 'top left', or in this case T(n) to T(n+1).
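The three options above differ only in which neighbours bound the interval that a coordinate represents. A minimal Python sketch, assuming numeric coordinates (e.g. hours) and the hypothetical convention names "centred", "preceding" and "succeeding"; at the ends of the axis the missing bound is returned as `None`, since the text does not say how edges should be treated.

```python
def cell_bounds(coords, n, convention):
    """Interval represented by coords[n] under one of three conventions.

    'centred'    : (T(n-1)+T(n))/2 .. (T(n)+T(n+1))/2
    'preceding'  : T(n-1) .. T(n)
    'succeeding' : T(n) .. T(n+1)
    A bound that would need a neighbour beyond the axis end is None.
    """
    prev = coords[n - 1] if n > 0 else None
    nxt = coords[n + 1] if n + 1 < len(coords) else None
    t = coords[n]
    if convention == "centred":
        lo = None if prev is None else (prev + t) / 2
        hi = None if nxt is None else (t + nxt) / 2
        return lo, hi
    if convention == "preceding":
        return prev, t
    if convention == "succeeding":
        return t, nxt
    raise ValueError(f"unknown convention: {convention}")

hours = [0, 1, 3, 6]
print(cell_bounds(hours, 1, "centred"))  # (0.5, 2.0)
```

On an irregular axis the centred interval is not symmetric about T(n), which is one reason an explicit interpolation type matters. The same function applies unchanged to vertical levels L(n).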
The more usual interpretation of a time series value is that it is representative of T(n-1) to T(n). The TimeseriesML proposed standard specifies an Interpolation Type, and there is no default.
Then we partition gridpoints/data values/pixels into equal sized groups. This is the 'tileset' for delivery. Edge cases: only one group for the whole time series, or one pixel/value per group.
TimeseriesML recognises that a value at a specific time may be one of 13 types, denoted by the mandatory Interpolation Type
http://www.opengis.net/def/waterml/2.0/interpolationType/ :
- Average in Preceding Interval
- Average in Succeeding Interval
- Constant in Preceding Interval
- Constant in Succeeding Interval
- Continuous / Instantaneous
- Discontinuous
- Instantaneous Total
- Maximum in Preceding Interval
- Maximum in Succeeding Interval
- Minimum in Preceding Interval
- Minimum in Succeeding Interval
- Preceding Total
- Succeeding Total
Minimum 'interpolation types' for n-D tilesets?
Time: the most sensible defaults seem to be:
- data value as an area ('pixel as area') represents the interval T(n-1) to T(n)
- data value as a point ('pixel as point') represents the instant T(n)
Vertical
Regular intervals are easy. Specify the origin/vertical datum, the vertical interval with UoM (metres, hPa, flight levels etc.) and the count.
Irregular intervals require a look-up table, a sequence of specified levels, Key Value Pairs, or similar. Do we specify a vertical CRS with datum, e.g. in metres, hPa or flight levels?
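With an irregular vertical axis held as a look-up table, finding the layer that contains a given level is a binary search. A minimal Python sketch, assuming the table lists layer boundaries in hPa from the surface upwards; the pressure values in `LEVELS_HPA` are illustrative, and the names are hypothetical.

```python
from bisect import bisect_right

# Illustrative look-up table of layer boundaries in hPa, surface upwards
# (pressure decreases with height).
LEVELS_HPA = [1000, 925, 850, 700, 500, 300, 250, 200, 100]

def layer_index(p_hpa, levels=LEVELS_HPA):
    """Index n such that levels[n] >= p > levels[n+1]."""
    # bisect needs ascending keys, so search on negated pressures.
    keys = [-p for p in levels]
    n = bisect_right(keys, -p_hpa) - 1
    if n < 0 or n >= len(levels) - 1:
        raise ValueError(f"{p_hpa} hPa is outside the table")
    return n

print(layer_index(600))  # 3, i.e. the 700 to 500 hPa layer
```

Storing boundaries rather than mid-points sidesteps the centred versus pixel-is-point question for this axis, at the cost of one extra entry in the table.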
Do we assume levels are centred, so that level L(n) is representative of the box from (L(n-1)+L(n))/2 to (L(n)+L(n+1))/2? Or do we assume L(n) is representative of L(n) to L(n+1), or of L(n-1) to L(n)? What about L(0), the surface? This favours the pixel-is-point L(n) to L(n+1). Meteorology recognises both options.
Z, vertically up in the atmosphere: the most sensible defaults seem to be:
- data value as an area ('pixel is an area') represents the layer Z(n-1) to Z(n)
- data value as a point represents the level Z(n)
Z, vertically down in the ocean (or earth?): the most sensible defaults seem to be:
- data value as an area ('pixel is an area') represents the layer Z(n-1) to Z(n)
- data value as a point represents the level Z(n)
Then we partition gridpoints/data values/pixels into equal sized groups. This is the 'tileset' for delivery. Edge cases: only one group for the whole set, or one pixel per group.
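The partition into groups can be sketched per dimension. A minimal Python sketch, assuming consecutive groups along one axis and allowing the last group to be short when the count is not a multiple of the group size (the text says 'equal sized', so this remainder handling is an assumption); `partition` is a hypothetical name.

```python
def partition(count, group_size):
    """Split indices 0..count-1 into consecutive groups of group_size.

    The last group is shorter when count is not a multiple of group_size.
    """
    if group_size < 1:
        raise ValueError("group_size must be >= 1")
    return [range(start, min(start + group_size, count))
            for start in range(0, count, group_size)]

print([list(g) for g in partition(10, 3)])  # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
```

Both edge cases in the text fall out directly: `group_size == count` gives one group for the whole set, and `group_size == 1` gives one value per group. For separable dimensions, an n-D tile is then the Cartesian product of one group from each axis.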
Other dimensions
like wavelength? No different?
x and y
No different? Apart from the combinatorial explosion of possible options: pixel-is-point has 9 in 2D (4 corners, 4 mid-points of sides, or the cell/box centre) and pixel-is-area has 1 in 2D (the cell/box centre).
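The 9 pixel-is-point options come from choosing, independently on each axis, the start edge, the middle or the end edge of the cell, so n dimensions give 3**n options. A minimal Python sketch, assuming the hypothetical labels "start", "mid" and "end" instead of "top"/"left", because axis direction varies between datasets.

```python
from itertools import product

# Fractional position within a cell along one axis.
OFFSETS = {"start": 0.0, "mid": 0.5, "end": 1.0}

def reference_points(ndim=2):
    """Map each candidate reference position to its fractional offsets."""
    return {names: tuple(OFFSETS[n] for n in names)
            for names in product(OFFSETS, repeat=ndim)}

def point_position(i, j, x0, y0, dx, dy, ref=("mid", "mid")):
    """Coordinates of the reference point of cell (i, j) in a regular 2D grid."""
    fx, fy = OFFSETS[ref[0]], OFFSETS[ref[1]]
    return x0 + (i + fx) * dx, y0 + (j + fy) * dy

pts = reference_points()
print(len(pts), len(reference_points(3)))  # 9 27
```

With a negative `dy` (rows running downwards, as in most imagery), `("start", "start")` is the familiar 'top left' corner, while `("mid", "mid")` is the single pixel-is-area centre.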
-- Main.clittle - 16 Jul 2015