In this chapter, you will learn the foundations of geographical information systems and spatial data. Although you do not need to understand these subjects in great depth to take advantage of the features of GeoServer, we will give you the basic information required to understand what you will be doing in the book. You will be introduced to the magic of spatial.
We are going to cover the following topics:
Why is spatial data special?
Spatial data formats.
The magical world of Spatial Reference System (SRS): getting a sphere on a plane.
What is a map and why does it matter?
The art of Cartography. Building map types such as Choropleth and Proportional Symbol.
By the end of the chapter, you will have the basic skills to identify which spatial data format best suits your needs.
Since you were a kid at school you have been exposed to a lot of maps. Maps of countries, where you spent hours memorizing the boundaries, rivers, and capitals; historical maps, with the rise and fall of ancient empires, where you dreamed of being a great conqueror; economic maps, with the locations and amounts of goods and services. Every day in newspapers, on TV, or in a far more accurate presentation, in books and academic papers, you look at data represented on a map. Maps are a spatial representation of data and are often the main output of a GIS.
GIS is an acronym for Geographical Information System. Does it sound too complicated to you? Don't be afraid; it is not so different from many other systems for managing information you probably already know. The main difference is in the spatial piece of information. All the data contained in a GIS has a spatial dimension or a link to another object with spatial attributes.
So what is GIS? In a nutshell, we can define it as a system to acquire and store data, to process data, and to produce data representations, that is, maps. In this book you will learn that working with GeoServer requires you to prepare your data, process it to render a beautiful map, and build up a set of functions that enable a user to interact with your data. So building up a GeoServer instance may be described as GIS-building.
A detailed comprehension of GIS is far beyond the scope of this book and it is not required for starting with GeoServer. But you need to have some basic skills in spatial data, maps, and spatial reference systems.
Let's go; we are going to turn you into a neo-cartographer!
Spatial data is the foundation of any GIS. You know that a building is likely to fall down unless it is sitting atop a strong foundation. So you need to understand spatial data or you will be producing poor map output.
But what is spatial data in simple words? From a general point of view, you can consider as a piece of spatial information each description of an object that contains a reference to its position on the earth's surface. Well, that is not a rigorous formal definition, as there are a lot of objects below and above the earth's surface, but for now we are fine with this simplistic definition.
Think of some lists of familiar objects:
A list of bookshops with addresses
A list of places you visited during your trips
A list of points of interest, for example, restaurants, museums, and hotels, you collected with your mobile phone
An aerial photo with a view of a city, where you can recognize notable places
You can say where each element is located in a more or less precise way. They are real objects represented with spatial data. As you may have noted, the spatial information is represented in quite a heterogeneous way. Most people are able to recognize spatial information in any group from the previous list. Unfortunately, GIS software and GeoServer are an exception to this and tend to prefer strongly structured information. If you are going to use your spatial data with GeoServer, you need to organize it more accurately. We will talk specifically about GeoServer's data connectors in Chapter 5, Adding Your Own DataStore, but for now it is important that you understand how spatial data is commonly organized and stored. As you keep on making maps, you will deal with lots of different spatial data.
So spatial data is a reference to an object's position on the earth's surface. How can you measure and store it in a numeric format? An elementary model of the earth could be a sphere. On a sphere's surface, you can measure positions with angular units called latitude and longitude. Latitude (ϕ) measures the angle between the equatorial plane and a line that passes through that point and is normal to the surface; whereas longitude (λ) measures the angle east or west from a reference meridian (for example, that passing through Greenwich observatory) to another meridian that passes through that point. Angular measures can be expressed in decimal degrees or in degrees, minutes, and seconds.
If you want to store the location of The Statue of Liberty, you can express it as Lat. 40° 41′ 21″ N, Long. 74° 2′ 40″ W with degrees, minutes, and seconds or as 40.689167, -74.044444 using decimal degrees.
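The conversion between the two notations is simple arithmetic: a minute is 1/60 of a degree and a second is 1/3600 of a degree. The following is a minimal Python sketch of it; the function name `dms_to_decimal` is just an illustrative choice, and the convention that south and west are negative is the usual one for decimal degrees.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees, minutes, seconds to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # South latitudes and west longitudes are negative by convention.
    return -value if hemisphere in ("S", "W") else value


# The Statue of Liberty, using the values quoted in the text.
lat = dms_to_decimal(40, 41, 21, "N")
lon = dms_to_decimal(74, 2, 40, "W")
print(round(lat, 6), round(lon, 6))  # 40.689167 -74.044444
```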
We normally think of the earth as a sphere, but this is not its real shape. Geodesy, the science of studying the earth's shape, represents the earth with a geoid, an ideal surface defined by the level the sea would have if oceans covered the entire earth. For practical purposes, as in projections, the geoid is too complicated to use and the earth's shape is approximated by an ellipsoid. The ellipsoid is described by its semi-major axis (equatorial radius) and flattening.
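To see what these two parameters mean in practice, here is a small Python sketch that derives the polar radius (semi-minor axis) from them. It assumes the WGS 84 ellipsoid, whose semi-major axis is 6378137 m and whose inverse flattening is 298.257223563; the variable names are illustrative only.

```python
# WGS 84 ellipsoid parameters (assumed for this example).
semi_major_axis = 6378137.0          # equatorial radius, in meters
inverse_flattening = 298.257223563   # 1 / f

flattening = 1 / inverse_flattening
# The flattening is defined as f = (a - b) / a, hence b = a * (1 - f).
semi_minor_axis = semi_major_axis * (1 - flattening)
# Squared first eccentricity, often needed by projection formulas.
eccentricity_sq = flattening * (2 - flattening)

print(round(semi_minor_axis, 3))  # about 6356752.314 m
print(round(semi_major_axis - semi_minor_axis))  # about 21385 m of difference
```

The polar radius is only about 21 km shorter than the equatorial one, which is why a sphere is an acceptable first approximation.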
Does it sound a little bit complicated? Don't be afraid; just explore locations on earth with Lat. Long. coordinates. In the following table, there are a few famous places with coordinates in decimal degrees. Point your browser to http://maps.google.com, insert the coordinates in the search textbox, and then press Enter. Your map will be panned to the location. Google Maps also enables you to query for the coordinates of any place on earth; find that function and look for some great places.
Colorado Grand Canyon, USA
Iguazú National Park, Argentina
Ayers Rock, Australia
Did you ever play with an orange peel? I did it a lot when I was a child, often pressing it in the hope of flattening it almost perfectly. It's a hopeless challenge, but kids are stubborn and ambitious. Many years later I found a similar analogy in a geography book. It was talking about cartographic projection and used an orange as a model of the earth. If you think of the orange's peel as the earth's surface, it is suddenly clear why you can't have a planar representation of the earth's surface without a great amount of distortion.
All the maps you will ever find are on a flat paper sheet. Curved digital screens are quite uncommon in GeoGeek's nests. So how do cartographers represent a curved surface on a plane? This is done by means of a mathematical operation called projection.
Indeed, there are several different projections developed in the last few centuries by cartographers and mathematicians. There is no mathematical method to transfer a sphere or an ellipsoid to a two-dimensional space without distortion. Hence, projections modify the data and include some deformations about lengths, areas, or shapes you can observe and measure on maps.
Conformal projections preserve angles locally. Meridians and parallels intersect at 90-degree angles.
Equal Area projections preserve proportions between areas. In a map with equal area projections, each part has the same proportional area as the corresponding part of the earth.
Equidistant projections maintain a scale along one or more lines, or from one or two points to all other points on the map. Lines along which the scale (distance) is correct, are of the same proportional length as the lines they reference on the globe.
It is important that you understand there is no best projection; choosing one for your map is a trade-off. Depending on the portion of the earth's surface that your map will contain, and on how the map will be used, pick the projection that suits best. Let's explore some widely used projections.
You learned about the earth's shape and about projection. Coordinate systems use these concepts to build a frame of reference to place objects on the earth's surface. There are two types of coordinate systems: projected coordinate systems and geographic coordinate systems.
Geographic coordinate systems use latitude and longitude as angles measured from the earth's centre, as we saw previously. A geographic coordinate system is substantially defined by the ellipsoid used to model the earth and by the position of the ellipsoid relative to the centre of the earth (called the datum).
A projected coordinate system is defined on a flat two-dimensional surface. A projected coordinate system is always based on a geographic coordinate system, hence it uses an ellipsoid and a datum. Besides, a projected coordinate system includes a projection method to project coordinates from the earth's curved surface onto a two-dimensional Cartesian coordinate plane.
Although there are hundreds of different projections, you can limit your knowledge to some which are widely used.
Universal Transverse Mercator, commonly known as UTM, is not really a projection. It is a system based on the Transverse Mercator projection. This projection uses a cylinder tangent to a meridian to unwrap the earth's surface. Distortion is acceptable only up to about 5° from the central meridian. UTM splits the world into a series of zones, each 6° of longitude wide. As you may guess, there are 60 zones, numbered from Long. 180° W towards the east. Please note that you can't have a map representing more than one UTM zone. Indeed, UTM is well suited for large-scale maps.
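Since the zones are 6° wide and numbered eastwards from 180° W, you can work out the zone of any location from its longitude alone. The following Python sketch shows the arithmetic; the function names are hypothetical, and the sketch deliberately ignores the special zone boundaries used around Norway and Svalbard.

```python
def utm_zone(lon):
    """Return the UTM zone number (1 to 60) for a longitude in decimal degrees."""
    # Normalise the longitude into the range [-180, 180).
    lon = ((lon + 180) % 360) - 180
    # Zone 1 starts at 180 W; each zone spans 6 degrees of longitude.
    return int((lon + 180) // 6) + 1


def central_meridian(zone):
    """Return the longitude of the central meridian of a UTM zone."""
    return -183 + 6 * zone


zone = utm_zone(-74.044444)  # the Statue of Liberty
print(zone, central_meridian(zone))  # 18 -75
```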
Web Mercator is a projection derived from the Mercator projection. It maps ellipsoidal latitude and longitude coordinates onto a plane using spherical Mercator equations. This projection was popularized by Google in Google Maps and it is now widely used in online mapping systems. It stretches areas in a north-south direction and, because spherical equations are applied to ellipsoidal coordinates, it is not strictly conformal, unlike the true Mercator projection.
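The spherical Mercator equations are short enough to try by hand. The following Python sketch applies them to a longitude and latitude pair; it assumes the sphere radius equals the WGS 84 semi-major axis (6378137 m), as used by the system commonly registered as EPSG:3857, and the function name is illustrative.

```python
import math

SPHERE_RADIUS = 6378137.0  # meters, assumed equal to the WGS 84 semi-major axis


def to_web_mercator(lon, lat):
    """Project decimal-degree lon/lat to Web Mercator x/y in meters."""
    x = SPHERE_RADIUS * math.radians(lon)
    # The y formula grows without bound towards the poles, which is why
    # web maps usually cut the world off at roughly 85 degrees of latitude.
    y = SPHERE_RADIUS * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


x, y = to_web_mercator(-74.044444, 40.689167)  # the Statue of Liberty
print(round(x), round(y))
```

Note that the eastern edge of the world, at longitude 180°, lands at about 20037508.34 m, a number you will meet often when configuring web maps.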
A spatial reference system identifier is a code to easily reference a spatial reference system (SRS). An SRS contains parameters about projection, ellipsoid, and datum. It can be defined using the OGC's well-known text (WKT) representation. The SRS for the geographic WGS84 reference system is as follows:
GEOGCS["WGS 84",
    DATUM["WGS_1984",
        SPHEROID["WGS 84",6378137,298.257223563,
            AUTHORITY["EPSG","7030"]],
        AUTHORITY["EPSG","6326"]],
    PRIMEM["Greenwich",0,
        AUTHORITY["EPSG","8901"]],
    UNIT["degree",0.01745329251994328,
        AUTHORITY["EPSG","9122"]],
    AUTHORITY["EPSG","4326"]]
The last line contains the number 4326; this is the SRID uniquely identifying this SRS. The long form should also contain the authority, that is EPSG:4326, but you will often find it indicated only by the number.
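If you ever need to recover the identifier from a WKT definition like this one, a regular expression is enough for a quick check. The following Python sketch collects every AUTHORITY entry and takes the last one, which belongs to the SRS as a whole. It is only a rough illustration, not a real WKT parser, and it assumes the compact `AUTHORITY["name","code"]` layout shown above.

```python
import re

wkt = """GEOGCS["WGS 84", DATUM["WGS_1984", SPHEROID["WGS 84",6378137,298.257223563,
AUTHORITY["EPSG","7030"]], AUTHORITY["EPSG","6326"]], PRIMEM["Greenwich",0,
AUTHORITY["EPSG","8901"]], UNIT["degree",0.01745329251994328,
AUTHORITY["EPSG","9122"]], AUTHORITY["EPSG","4326"]]"""

# Each match is an (authority name, code) pair, in order of appearance.
authorities = re.findall(r'AUTHORITY\["([^"]+)","([^"]+)"\]', wkt)

# The outermost element closes last, so its AUTHORITY entry is the final one.
authority, code = authorities[-1]
srid = f"{authority}:{code}"
print(srid)  # EPSG:4326
```

The earlier entries identify the building blocks of the SRS: the ellipsoid (7030), the datum (6326), the prime meridian (8901), and the unit of measure (9122).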
EPSG is the acronym for European Petroleum Survey Group. It was founded in 1986 by several European oil companies to collect and maintain geodetic information. In 2005, EPSG was absorbed by OGP (an international forum of oil and gas producers), which formed the OGP Geomatics Committee. The committee maintains the registry and publishes it as a public web interface or a downloadable database.
It is very important that you know your data's SRID. Without it, you can't represent data on a map without the risk of great errors.
We described a couple of common and widely used SRSs, but there are a lot of them. There are several archives on the Internet where you can find detailed information about SRSs and their elements, that is, ellipsoids, datums, units of measurement, and projected or geographic reference systems. One of the most authoritative and complete data sets is the EPSG Geodetic Parameter Registry. If you are curious about it, you can open your browser and point it to http://epsg-registry.org. Then try a simple search by inserting a location name in the Area textbox.
There are two main approaches when building a spatial database: modeling vector data or modeling raster data. Vector data uses a set of discrete locations to build basic geometrical shapes, such as points, polylines, and polygons.
Of course real objects are not points, polylines, or polygons. In your model you have to decide which basic shape best suits the real object. For example, a town can be represented as a point if you are going to draw a map of the world with the countries' capitals shown. On the other hand, if you are going to publish a county map, a polygon will enable you to draw the city boundaries to give a more realistic representation.
The simplest geometric object is a point. Points are defined as single coordinate pairs (x,y) when we work in two-dimensional space or coordinate triplets (x,y,z) if you want to take account of the height coordinate. In the following examples, we use point features to store the location of active volcanoes:
Etna; 37.763; 14.993
Krakatoa; -6.102; 105.423
Aconcagua; -32.653; -70.011
Kilimanjaro; -3.065; 37.358
Did you guess the units and projections used? The coordinates are in decimal degrees and SRS is WGS84 geographic, that is EPSG:4326.
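A listing like this one is easy to turn into structured point features. The following Python sketch parses the records and checks that the values are plausible for EPSG:4326. The field order (name; latitude; longitude) is an assumption inferred from the values themselves, and the function name is illustrative.

```python
records = """Etna; 37.763; 14.993
Krakatoa; -6.102; 105.423
Aconcagua; -32.653; -70.011
Kilimanjaro; -3.065; 37.358"""


def parse_points(text):
    """Parse 'name; lat; lon' lines into a list of point features."""
    points = []
    for line in text.strip().splitlines():
        name, lat, lon = (field.strip() for field in line.split(";"))
        lat, lon = float(lat), float(lon)
        # In a geographic SRS, latitude and longitude have fixed valid ranges.
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            raise ValueError(f"coordinates out of range for {name}")
        points.append({"name": name, "lat": lat, "lon": lon})
    return points


points = parse_points(records)
print(points[0])  # {'name': 'Etna', 'lat': 37.763, 'lon': 14.993}
```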
Points are simple to understand but don't give you many details about the spatial extent of an object. If you want to store rivers, you need more than a coordinate pair. Indeed, you have to store an array of coordinate pairs for each feature in a structure called a polyline:
Colorado; (40.472 -105.826, … , 31.901 -114.951)
Nile; (-2.282 29.331, … , 30.167 31.101)
Danube; (48.096 8.155, … , 45.218 29.761)
If you need to model an areal feature such as an island, you can extend the polyline object adding the constraint that it must be closed; that is the first and the last coordinate pairs must be coincident:
Ellis Island; (-74.043 40.699, -74.041 40.700, -74.040 40.700, -74.040 40.701, -74.037 40.699, -74.038 40.699, -74.038 40.698, -74.039 40.698, -74.041 40.700, -74.042 40.699, -74.040 40.698, -74.042 40.696, -74.044 40.698, -74.043 40.699)
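You can verify the closure constraint with a couple of lines of code. The following Python sketch stores the Ellis Island coordinates as a list of pairs and checks that the first and last pairs coincide. The function names are hypothetical, and the minimum of four pairs (a triangle plus the repeated start) is a rule chosen for this sketch.

```python
ellis_island = [
    (-74.043, 40.699), (-74.041, 40.700), (-74.040, 40.700), (-74.040, 40.701),
    (-74.037, 40.699), (-74.038, 40.699), (-74.038, 40.698), (-74.039, 40.698),
    (-74.041, 40.700), (-74.042, 40.699), (-74.040, 40.698), (-74.042, 40.696),
    (-74.044, 40.698), (-74.043, 40.699),
]


def is_closed_ring(coords):
    """Return True if the first and last coordinate pairs coincide."""
    return len(coords) >= 4 and coords[0] == coords[-1]


def close_ring(coords):
    """Return a copy of coords with the first pair repeated at the end if needed."""
    return list(coords) if is_closed_ring(coords) else list(coords) + [coords[0]]


print(is_closed_ring(ellis_island))       # True
print(is_closed_ring(ellis_island[:-1]))  # False: this is just a polyline
```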
The feature model used in GIS is a little bit more complex than what we have discussed. There are some more constraints regarding vertex ordering, line intersections, and areal shapes with holes. Different GIS packages specified several different sets of rules, often in proprietary formats. The Open Geospatial Consortium (OGC) defined a standard for simple features, and lately most systems, open source ones first of all, are compliant with it. If you are curious about it, you can point your browser at http://www.opengeospatial.org/standards/is and look for The OpenGIS® Simple Features Interface Standard.
Raster data uses a regular tessellation, defining cells where one or more values are uniform. Usually the cells are square, although this is not a constraint. Raster data is generally used to represent values that change continuously in space, that is, a field. You can use a regular tessellation to build a digital elevation model of the earth's surface. In the following figure, each cell has a height and width of 20 meters and the value stored is the height above sea level in meters:
Can you use raster data to model real features like a river? Yes, you can, but there are some drawbacks you have to consider. The following figure shows a linear feature represented as vector data (the red line) and as raster data (the black and white cells). If your purpose is drawing the shapes on a map, raster data is not a good choice, as raster graphics are resolution-dependent. They cannot scale up to an arbitrary resolution without an apparent loss of quality.
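To make the idea concrete, here is a minimal Python sketch of how a location is matched to a raster cell. It assumes the grid origin is its upper-left corner and that rows are counted downwards, a common but not universal convention. The elevation values and the origin are made up for illustration; only the 20-meter cell size comes from the text.

```python
CELL_SIZE = 20.0  # meters, as in the elevation model described above

# Hypothetical 3 x 3 elevation grid, in meters above sea level.
elevation = [
    [120, 122, 125],
    [118, 121, 124],
    [115, 119, 123],
]
ORIGIN_X, ORIGIN_Y = 0.0, 60.0  # hypothetical upper-left corner of the grid


def cell_index(x, y):
    """Return the (row, column) of the cell containing the point x, y."""
    col = int((x - ORIGIN_X) // CELL_SIZE)
    row = int((ORIGIN_Y - y) // CELL_SIZE)  # y decreases as rows increase
    return row, col


row, col = cell_index(45.0, 35.0)
print(row, col, elevation[row][col])  # 1 2 124
```

Every point falling inside the same 20-meter square gets the same value, which is exactly the resolution limit discussed in the text.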
In the previous sections, we explored spatial data and SRS. They are the key elements you need to build your map. Indeed, maps are planar representations of spatial data. You need to collect the appropriate data to represent the real objects you want to include in your map and you need to choose an SRS to organize your data into the map.
Keep in mind that maps are representations, a proposition of yours. They are the way you express your knowledge and your vision of the world. To fully accomplish this, there is a third basic ingredient for your map: symbols.
Symbols enable you to add information to the features shown on a map. For example, colors can be used to indicate a classification of roads. Imagine you need to produce a map of a country with a road network. You have a vector data set containing road polylines. A simple approach is to render all features with the same symbol, as shown in the following figure. The map is not really informative unless you are a transportation expert. You won't extract any information from the map and it looks ugly too.
Let's have a look at a similar map produced with ArcGIS Online (http://www.esri.com/software/arcgis/arcgisonline).
It contains the road network symbolized with different colors and line widths, labels showing you highway codes, major towns represented with small circles and labels. Besides, there is a background depicting heights with colors and shading. Does it now look more familiar to you?
In Chapter 6, Styling Your Layers, we will learn how to apply symbols in GeoServer to produce maps like the previous one. For now you need to familiarize yourself with simple and thematic maps.
Open your browser and go to http://www.openstreetmap.org.
The website offers you a small scale map centered on your actual location, as derived from browser information.
Now enter the Piccadilly Circus, London, UK address in the Search textbox on the left and press the Go button. A list of results matching your search is presented on the left side of the map. Pick the first item:
The map is now at a large scale (look at the scale bar in the bottom-left corner) and the symbols have changed to show you more detailed information about roads and locations. You can find street names, directions for car traffic, buildings' footprints, and icons for points of interest. The general look and feel resembles a printed city map you can pick up at tourist offices.
OpenStreetMap does not require you to register for browsing or exporting the data. Anyway, if you are interested in maps and open source data, you may consider getting involved in the project. OSM is a collaborative project to create a free editable map of the world, currently involving over half a million users all around the world. You may add data or find errors on locations you know well.
The maps we have encountered so far are often defined as general maps. General maps focus on the description of the physical, political, and human features of the territory. All this data is portrayed for its own sake. In a nutshell, it can be said that general maps tell you where objects are located in space, while thematic maps talk about things happening in that space. Thematic maps focus on displaying a single topic and portray its spatial distribution and variation. They still contain general data, such as administrative boundaries or road networks, but only as a base layer for general reference.
Among thematic maps, those using choropleth or dot representations are by far the most common type you will be using GeoServer for.
Choropleth maps show statistical data aggregated over predefined regions, such as counties or states, by coloring or shading these regions. You can draw states according to their population, gross domestic product, number of car owners, or number of national parks. You are not limited to a single variable; indeed you can merge different values from more than one attribute associated with spatial objects.
The following figure shows a map of European countries colored according to gross domestic product values. The legend on the right shows the five classification intervals. Values were normalized to the EU-27 average.
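Behind every choropleth map there is a classification step that assigns each region to an interval. The following Python sketch shows one simple way to do it, equal-interval classification with five classes. The region names and values are made up for illustration, and the method used for the map in the figure is not stated, so treat this only as an example of the general idea.

```python
import bisect

# Hypothetical values, indexed so that 100 is the overall average.
values = {"Region A": 40, "Region B": 95, "Region C": 100,
          "Region D": 130, "Region E": 240}


def equal_interval_breaks(data, classes=5):
    """Return the inner class boundaries splitting the data range evenly."""
    lowest, highest = min(data), max(data)
    width = (highest - lowest) / classes
    return [lowest + width * i for i in range(1, classes)]


def classify(value, breaks):
    """Return the class index (0 is the lowest class) for a value."""
    return bisect.bisect_right(breaks, value)


breaks = equal_interval_breaks(values.values())
print(breaks)  # [80.0, 120.0, 160.0, 200.0]
for name, value in values.items():
    print(name, classify(value, breaks))
```

Each class index would then be mapped to a color of the legend, from the lightest for class 0 to the darkest for class 4.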
Are you ready to build maps? We can do this without GeoServer; indeed we will install it in the next chapter. For now, you will play with an online map engine and Google Earth to test your understanding of thematic map concepts.
Point your browser to http://thematicmapping.org/engine/.
Choose a statistical Indicator from the drop-down list, for example, CO2 emissions, then select 2004 as Year. Leave all other values as the proposed defaults.
Now try a proportional symbol map. Select Mobile phone subscribers per 100 inhabitants as Indicator and 2006 as Year. Choose Proportional symbol for Technique and Regular polygon as symbol style. Select circle from the drop-down list. Leave the default colors unchanged and select Equal intervals for classification.
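When a circle is used as a proportional symbol, it is usually its area, not its radius, that is scaled with the data value, so the radius grows with the square root of the value. The following Python sketch illustrates this rule. The function name and the 30-pixel maximum radius are illustrative assumptions, not settings taken from the online engine.

```python
import math


def symbol_radius(value, max_value, max_radius=30.0):
    """Return a circle radius whose area is proportional to value."""
    if value <= 0:
        return 0.0
    # Area grows with the square of the radius, hence the square root.
    return max_radius * math.sqrt(value / max_value)


# A value four times smaller gives a circle with half the radius.
print(symbol_radius(100, 100))  # 30.0
print(symbol_radius(25, 100))   # 15.0
```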
You built a couple of thematic maps by selecting data, symbol size, and color. You will need to set exactly these parameters in GeoServer to produce beautiful maps. This time we did it without exploring the technical details behind feature rendering. In Chapter 6, Styling Your Layers, you will learn how to use Styled Layer Descriptor (SLD) to make thematic maps.
We had a brief but complete introduction to spatial data and maps in this chapter. It was somewhat a theoretical chapter, but we promise you it was the first and last of this kind! From now on, we are going to run real stuff with GeoServer.
Specifically, you learned how an object is referenced to its location, which storage models you can use with spatial data (for example, vector versus raster), and finally how to represent spatial features in a map.
We are now ready to pick up GeoServer, unpack, and install it on your computer.