# Data Around Us

## Gergely Daróczi

September 2015

In this article by Gergely Daróczi, author of the book Mastering Data Analysis with R we will discuss Spatial data, also known as geospatial data, which identifies geographic locations, such as natural or constructed features around us. Although all observations have some spatial content, such as the location of the observation, but this is out of most data analysis tools' range due to the complex nature of spatial information; alternatively, the spatiality might not be that interesting (at first sight) in the given research topic.

On the other hand, analyzing spatial data can reveal some very important underlying structures of the data, and it is well worth spending time visualizing the differences and similarities between close or far data points.

In this article, we are going to help with this and will use a variety of R packages to:

• Retrieve geospatial information from the Internet
• Visualize points and polygons on a map

(For more resources related to this topic, see here.)

# Geocoding

We will use the hflights dataset to demonstrate how one can deal with data bearing spatial information. To this end, let's aggregate our dataset but instead of generating daily data let's view the aggregated characteristics of the airports. For the sake of performance, we will use the data.table package:

``````> library(hflights)
> library(data.table)
> dt <- data.table(hflights)[, list(
+     N         = .N,
+     Cancelled = sum(Cancelled),
+     Distance = Distance[1],
+     TimeVar   = sd(ActualElapsedTime, na.rm = TRUE),
+     ArrDelay = mean(ArrDelay, na.rm = TRUE)) , by = Dest]``````

So we have loaded and then immediately transformed the hlfights dataset to a data.table object. At the same time, we aggregated by the destination of the flights to compute:

• The number of rows
• The number of cancelled flights
• The distance
• The standard deviation of the elapsed time of the flights
• The arithmetic mean of the delays

The resulting R object looks like this:

``````> str(dt)
Classes 'data.table' and 'data.frame': 116 obs. of 6 variables:
\$ Dest     : chr "DFW" "MIA" "SEA" "JFK" ...
\$ N       : int 6653 2463 2615 695 402 6823 4893 5022 6064 ...
\$ Cancelled: int 153 24 4 18 1 40 40 27 33 28 ...
\$ Distance : int 224 964 1874 1428 3904 305 191 140 1379 862 ...
\$ TimeVar : num 10 12.4 16.5 19.2 15.3 ...
\$ ArrDelay : num 5.961 0.649 9.652 9.859 10.927 ...
- attr(*, ".internal.selfref")=<externalptr>``````

So we have 116 observations all around the world and five variables describing those. Although this seems to be a spatial dataset, we have no geospatial identifiers that a computer can understand per se, so let's fetch the geocodes of these airports from the Google Maps API via the ggmap package. First, let's see how it works when we are looking for the geo-coordinates of Houston:

``````> library(ggmap)
> (h <- geocode('Houston, TX'))
lon     lat
1 -95.3698 29.76043``````

So the geocode function can return the matched latitude and longitude of the string we sent to Google. Now let's do the very same thing for all flight destinations:

``> dt[, c('lon', 'lat') := geocode(Dest)]``

Well, this took some time as we had to make 116 separate queries to the Google Maps API. Please note that Google limits you to 2,500 queries a day without authentication, so do not run this on a large dataset. There is a helper function in the package, called geocodeQueryCheck, which can be used to check the remaining number of free queries for the day.

Some of the methods and functions we plan to use in some later sections of this article do not support data.table, so let's fall back to the traditional data.frame format and also print the structure of the current object:

``````> str(setDF(dt))
'data.frame': 116 obs. of 8 variables:
\$ Dest     : chr "DFW" "MIA" "SEA" "JFK" ...
\$ N       : int 6653 2463 2615 695 402 6823 4893 5022 6064 ...
\$ Cancelled: int 153 24 4 18 1 40 40 27 33 28 ...
\$ Distance : int 224 964 1874 1428 3904 305 191 140 1379 862 ...
\$ TimeVar : num 10 12.4 16.5 19.2 15.3 ...
\$ ArrDelay : num 5.961 0.649 9.652 9.859 10.927 ...
\$ lon     : num -97 136.5 -122.3 -73.8 -157.9 ...
\$ lat     : num 32.9 34.7 47.5 40.6 21.3 ...``````

This was pretty quick and easy, wasn't it? Now that we have the longitude and latitude values of all the airports, we can try to show these points on a map.

# Visualizing point data in space

For the first time, let's keep it simple and load some package-bundled polygons as the base map. To this end, we will use the maps package. After loading it, we use the map function to render the polygons of the United States of America, add a title, and then some points for the airports and also for Houston with a slightly modified symbol:

``````> library(maps)
> map('state')
> title('Flight destinations from Houston,TX')
> points(h\$lon, h\$lat, col = 'blue', pch = 13)
> points(dt\$lon, dt\$lat, col = 'red', pch = 19)``````

And showing the airport names on the plot is pretty easy as well: we can use the well-known functions from the base graphics package. Let's pass the three character names as labels to the text function with a slightly increased y value to shift the preceding text the previously rendered data points:

``> text(dt\$lon, dt\$lat + 1, labels = dt\$Dest, cex = 0.7)``

Now we can also specify the color of the points to be rendered. This feature can be used to plot our first meaningful map to highlight the number of flights in 2011 to different parts of the USA:

``````> map('state')
> title('Frequent flight destinations from Houston,TX')
> points(h\$lon, h\$lat, col = 'blue', pch = 13)
> points(dt\$lon, dt\$lat, pch = 19,
+   col = rgb(1, 0, 0, dt\$N / max(dt\$N)))
> legend('bottomright', legend = round(quantile(dt\$N)), pch = 19,
+   col = rgb(1, 0, 0, quantile(dt\$N) / max(dt\$N)), box.col = NA)``````

So the intensity of red shows the number of flights to the given points (airports); the values range from 1 to almost 10,000. Probably it would be more meaningful to compute these values on a state level, as there are many airports, very close to each other, that might be better aggregated at a higher administrative area level. To this end, we load the polygon of the states, match the points of interest (airports) with the overlaying polygons (states), and render the polygons as a thematic map instead of points, like we did on the previous pages.

# Finding polygon overlays of point data

We already have all the data we need to identify the parent state of each airport. The dt dataset includes the geo-coordinates of the locations, and we managed to render the states as polygons with the map function. Actually, this latter function can return the underlying dataset without rendering a plot:

``````> str(map_data <- map('state', plot = FALSE, fill = TRUE))
List of 4
\$ x   : num [1:15599] -87.5 -87.5 -87.5 -87.5 -87.6 ...
\$ y   : num [1:15599] 30.4 30.4 30.4 30.3 30.3 ...
\$ range: num [1:4] -124.7 -67 25.1 49.4
\$ names: chr [1:63] "alabama" "arizona" "arkansas" "california" ...
- attr(*, "class")= chr "map"``````

So we have around 16,000 points describing the boundaries of the US states, but this map data is more detailed than we actually need (see for example the name of the polygons starting with Washington):

``````> grep('^washington', map_data\$names, value = TRUE)
[1] "washington:san juan island" "washington:lopez island"
[3] "washington:orcas island"   "washington:whidbey island"
[5] "washington:main"``````

In short, the non-connecting parts of a state are defined as separate polygons. To this end, let's save a list of the state names without the string after the colon:

``> states <- sapply(strsplit(map_data\$names, ':'), '[[', 1)``

We will use this list as the basis of aggregation from now on. Let's transform this map dataset into another class of object, so that we can use the powerful features of the sp package. We will use the maptools package to do this transformation:

``````> library(maptools)
> us <- map2SpatialPolygons(map_data, IDs = states,
+   proj4string = CRS("+proj=longlat +datum=WGS84"))``````

An alternative way of getting the state polygons might be to directly load those instead of transforming from other data formats as described earlier. To this end, you may find the raster package especially useful to download free map shapefiles from gadm.org via the getData function. Although these maps are way too detailed for such a simple task, you can always simplify those—for example, with the gSimplify function of the rgeos package.

So we have just created an object called us, which includes the polygons of map_data for each state with the given projection. This object can be shown on a map just like we did previously, although you should use the general plot method instead of the map function:

``> plot(us)``

Besides this, however, the sp package supports so many powerful features! For example, it's very easy to identify the overlay polygons of the provided points via the over function. As this function name conflicts with the one found in the grDevices package, it's better to refer to the function along with the namespace using a double colon:

``````> library(sp)
> dtp <- SpatialPointsDataFrame(dt[, c('lon', 'lat')], dt,
+   proj4string = CRS("+proj=longlat +datum=WGS84"))
> str(sp::over(us, dtp))
'data.frame': 49 obs. of 8 variables:
\$ Dest     : chr "BHM" "PHX" "XNA" "LAX" ...
\$ N       : int 2736 5096 1172 6064 164 NA NA 2699 3085 7886 ...
\$ Cancelled: int 39 29 34 33 1 NA NA 35 11 141 ...
\$ Distance : int 562 1009 438 1379 926 NA NA 1208 787 689 ...
\$ TimeVar : num 10.1 13.61 9.47 15.16 13.82 ...
\$ ArrDelay : num 8.696 2.166 6.896 8.321 -0.451 ...
\$ lon     : num -86.8 -112.1 -94.3 -118.4 -107.9 ...
\$ lat     : num 33.6 33.4 36.3 33.9 38.5 ...``````

What happened here? First, we passed the coordinates and the whole dataset to the SpatialPointsDataFrame function, which stored our data as spatial points with the given longitude and latitude values. Next we called the over function to left-join the values of dtp to the US states.

An alternative way of identifying the state of a given airport is to ask for more detailed information from the Google Maps API. By changing the default output argument of the geocode function, we can get all address components for the matched spatial object, which of course includes the state as well. Look for example at the following code snippet:

Based on this, you might want to get a similar output for all airports and filter the list for the short name of the state. The rlist package would be extremely useful in this task, as it offers some very convenient ways of manipulating lists in R.

The only problem here is that we matched only one airport to the states, which is definitely not okay. See for example the fourth column in the earlier output: it shows LAX as the matched airport for California (returned by states[4]), although there are many others there as well.

To overcome this issue, we can do at least two things. First, we can use the returnList argument of the over function to return all matched rows of dtp, and we will then post-process that data:

``````> str(sapply(sp::over(us, dtp, returnList = TRUE),
+   function(x) sum(x\$Cancelled)))
Named int [1:49] 51 44 34 97 23 0 0 35 66 149 ...
- attr(*, "names")= chr [1:49] "alabama" "arizona" "arkansas" ...``````

So we created and called an anonymous function that will sum up the Cancelled values of the data.frame in each element of the list returned by over.

Another, probably cleaner, approach is to redefine dtp to only include the related values and pass a function to over to do the summary:

``````> dtp <- SpatialPointsDataFrame(dt[, c('lon', 'lat')],
+   dt[, 'Cancelled', drop = FALSE],
+   proj4string = CRS("+proj=longlat +datum=WGS84"))
> str(cancels <- sp::over(us, dtp, fn = sum))
'data.frame': 49 obs. of 1 variable:
\$ Cancelled: int 51 44 34 97 23 NA NA 35 66 149 ...``````

Either way, we have a vector to merge back to the US state names:

``> val <- cancels\$Cancelled[match(states, row.names(cancels))]``

And to update all missing values to zero (as the number of cancelled flights in a state without any airport is not missing data, but exactly zero for sure):

``> val[is.na(val)] <- 0``

# Plotting thematic maps

Now we have everything to create our first thematic map. Let's pass the val vector to the previously used map function (or plot it using the us object), specify a plot title, add a blue point for Houston, and then create a legend, which shows the quantiles of the overall number of cancelled flights as a reference:

``````> map("state", col = rgb(1, 0, 0, sqrt(val/max(val))), fill = TRUE)
> title('Number of cancelled flights from Houston to US states')
> points(h\$lon, h\$lat, col = 'blue', pch = 13)
> legend('bottomright', legend = round(quantile(val)),
+   fill = rgb(1, 0, 0, sqrt(quantile(val)/max(val))), box.col = NA)``````

Please note that, instead of a linear scale, we decided to compute the square root of the relative values to define the intensity of the fill color, so that we can visually highlight the differences between the states. This was necessary as most flight cancellations happened in Texas (748), and there were no more than 150 cancelled flights in any other state (with the average being around 45).

You can also easily load ESRI shape files or other geospatial vector data formats into R as points or polygons with a bunch of packages already discussed and a few others as well, such as the maptools, rgdal, dismo, raster, or shapefile packages.

Another, probably easier, way to generate country-level thematic maps, especially choropleth maps, is to load the rworldmap package made by Andy South, and rely on the convenient mapCountryData function.

# Rendering polygons around points

Besides thematic maps, another really useful way of presenting spatial data is to draw artificial polygons around the data points based on the data values. This is especially useful if there is no available polygon shape file to be used to generate a thematic map.

A level plot, contour plot, or isopleths, might be an already familiar design from tourist maps, where the altitude of the mountains is represented by a line drawn around the center of the hill at the very same levels. This is a very smart approach having maps present the height of hills—projecting this third dimension onto a 2-dimensional image.

Now let's try to replicate this design by considering our data points as mountains on the otherwise flat map. We already know the heights and exact geo-coordinates of the geometric centers of these hills (airports); the only challenge here is to draw the actual shape of these objects. In other words:

• Are these mountains connected?
• How steep are the hillsides?
• Should we consider any underlying spatial effects in the data? In other words, can we actually render these as mountains with a 3D shape instead of plotting independent points in space?

If the answer for the last question is positive, then we can start trying to answer the other questions by fine-tuning the plot parameters. For now, let's simply suppose that there is a spatial effect in the underlying data, and it makes sense to visualize the data in such a way. Later, we will have the chance to disprove or support this statement either by analyzing the generated plots, or by building some geo-spatial models—some of these will be discussed later, in the Spatial Statistics section.

## Contour lines

First, let's expand our data points into a matrix with the fields package. The size of the resulting R object is defined arbitrarily but, for the given number of rows and columns, which should be a lot higher to generate higher resolution images, 256 is a good start:

``````> library(fields)
> out <- as.image(dt\$ArrDelay, x = dt[, c('lon', 'lat')],
+   nrow = 256, ncol = 256)``````

The as.image function generates a special R object, which in short includes a 3‑dimensional matrix-like data structure, where the x and y axes represent the longitude and latitude ranges of the original data respectively. To simplify this even more, we have a matrix with 256 rows and 256 columns, where each of those represents a discrete value evenly distributed between the lowest and highest values of the latitude and longitude. And on the z axis, we have the ArrDelay values—which are in most cases of course missing:

``````> table(is.na(out\$z))
FALSE TRUE
112 65424``````

What does this matrix look like? It's better to see what we have at the moment:

``> image(out)``

Well, this does not seem to be useful at all. What is shown there? We rendered the x and y dimensions of the matrix with z colors here, and most tiles of this map are empty due to the high amount of missing values in z. Also, it's pretty straightforward now that the dataset included many airports outside the USA as well. How does it look if we focus only on the USA?

``````> image(out, xlim = base::range(map_data\$x, na.rm = TRUE),
+           ylim = base::range(map_data\$y, na.rm = TRUE))``````

An alternative and more elegant approach to rendering only the US part of the matrix would be to drop the non-US airports from the database before actually creating the out R object. Although we will continue with this example for didactic purposes, with real data make sure you concentrate on the target subset of your data instead of trying to smooth and model unrelated data points as well.

A lot better! So we have our data points as a tile, now let's try to identify the slope of these mountain peaks, to be able to render them on a future map. This can be done by smoothing the matrix:

``````> look <- image.smooth(out, theta = .5)
> table(is.na(look\$z))
FALSE TRUE
14470 51066``````

As can be seen in the preceding table, this algorithm successfully eliminated many missing values from the matrix. The image.smooth function basically reused our initial data point values in the neighboring tiles, and computed some kind of average for the conflicting overrides. This smoothing algorithm results in the following arbitrary map, which does not respect any political or geographical boundaries:

``> image(look)``

It would be really nice to plot these artificial polygons along with the administrative boundaries, so let's clear out all cells that do not belong to the territory of the USA. We will use the point.in.polygon function from the sp package to do so:

``````> usa_data <- map('usa', plot = FALSE, region = 'main')
> p <- expand.grid(look\$x, look\$y)
> library(sp)
> n <- which(point.in.polygon(p\$Var1, p\$Var2,
+ usa_data\$x, usa_data\$y) == 0)
> look\$z[n] <- NA``````

In a nutshell, we have loaded the main polygon of the USA without any sub-administrative areas, and verified our cells in the look object, if those are overlapping the polygon. Then we simply reset the value of the cell, if not.

The next step is to render the boundaries of the USA, plot our smoothed contour plot, then add some eye-candy in the means of the US states and, the main point of interest, the airport:

``````> map("usa")
> map("state", lwd = 3, add = TRUE)
> title('Arrival delays of flights from Houston')
> points(dt\$lon, dt\$lat, pch = 19, cex = .5)
> points(h\$lon, h\$lat, pch = 13)``````

Now this is pretty neat, isn't it?

## Voronoi diagrams

An alternative way of visualizing point data with polygons is to generate Voronoi cells between them. In short, the Voronoi map partitions the space into regions around the data points by aligning all parts of the map to one of the regions to minimize the distance from the central data points. This is extremely easy to interpret, and also to implement in R. The deldir package provides a function with the very same name for Delaunay triangulation:

``````> library(deldir)
> map("usa")
> plot(deldir(dt\$lon, dt\$lat), wlines = "tess", lwd = 2,
+   pch = 19, col = c('red', 'darkgray'), add = TRUE)``````

Here, we represented the airports with red dots, as we did before, but also added the Dirichlet tessellation (Voronoi cells) rendered as dark-gray dashed lines. For more options on how to fine-tune the results, see the plot.deldir method.

In the next section, let's see how to improve this plot by adding a more detailed background map to it.

# Satellite maps

There are many R packages on CRAN that can fetch data from Google Maps, Stamen, Bing, or OpenStreetMap—even some of the packages we previously used in this article, like the ggmap package, can do this. Similarly, the dismo package also comes with both geo-coding and Google Maps API integration capabilities, and there are some other packages focused on that latter, such as the RgoogleMaps package.

Now we will use the OpenStreetMap package, mainly because it supports not only the awesome OpenStreetMap database back-end, but also a bunch of other formats as well. For example, we can render really nice terrain maps via Stamen:

``````> library(OpenStreetMap)
> map <- openmap(c(max(map_data\$y, na.rm = TRUE),
+                 min(map_data\$x, na.rm = TRUE)),
+               c(min(map_data\$y, na.rm = TRUE),
+                 max(map_data\$x, na.rm = TRUE)),
+               type = 'stamen-terrain')``````

So we defined the left upper and right lower corners of the map we need, and also specified the map style to be a satellite map. As the data by default arrives from the remote servers with the Mercator projections, we first have to transform that to WGS84 (we used this previously), so that we can render the points and polygons on the top of the fetched map:

``````> map <- openproj(map,
+   projection = '+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs')``````

And Showtime at last:

``````> plot(map)
> plot(deldir(dt\$lon, dt\$lat), wlines = "tess", lwd = 2,
+   col = c('red', 'black'), pch = 19, cex = 0.5, add = TRUE)``````

This seems to be a lot better compared to the outline map we created previously. Now you can try some other map styles as well, such as mapquest-aerial, or some of the really nice-looking cloudMade designs.

# Interactive maps

Besides being able to use Web-services to download map tiles for the background of the maps created in R, we can also rely on some of those to generate truly interactive maps. One of the best known related services is the Google Visualization API, which provides a platform for hosting visualizations made by the community; you can also use it to share maps you've created with others.

In R, you can access this API via the googleVis package written and maintained by Markus Gesmann and Diego de Castillo. Most functions of the package generate HTML and JavaScript code that we can directly view in a Web browser as an SVG object with the base plot function; alternatively, we can integrate them in a Web page, for example via the IFRAME HTML tag.

The gvisIntensityMap function takes a data.frame with country ISO or USA state codes and the actual data to create a simple intensity map. We will use the cancels dataset we created in the Finding Polygon Overlays of Point Data section but, before that, we have to do some data transformations. Let's add the state name as a new column to the data.frame, and replace the missing values with zero:

``````> cancels\$state <- rownames(cancels)
> cancels\$Cancelled[is.na(cancels\$Cancelled)] <- 0``````

Now it's time to load the package and pass the data along with a few extra parameters, signifying that we want to generate a state-level US map:

``````> library(googleVis)
> plot(gvisGeoChart(cancels, 'state', 'Cancelled',
+                   options = list(
+                       region     = 'US',
+                       displayMode = 'regions',
+                       resolution = 'provinces')))``````

The package also offers opportunities to query the Google Map API via the gvisMap function. We will use this feature to render the airports from the dt dataset as points on a Google Map with an auto-generated tooltip of the variables.

But first, as usual, we have to do some data transformations again. The location argument of the gvisMap function takes the latitude and longitude values separated by a colon:

``> dt\$LatLong <- paste(dt\$lat, dt\$lon, sep = ':')``

We also have to generate the tooltips as a new variable, which can be done easily with an apply call. We will concatenate the variable names and actual values separated by a HTML line break:

``````> dt\$tip <- apply(dt, 1, function(x)
+                 paste(names(dt), x, collapse = '<br/ >'))``````

And now we just pass these arguments to the function for an instant interactive map:

``> plot(gvisMap(dt, 'LatLong', tipvar = 'tip'))``

Another nifty feature of the googleVis package is that you can easily merge the different visualizations into one by using the gvisMerge function. The use of this function is quite simple: specify any two gvis objects you want to merge, and also whether they are to be placed horizontally or vertically.

## JavaScript mapping libraries

The great success of the trending JavaScript data visualization libraries is only partly due to their great design. I suspect other factors also contribute to the general spread of such tools: it's very easy to create and deploy full-blown data models, especially since the release and on-going development of Mike Bostock's D3.js.

Although there are also many really useful and smart R packages to interact directly with D3 and topojson (see for example my R user activity compilation at http://bit.ly/countRies). Now we will only focus on how to use Leaflet— probably the most used JavaScript library for interactive maps.

What I truly love in R is that there are many packages wrapping other tools, so that R users can rely on only one programming language, and we can easily use C++ programs and Hadoop MapReduce jobs or build JavaScript-powered dashboards without actually knowing anything about the underlying technology. This is especially true when it comes to Leaflet!

There are at least two very nice packages that can generate a Leaflet plot from the R console, without a single line of JavaScript. The Leaflet reference class of the rCharts package was developed by Ramnath Vaidyanathan, and includes some methods to create a new object, set the viewport and zoom level, add some points or polygons to the map, and then render or print the generated HTML and JavaScript code to the console or to a file.

Unfortunately, this package is not on CRAN yet, so you have to install it from GitHub:

``> devtools::install_github('ramnathv/rCharts')``

As a quick example, let's generate a Leaflet map of the airports with some tooltips, like we did with the Google Maps API in the previous section. As the setView method expects numeric geo-coordinates as the center of the map, we will use Kansas City's airport as a reference:

``````> library(rCharts)
> map <- Leaflet\$new()
> map\$setView(as.numeric(dt[which(dt\$Dest == 'MCI'),
+   c('lat', 'lon')]), zoom = 4)
> for (i in 1:nrow(dt))
+     map\$marker(c(dt\$lat[i], dt\$lon[i]), bindPopup = dt\$tip[i])
> map\$show()``````

Similarly, RStudio's leaflet package and the more general htmlwidgets package also provide some easy ways to generate JavaScript-powered data visualizations. Let's load the library and define the steps one by one using the pipe operator from the magrittr package, which is pretty standard for all packages created or inspired by RStudio or Hadley Wickham:

``````> library(leaflet)
> leaflet(us) %>%
+   addMarkers(lng = dt\$lon, lat = dt\$lat, popup = dt\$tip)``````

I especially like this latter map, as we can load a third-party satellite map in the background, then render the states as polygons; we also added the original data points along with some useful tooltips on the very same map with literally a one-line R command. We could even color the state polygons based on the aggregated results we computed in the previous sections! Ever tried to do the same in Java?

# Alternative map designs

Besides being able to use some third-party tools, another main reason why I tend to use R for all my data analysis tasks is that R is extremely powerful in creating custom data exploration, visualization, and modeling designs.

As an example, let's create a flow-map based on our data, where we will highlight the flights from Houston based on the number of actual and cancelled flights. We will use lines and circles to render these two variables on a 2-dimensional map, and we will also add a contour plot in the background based on the average time delay.

But, as usual, let's do some data transformations first! To keep the number of flows at a minimal level, let's get rid of the airports outside the USA at last:

``````> dt <- dt[point.in.polygon(dt\$lon, dt\$lat,
+                           usa_data\$x, usa_data\$y) == 1, ]``````

We will need the diagram package (to render curved arrows from Houston to the destination airports) and the scales package to create transparent colors:

``````> library(diagram)
> library(scales)``````

Then let's render the contour map described in the Contour Lines section:

``````> map("usa")
> title('Number of flights, cancellations and delays from Houston')
> map("state", lwd = 3, add = TRUE)``````

And then add a curved line from Houston to each of the destination airports, where the width of the line represents the number of cancelled flights and the diameter of the target circles shows the number of actual flights:

``````> for (i in 1:nrow(dt)) {
+   curvedarrow(
+     from       = rev(as.numeric(h)),
+     to         = as.numeric(dt[i, c('lon', 'lat')]),
+     arr.pos   = 1,
+     arr.type   = 'circle',
+     curve     = 0.1,
+     arr.col   = alpha('black', dt\$N[i] / max(dt\$N)),
+     arr.length = dt\$N[i] / max(dt\$N),
+     lwd       = dt\$Cancelled[i] / max(dt\$Cancelled) * 25,
+     lcol       = alpha('black',
+                    dt\$Cancelled[i] / max(dt\$Cancelled)))
+ }``````

Well, this article ended up being about visualizing spatial data, and not really about analyzing spatial data by fitting models and filtering raw data.

# Summary

In case you are interested in knowing other R-related books that Packt has in store for you, here is the link:

# Resources for Article:

Further resources on this subject:

You've been reading an excerpt of: