This chapter contains the following recipes:
Plotting a function
Plotting multiple curves
Using two different y-axes
Making a scatterplot
Drawing filled curves
Handling financial data
Making a basic histogram plot
Plotting multiple histograms
Dealing with errors
Making a statistical whisker plot
Making an impulse plot
Graphing parametric curves
Plotting with polar coordinates
We begin the book with a set of recipes that cover gnuplot's one-dimensional graph styles. A 1D graph refers to the plotting of data or mathematical functions where the values plotted depend on a single variable. Examples are simple mathematical functions, such as y = sin(x), or 1D data, such as the temperature in a particular location versus time. The plotting of quantities that depend on two variables is covered starting in Chapter 8, The Third Dimension, where we show how to make surface, contour, and image plots.
Gnuplot can create a vast array of 1D plot types in a large number of styles. The recipes in this chapter survey all of the major types of 1D graph, with an example that can be run immediately to produce the result in the illustration. For each example, we have provided enough explanation in the There's more... section for you to extend and adapt the recipe for your particular problem. We assume that you have gnuplot up and running and are able to create plots on one of the terminals; the recipes in this chapter work on every terminal or output file type.
Gnuplot can be used as a tool to interactively explore the structure of mathematical functions, as well as to create illustrations for publication or education. It has built-in knowledge of both elementary functions, such as sine and cosine, and some special functions, such as Bessel functions and elliptic integrals. The following figure shows the plotting of a function:
Start up an interactive gnuplot session and make sure that your graphic terminal of choice is selected and working, using the set term command (for example, at the console you simply type gnuplot and, to change the default terminal to X Windows, type set term x11).
Gnuplot understands a big handful of mathematical functions, listed in Section 13.1 of the official manual (the official gnuplot documentation can be found at gnuplot's home, http://gnuplot.info/). It also understands all the basic mathematical operators, with a syntax similar to Fortran or C, so you can combine functions into expressions, as shown in the following command:
plot [-5:5] (sin(1/x) - cos(x))*erfc(x)
In the previous command, we have also shown how to use the [a:b] notation to limit the plot to a specified range on the x-axis.
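Expressions can also be given a name and reused. The following is a minimal sketch, assuming a made-up function f and parameter a that are not part of the recipe; it only illustrates gnuplot's user-defined function syntax together with the range notation shown above.

```gnuplot
# Hypothetical user-defined function; the names f and a are illustrative
# and do not come from the recipe.
a = 2.0
f(x) = exp(-x**2 / a) * cos(5*x)

# Use more sample points than the default so the oscillation is smooth.
set samples 500

# Plot the user-defined function over a restricted x-range.
plot [-3:3] f(x)
```

Changing a and issuing replot redraws the curve with the new parameter value.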
You will often want to plot more than one curve on a single graph, all sharing the same axes. This is simple in gnuplot: just separate the functions or datafiles by commas, and gnuplot will plot them in a sequence of colors or curve styles, with a legend so you can identify them. The following figure shows the plotting of multiple curves:
It will be useful to have some datafiles on your disk for use with some of the plotting recipes. You could make them by hand with a text editor or write a program in your favorite language to generate them, but gnuplot can do this itself. To make a file with data that forms a parabola flipped upside down, tell gnuplot to set table 'parabola.text'. Make sure to include the quotes around the filename. Then say plot -x**2. This writes a table out to the file parabola.text rather than making a picture. Now, say unset table. You should have a file called parabola.text in the directory in which you started gnuplot. Keep it around so we can use it later.
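Collected into one short script, the steps just described might look like the following sketch; it uses only the commands named in the text.

```gnuplot
# Redirect plot output to a text table instead of the graphics terminal.
set table 'parabola.text'
# Write the sample points of the upside-down parabola to the file.
plot -x**2
# Go back to normal plotting.
unset table
```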
After setting your terminal back to the graphics device you want to use at the gnuplot console, type the following command:
plot [-1:1] 'parabola.text', -x, -x**3
Gnuplot plots the curves using three different colors, dash styles, or line thicknesses, depending on the terminal in use, with a legend so you can tell them apart. The functions are plotted as smooth curves, as we did earlier, and the data from the file is plotted as a series of points, by default; one for each point in the range. This can all be adjusted, as we shall see in Chapter 3, Applying Colors and Styles.
Take a look at the datafile that gnuplot created to see the format it understands. After several comment lines beginning with the "#" character, we find a series of x coordinates and y values. The last character on each of these lines is a letter: "i" if the point is in the active range, "o" if it is out of range, or "u" if it is undefined.
Sometimes our curves cannot or should not share the same y-axis. Gnuplot handles this with its tics commands, which we cover in greater detail in Chapter 4, Controlling your Tics. The following figure is a plot of two functions covering very different ranges; if the two curves were plotted against the same y-axis, one would be too small to see:
The following simple three-line script will create the previous figure:
set y2tics -100, 10
set ytics nomirror
plot sin(1/x) axis x1y1, 100*cos(x) axis x1y2
Gnuplot can have two different y-axes and two different x-axes. In order to define a second y-axis, use the set y2tics command; the first parameter is the starting value at the bottom of the graph, and the second is the interval between tics on the axis. The command set ytics nomirror tells gnuplot to use a different axis on the right-hand side, rather than simply mirroring the left-hand y-axis. The final plot command is similar to the ones we've seen before, with the addition of the "axis" keywords; these tell gnuplot which set of axes to use for which curve.
One of our functions, sin(1/x), oscillates infinitely quickly near x = 0. Experiment with issuing the command set samples N before the plot command to see how more information is plotted near the singularity at the origin if you use larger values of N.
A second x-axis can be set up in the same way:
set x2tics -20, 2
set xtics nomirror
set xrange [-10:10]
set x2range [-20:0]
plot sin(1/x) axis x1y1, 100*cos(x-1) axis x2y2
The previous script creates a plot that sets different scales on the top and bottom axes as well as on left and right axes; it uses the axis command in the last line to specify against which axes the curves are plotted.
One problem with the graphs in this recipe is that, although there is a legend generated automatically to show which curve is a plot of which function, there is nothing to show us which curve is plotted against which axis. In Chapter 2, Annotating with Labels and Legends, you will see how to put informative labels and arrows on your plots to address this.
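As a stopgap until labels are covered properly, each y-axis can be named after the curve drawn against it. This is a sketch only: the label texts are our own choice, and the plot command uses axes, the spelling of the keyword given in the gnuplot manual.

```gnuplot
set y2tics -100, 10
set ytics nomirror
# Name each y-axis after the curve plotted against it (label text is illustrative).
set ylabel "sin(1/x)"
set y2label "100*cos(x)"
# 'axes' is the keyword spelling documented in the gnuplot manual.
plot sin(1/x) axes x1y1, 100*cos(x) axes x1y2
```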
If you are in possession of a collection of measurements that, as is usually the case, is subject to random errors, an attempt to simply plot a curve through the measurements may result in a chaotic graph that will be difficult to interpret. In these cases, one usually begins with a scatterplot, which is simply a plot of a dot or small symbol at each data point. An examination of such a plot often leads to the discovery of correlations or patterns.
To make this recipe interesting, we need some slightly random-looking data. You may have some available, in which case you merely need to ensure that it is in a format that gnuplot can read. Simply arrange the data so that each line of the file contains one data point with space-separated x and y values:
x1 y1
x2 y2
...
Then name the file
If you don't have such a file of your own handy, use the one called scatter.dat that we have provided. Make sure that the file is in the directory in which you have started gnuplot, so that the program can find it.
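If neither option is convenient, gnuplot can generate noisy test data itself. The sketch below is an assumption-laden illustration rather than the way scatter.dat was produced: the output name myscatter.dat, the range, and the noise amplitude are all made up. It uses the special filename '+', which supplies the automatically calculated sample points, and the built-in rand(0), which returns a pseudo-random number between 0 and 1.

```gnuplot
# Hypothetical output name, chosen so that it does not overwrite scatter.dat.
set samples 200
set table 'myscatter.dat'
# Column 1 is the sample point; column 2 is a straight line plus random noise.
plot [0:10] '+' using 1:($1 + 4*(rand(0) - 0.5))
unset table
```

The resulting file has the same layout as the tables gnuplot writes elsewhere in this chapter, so it can be plotted by substituting its name in the commands that follow.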
Some of the recipes in this book will not work as intended if entered in the same interactive session unless you give the reset command first. This is because these scripts make settings that change gnuplot's default behavior.
Now simply tell gnuplot:
plot 'scatter.dat' with points pt 7
If you are using the file we provided, you will get a plot similar to the one shown in the previous figure.
You can plot the points using different symbols. Try plot 'scatter.dat' with dots to get the smallest dot available to your terminal; this is useful for scatterplots of very large datasets. Also try the following command with different integers for n:
plot 'scatter.dat' with points pt n
Here pt stands for pointtype, and the different pointtypes available are dependent on your terminal. Simply type test in gnuplot to see a demonstration of all the pointtypes available for the currently selected terminal. You can find more about point and line styles in Chapter 3, Applying Colors and Styles.
Gnuplot's box style is similar to a bar chart, with each value plotted as a box extending up from the axis. You can have the boxes filled with patterns, solid colors, or leave them empty.
This style is commonly used either as a type of histogram (covered later in this chapter) or as a way to compare a set of disparate items. The following figure plots boxes using the fill pattern:
It just takes the following script to get the previous figure:
set style fill pattern
plot [-6:6] besj0(x) with boxes, sin(x) with boxes
The first command tells gnuplot to fill the boxes with a fill pattern, cycling through the patterns available on the selected output device for each plot on the graph. The second command plots the two specified functions using the boxes style, which draws a box from the x-axis to the y value for each point.
This recipe will introduce gnuplot's ability to place objects at locations specified in a datafile or by mathematical functions, and to define their properties dynamically to convey information about the data. The following figure shows how gnuplot plots circles:
We have provided a datafile called parabolaCircles.text, which is similar to the parabola.text file that we created previously with gnuplot's help, but with a third column that consists of some random numbers. Make sure this file is in your current directory so that gnuplot can find it. Alternatively, use any datafile you like with three columns.
Enter the following script to make a circle plot:
set key off
plot "parabolaCircles.text" with circles
For each point in the datafile, we get a circle with a radius determined by the number in the third column. Here the radii are random, but in practice you can encode some value of interest in the radii, in effect providing a way to plot two values for each point on the x-axis.
For example, the y coordinate can represent a measurement and the radii can indicate the uncertainty in the measurement; or we can get meteorological data and can plot temperature versus time, with the circle radius representing humidity.
The first line in the script turns off the legend that otherwise gnuplot adds by default.
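When circles overlap, a translucent fill and an explicit using clause can make the plot easier to read. The following sketch assumes the same three-column file; the opacity of 0.4 and the scale factor of 0.5 on the radius are arbitrary illustrations, and transparency is only honored by terminals that support it.

```gnuplot
set key off
# Semi-transparent fill without outlines, so overlapping circles stay visible.
set style fill transparent solid 0.4 noborder
# Columns: x, y, radius; the radius is scaled by an arbitrary factor of 0.5.
plot "parabolaCircles.text" using 1:2:(0.5*$3) with circles
```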
When you want to highlight the difference between two curves or datasets, or show when your data values exceed some reference value, the filled curve style, with some encouragement, can be made to serve. The following figure shows an example of filled curves:
For the main recipe, you should be ready to go. If you want to try the commands for creating the second plot in this section, as shown in the next figure, you need another datafile, intersection, which we have provided. This consists of the numerical output of a program that simply calculated the coordinates of a straight line and a parabola. You can substitute your own data, as long as it is in the format described in the following There's more... section.
The following command creates the previous figure:
plot [0:50] besy0(x) with filledcurves above y1=0.07
This simple use of the filledcurves style colors in the area showing where the plotted Bessel function exceeds 0.07. We let gnuplot use the default color and shading style.
You can change the color (to blue, for example) by appending lt rgb "blue" to the plotting command; the quotes around the color name are required. If you want to change the fill style to use a pattern rather than a solid color, precede the plotting command with the following command:
set style fill pattern n
In this command, n is an integer that specifies the fill style from those available in your terminal. To see a list of these, just issue the command test.
Suppose you are plotting data from a file, and the data is arranged in a table in the following format:
x1 y1 z1
x2 y2 z2
x3 y3 z3
...
You can fill in the difference between the two curves y versus x and z versus x with the following command:
plot <'file'> using 1:2:3 with filledcurves
Following is the complete script for creating the plot shown in the previous figure:
set style fill pattern 5
plot 'intersection' using 1:2:3 with filledcurves,\
'' using 1:2 lw 3 notitle, '' using 1:3 lw 3 notitle
This plot shows the difference between a parabola and a straight line.
The fill pattern is similar to what you will get with the X11, PostScript, and some other terminals, but, as with all patterns and styles selected by an index number, this is dependent on the terminal. On the Macintosh using the Aquaterm terminal, for example, all the fill patterns are solid colors, and selecting an index merely changes the color.
The plot command used here exploits some features that we have not covered before. The using keyword selects the columns from the datafile; using 1:2:3 means to plot all three columns, and the filledcurves style knows how to fill in the difference between the curves in this case. After this, we plot the parabola and line separately, using a blank filename to select the previous name. The purpose of the last two plot components of the command is to plot the thick lines that delimit the filled area; lw 3 chooses the line thickness, and notitle tells gnuplot not to add an entry into the legend for these plot components, which would be redundant.
But what if you want to make something similar to the previous figure without making an intermediate datafile? You can make a plot that fills the area between two functions by using gnuplot's special filenames. This is a facility that allows you to do things that normally can only be done with datafiles right on the command line or in a script, without having to make a datafile.
Following is another way to get the previous figure:
set style fill pattern 5
plot [0:1] '+' using 1:(-$1):(-$1**2) with filledcurves,\
-x lw 3 notitle, -x**2 lw 3 notitle
The special filename '+' refers to a fictitious datafile where the first column consists of the automatically calculated sample points.
We've already encountered another of gnuplot's special files, the file called '' (an empty string), which refers to the previously named datafile, and we used it to avoid having to type its name multiple times.
Although gnuplot was originally envisioned as a scientist's companion, it has proven to be a worthy and reliable friend to financial analysts. Financial plotting comes with its own set of complex problems, some of which we'll have to defer to later chapters; in the following figure, we illustrate the basic financial plotting style:
This type of plot will be familiar to you if you follow the stock market.
Sample financial data is essential for illustrating financial plotting. Fortunately, the gnuplot distribution comes with an appropriate sample datafile. In case you don't have it, we have provided a copy called finance.dat. Make sure it's in your current directory so that gnuplot can find it. You are welcome, of course, to use your own data, but it must be in the correct format. Each line of the file represents a separate data point, and consists of (at least) five numbers, separated by spaces:
date open low high close.
An example of a line from such a datafile would look similar to the following:
3/11/2011 76.15 76.63 75.2 75.35
Enter the following commands while you are in the directory containing the datafile:
set bars 2
plot [0:100] 'finance.dat' using 0:2:3:4:5 notitle with financebars
This makes the conventional financial graph showing the high, low, open, and close prices for a stock. If you are reading this recipe, you no doubt already know why you want this type of plot.
The default size of the tics for the opening and closing prices is quite small; the first command makes them longer. The second command sets the range, chooses the file, and specifies the columns to use for the finance plot.
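The same five columns can also be drawn in the candlesticks style, which replaces the open and close tics with a box. This is a sketch of an alternative, not part of the recipe; the box width of 0.5 is an arbitrary value in x-axis units.

```gnuplot
# Width of each candlestick body, in x-axis units (illustrative value).
set boxwidth 0.5
# Same file and the same five columns as the financebars plot.
plot [0:100] 'finance.dat' using 0:2:3:4:5 notitle with candlesticks
```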
This recipe shows you how to make the simplest step-type histogram. Later, we will build histogram and statistical plots on this, but sometimes this is all you need. The following figure shows a simple step-type histogram:
We're going to plot a part of our file parabola.text, so make sure that's still available. Of course, if you have your own sorted statistical data, that will probably be more interesting.
Type the following command to make a histogram plot:
plot [-2:2] 'parabola.text' with histeps
As we can see, rather than drawing a line through a series of x-y points, the histeps style draws a staircase composed of horizontal and vertical line segments. The vertical lines are drawn not at the actual x-coordinates given in the data, but at the average values of neighboring x-coordinates. This is the usual way to construct a histogram, where each box represents "how much" is contained in each interval between two x-values.
A more interesting type of histogram plot shows the distribution of some quantity with a second distribution stacked on top. This provides a quick way to visually compare two distributions. The values of the second distribution are measured not from the axis, but from the top of the box showing the first distribution. The following figure shows a stacking histogram:
You might have noticed that the information printed in the legend on the upper-right corner is not very descriptive. This is the default; in the next chapter, you will learn how to change it to whatever you want.
The script that produced the stacked histogram is as follows:
set style fill solid 1.0 border lt -1
set style data histograms
set style histogram rowstacked
plot [0:40] 'parabolaCircles.text' using (-$2),\
'' using (20*$3) notitle
The first line requests histogram bars filled with a solid color, and with a black border. Without this, the bars are plotted unfilled, which makes the plot more difficult to interpret.
The next two lines specify that data from files should be plotted using histograms; the rowstacked style means that data from each row in the file will be plotted together in one vertical stack.
In the plot command, we have chosen to illustrate how to do simple calculations on data columns; the expression is enclosed in parentheses, the column number is preceded with a dollar sign, and the familiar Fortran or C type syntax works just the way you would expect. So we have flipped our parabola back "right side up" with a negative sign, and increased the magnitude of our random numbers by multiplying by 20. (This file was used to plot circles with random radii in the Plotting circles recipe in this chapter. The random numbers were scaled to give appropriately sized circles, but are too small to give a good illustration of the stacked histogram here. Rather than generating new data, some simple arithmetic allows us to reuse the file.)
Rather than stacking the histograms, you can plot them side by side. The following figure shows the same data as in the previous plot, but has two separate sets of histograms plotted beside each other:
To make room, the histogram boxes are automatically drawn thinner. The different data sets are distinguished by different fill colors or patterns, depending on terminal, and/or different styles for the lines delineating the histogram boxes.
Following are the commands used to produce a multiple histogram plot:
set style fill solid 1.0 border lt -1
set style data histograms
plot [0:40] 'parabolaCircles.text' using (-$2),\
'' using (20*$3) notitle
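The commands above rely on gnuplot's default histogram layout, so they give side-by-side bars only if no stacked style is still in effect from earlier in the session. A hedged variant that requests the layout explicitly is sketched below; the gap value of 1 is an arbitrary choice that leaves about one box width of space between neighbouring clusters.

```gnuplot
set style fill solid 1.0 border lt -1
set style data histograms
# Ask for side-by-side bars explicitly, overriding any earlier rowstacked setting.
set style histogram clustered gap 1
plot [0:40] 'parabolaCircles.text' using (-$2),\
     '' using (20*$3) notitle
```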
Along with data often comes error, uncertainty, or the general concept of a range of values associated with each plotted value. To express this in a plot, various conventions can be used; one of these is the "error bar", for which gnuplot has some special styles. The following figure shows an example of an error bar:
The previous figure has the same data that we used in our previous recipe, Plotting circles, plotted over a restricted range, and using the random number column to supply "errors", which are depicted here as vertical lines with small horizontal caps.
Following is the script for producing a basic data set plot with error bars:
set pointsize 3
set bars 3
plot [1:3] 'parabolaCircles.text' using 1:(-$2):3 with errorbars,\
'' using 1:(-$2):3 pt 7 notitle
We are using our trusty parabola plus the random number file again; here the random numbers will stand in for errors.
The default point size in gnuplot is quite small; the first line in the recipe increases this. This is especially important for presentations, where increasing the size of various plot elements will make your projected slides far easier to see. The second line increases the size of the small horizontal bars on the ends of the error bars; the default is rather small and hard to see. The third line selects the range, flips the parabola as before, and selects the error bars style. If we omit the portion after the comma, the error bars alone are plotted, with another small horizontal bar indicating the data values. This is OK, but the graph is easier to interpret if we plot a more distinct symbol at each data point; that's what the component after the comma does. We use the special file designator '' to mean the file already mentioned; pt is short for point type, and pt 7 gives a solid circle on most terminals. Finally, notitle prevents a second, redundant entry in the legend.
Error bars can be combined with some of the other plot styles. To create the following figure, which combines a box plot with error bars, change the last line in the recipe to the following commands:
set style fill pattern 2 border lt -1
plot [1:3] 'parabolaCircles.text' using 1:(-$2):3 with boxerrorbars
We've just changed errorbars to boxerrorbars, but first we set the fill pattern to a fine hatching pattern (this will depend on your output device; try the command test to see the patterns available) and asked for a black border to be drawn around the boxes.
This is the same data plotted in the previous figure, in a different style.
Also known in the statistics world as a "box and whisker plot" or simply as a boxplot, the statistical whisker plot is a series of symbols, each one showing the mean value of a set of measurements, the extent of the central part of the measurements' or population's distribution, and the extent of the remainder of the distribution excluding the "outliers" (the outliers themselves are sometimes shown as dots, but we won't use that style here). This type of plot is also sometimes used for financial price data rather than the finance plot that was the subject of the Handling financial data recipe in this chapter. We will avoid the specialized language of statistics and further discussion of the uses of these plots, but the statisticians among our readers know why they're here. The following figure shows the depiction of a statistical whisker plot using gnuplot:
In the previous plot, typically, the boxes show the range of the central part of the data distribution; the short horizontal line within the boxes shows the value of the mean; and the vertical lines extending above and below the boxes show the range of the bulk of the distribution excluding the outliers.
We've borrowed the demo file candlesticks.dat that comes with the gnuplot distribution; make sure it's in your current directory. If you want to use your own data instead, each line of the file must be in the following format:
x whisker_min box_min mean box_high whisker_high
Feed the following script to gnuplot to get the whisker plot:
set xrange [0:11]
set yrange [0:10]
set boxwidth 0.2
plot 'candlesticks.dat' using 1:3:2:6:5 with candlesticks lt -1 lw 2 whiskerbars,\
'' using 1:4:4:4:4 with candlesticks lt -1 lw 2 notitle
The first two lines set the x and y ranges of the axes; they are set to give a little room around the data. The next line sets the boxwidth, the width of the rectangle showing the extent of the central part of the distribution (the default is very skinny). Next comes the plot command, split here over two lines. The order of the fields expected by the candlesticks style is x, box_min, whisker_min, whisker_high, box_high, which is not in the same order as our datafile, so we need to use the using keyword to put the columns in the right order for plotting. The first plot component also specifies the line type lt to be -1 for solid black and sets the line width lw to 2; whiskerbars means put the little caps on the end of the whiskers. The second plot component, starting on the last line, plots from the same datafile, but employs a trick to use the 4th column, which contains the mean value, repeatedly, effectively collapsing the box ends and whiskers down to the mean, all just to plot the little horizontal line in the middle of the boxes. This may seem like a convoluted method, but it ensures that the lines indicating the mean values are in the right places and have exactly the correct width to lie within the boxes.
Impulse or stick plots are another way to represent discrete points. If the line thickness is made large, the impulse plot can be made to look like a bar chart.
The following script illustrates the use of the impulses style:
set samples 30
plot [0:2*pi] sin(x) with impulses lw 2
The first command sets the number of points used to sample or plot the function. The plot command tells gnuplot to use the impulses style, which draws a line from the x-axis to each y value; the thickness of the line is given by lw 2.
A "stem plot" is sometimes used in electrical engineering. It is similar to the impulse plot, but with a mark at the end of each stick; this allows the eye to more easily follow the trend of the data; conversely, the sticks make it easier to read the graph, especially when the data is sparse, compared with a simple point plot. Use the following recipe to create a stem plot of a decaying sine wave, illustrated in the following figure:
set samples 50
plot [0:4*pi] exp(-x/4.)*sin(x) with impulses lw 2 notitle,\
exp(-x/4.)*sin(x) with points pt 7
As you can see, we have plotted the same function twice. The first time through, we plot the impulses, as in the previous script, and the second time we plot the function again with points to draw the dots.
The previous plot shows a typical exponentially damped sine wave; it represents, for example, the motion of a pendulum with friction.
Gnuplot can graph functions whose x and y values depend on a third variable, called a parameter. In this way, more complicated curves can be drawn. The following plot resembles a Lissajous figure, which can be seen on an oscilloscope when sine waves of different frequencies are controlling the x and y axes:
The following script creates the previous figure:
set samples 1000
set parametric
plot sin(7*t), cos(11*t) notitle
We want more samples than the default 100 for a smoother plot, hence the first line. The second line changes the way gnuplot interprets plot commands; now the two functions (in the third line) are understood to provide x and y coordinates in the plane as the parameter t is varied. Once we say set parametric, then we can say plot x(t), y(t), and the plot will trace out a curve given by x(t) and y(t) as t is varied between the limits given in trange.
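The limits of the parameter can be set explicitly with set trange. The sketch below assumes we want exactly one full period; since both sin(7*t) and cos(11*t) repeat after 2*pi, the curve closes on itself over that interval.

```gnuplot
set samples 1000
set parametric
# Restrict the parameter to one full period of both functions.
set trange [0:2*pi]
plot sin(7*t), cos(11*t) notitle
```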
All the plots in this chapter up to now have implicitly used rectangular coordinates, usually denoted as x and y. For certain types of information, however, polar geometry is the natural coordinate system. In polar coordinates we have a radius, r, measured from the origin, usually at the center of the graph, and an angle, θ, usually measured counter-clockwise from the horizontal. On the gnuplot command line, the angular coordinate is called t by default. The following is an example of a spiral illustration:
Using polar coordinates we can plot spirals and closed curves that are impossible to define explicitly using rectangular coordinates.
Following is an example of how to use polar coordinates to get the spiral shown in the previous illustration:
set xtics axis nomirror
set ytics axis nomirror
set zeroaxis
unset border
set samples 500
set polar
plot [0:12*pi] t
The first three lines create a pair of axes that intersect at the origin in the center of the graph. This works for polar plots too, where we are measuring the radius from the center. The unset border line removes the frame that has served up to now as axes for our rectangular coordinate plots. Next, we increase the number of samples for a smooth plot. The crucial line, set polar, changes to polar (r-θ) coordinates from the default rectangular (x-y). In the plot command, t is now a dummy variable that passes through the given angular range (default [0:2*pi], changed to [0:12*pi] here), and the provided function (r) is a function of t, in this case the identity, that yields a circular spiral.
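Closed curves are drawn in the same way. As an illustration that is not part of the recipe, the sketch below plots r = cos(3*t), a three-petal rose, over one full turn; set size square is an optional choice that makes the plot area square.

```gnuplot
set polar
set samples 500
# Use a square plot area for the closed curve.
set size square
# r = cos(3*t) traces a closed three-petal curve over one full turn.
plot [0:2*pi] cos(3*t)
```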