Minitab has been a statistical package of choice across all numerous sectors of industry including education and finance. Correctly using Minitab's statistical tools is an essential part of good decision making and allows you to achieve your targeted results, while displaying fantastic charts and a powerful analysis will also communicate your results more effectively.

"Minitab Cookbook" will take the mystery out of using Minitab and will simplify the steps to produce great results. This book will be hugely beneficial for anyone who knows what statistics or studies they want to run, but who is unsure about just what button to press or what option to select.

In this book, you will learn how to use data from different sources and will be guided through the basics of graphs as well as the basics of hypothesis tests. You will explore the use of non-linear regression, how to construct complex ANOVAs, and even delve into Multivariate statistics.

"Minitab Cookbook" is a great reference on how to create graphs, generate P-values, and how to put data in order. You will explore the basics of charts as well as into the complex depths of Factor analysis, and you will learn everything from the simplest of t-tests to the complexity of mixed model ANOVA. And finally, for all of you who want to write Macros, this book covers the use of Execs, Global, and Local Macros.

Publication date:
February 2014
Publisher
Packt
Pages
338
ISBN
9781782170921

## Chapter 1. Worksheet, Data Management, and the Calculator

In this chapter, we'll be covering the following recipes:

• Opening an Excel file in Minitab

• Opening data from Access using ODBC

• Stacking several columns together

• Stacking blocks of columns at the same time

• Transposing the columns of a worksheet

• Splitting a worksheet by categorical column

• Creating a subset of data in a new worksheet

• Extracting values from a date/time column

• Calculator – basic functions

• Calculator – using an if statement

• Coding a numeric column to text values

• Cleaning up a text column with the calculator

## Introduction

This first chapter illustrates the use of a worksheet, data menu, and calculator tools. We start by bringing data in from other applications and then working on it using other data tools within the calculator menu. The emphasis is on getting data into Minitab and reformatting into a preferred structure for later studies.

As you can see in the preceding screenshot, Minitab prefers working in columns. This can be structurally a different approach to Excel, where the data is held in cells. We will start by bringing our data in from Excel and Access, and then move into reformatting the data. It should be noted that copying and pasting into Minitab can be a perfectly acceptable method of moving data from one application to another.

## Opening an Excel file in Minitab

In this task, we will open a set of data in an Excel file. This can be either in `.xls` or `.xlsx` format. We will set options to help with reading the data in the Excel file as the correct type of data.

Preparation for this task is very simple. We only need a set of data saved in an Excel workbook. Any worksheet is suitable, but be aware that the Open Worksheet… command will open every worksheet in an Excel workbook at the same time. The formatting options that we will use here are applied across the entire workbook and cannot have separate format options for every worksheet. Minitab worksheets have a maximum limit of 4000 columns and a practical limit of 10 million rows.

We follow an example here using the `Pulse workbook.xlx` worksheet.

### Tip

You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

### How to do it…

The following instructions detail the steps for importing data from Excel by using the open worksheet command:

1. Within Minitab, go to the File menu and click on Open Worksheet.

2. Change the Files of type field to Excel, and navigate to the folder containing the Excel file.

3. Select the Excel file by clicking on the workbook.

### Note

Double-clicking will open the workbook, but it is important to use Preview and Options as in the following steps.

4. Click on Preview to see the file structure, make a note of the row in which column names appear, and the data appears. In the following example, the column names are in the third row and the data starts from the fourth row.

5. Click on OK and then select Options.

6. Select Variable Names to indicate the row of the column names. In this example we will use row 3. The first row of data can be set to row 4. The automatic setting will pick the next row with any data in it for the first row of data. Click on OK.

7. Click on Preview. We can check if this has helped with identifying the type of data. Further alterations to data type can be made. If further alterations need to be made, either change the data type from the drop-down list under each column name or return to Options to see what further changes need to be made.

8. Click on Open.

### How it works…

The Preview screen will display the first 100 rows in the dataset. This can be a useful tool in seeing how the file is going to be opened and then deciding what needs to be changed in options.

Excel files can come in many different formats, and while options cannot correct everything, it is an important first step.

If a dataset contains summarized data rows such as means or standard deviations at the end of the worksheet, it is best to exclude them. This can be performed by limiting the number of rows that Minitab will open.

Another option that is useful is to ignore blank data rows. Any row that is completely empty will be left out as it is unnecessary to include them in a Minitab worksheet.

### There's more…

Text files, CSV files, XML files, and more can be opened using the Open Worksheet option. While opening text files, column separators can be identified by using the field definition.

• The Opening data from Access using ODBC recipe

## Opening data from Access using ODBC

Here, we will show the instructions to pull data from a table within Access.

The instructions are left generic to enable us to use a suitable Access database. Try using these with your data.

### How to do it…

The following instructions detail the steps for importing data from a database into Minitab:

1. Within Minitab, go to the File menu and click on Query Database(ODBC).

2. Click on Machine Data Source, and select MS Access Database from the data source list. Then click on OK.

3. We can select the drive at the bottom of the next screen, and navigate to the folder containing the database.

4. Select the database and click on OK; fill in the username and password as required.

5. We will pick a table in the database from the drop-down control and then select the columns required from the selection on the left-hand side.

### How it works…

ODBC stands for Open Database Connectivity and is aimed to allow transfer of data between databases independent of operating systems or database systems. Most databases supporting ODBC can be queried this way using Minitab.

Here, Minitab is constructing an ODBC command to ask Access for the data. The command that was sent from Minitab to Access can be seen by going into the `History` folder.

## Stacking several columns together

Minitab will prefer data set up as columns, and often, it is better to stack data together, using one column for all the results, and a second or third column to group the information together. Here, we will stack several numeric columns together into one column.

We want to stack several columns together to give a column of results and an indicator column to identify the group they belong to.

We require a worksheet with several columns of the same data type. In this example, we use data from the `Party Membership of US Senators` file. This data can be found at

This data is in the Excel file format. Download the file `senators.xls`. If you are copying and pasting data into Minitab, copy lines 3 to line 53. Alternatively, when following the instructions in the Opening an Excel file in Minitab recipe, set row 3 for the variable names.

The first column should be named `State`. Ensure that in Minitab the data appears as shown in the following screenshot:

### How to do it…

The following instructions will stack the party columns together to create a single column for the number of US Senators:

1. Go to the Data menu and select Stack; then click on Columns.

2. Select the numeric columns, Democrats, Republicans, and Independents from the left-hand selection. They can be double-clicked on across into the section Stack the following columns:, alternatively select column 2 and drag down to column 4. Then click on the Select button to move the columns as a group.

3. Name the worksheet `Stacked Data`, from the New Worksheet section and click on OK.

4. The worksheet will appear as in the following screenshot. As the columns will not be named from the stack command we need to rename them in the new worksheet. Column 1 Subscripts should be renamed to `Party`, and column 2 should be renamed to `Senators`.

The preceding screenshot shows the stacked results that are created in a new worksheet. A useful tip here to avoid confusion while generating new worksheets, is to name the worksheet that will be generated in the stack column's dialog box.

### How it works…

By default, the columns are created in a new worksheet, and an additional subscript column is created to differentiate the groups in the data. This can be deselected and will identify the stacked data by numeric values: 1 for the first column, 2 for the second, and so on. Alternatively, the data can be stacked into the same worksheet.

• The Stacking blocks of columns at the same time recipe

• The Transposing the columns of a worksheet recipe

## Stacking blocks of columns at the same time

With some datasets, we may want to keep several columns together while running a stack command rather than running several individual stack commands to stack blocks of columns.

In the previous recipe, the data for senators was stacked, but we didn't keep the information about the states in the new worksheet. Stacking blocks of columns can keep the information about the states with the stacking command.

We will use the senators data from the previous recipe. This can be found from the following location:

The Getting ready section of the Stacking blocks of columns at the same time recipe has details on how to import the data.

### How to do it…

The following instructions will stack the party membership details and retain information about the state at the same time:

1. Navigate to Data | Stack and then click on Blocks of Columns.

2. In the first field, enter `Democrats` and select the `State` columns.

3. In the second field, enter `Republicans` and select the `State` columns

4. In the third field, enter `Independents` and select the `State` columns

5. In the Store stacked data in section, name the new worksheet `Stacked Data` and click on OK.

6. As with the stack command, the columns created in the new worksheet are not named. We will need to rename these columns, go to the section above the data in column one and name it `Party`, column two as `Senators`, and column three as `State`.

### How it works…

The stack blocks of columns commands will stack columns vertically by order in each row. The number of senators for Democrats will be placed in the second column at the top, those for Republicans will placed below it, and finally, the Independent senators will appear. The `State` column is placed in each row to ensure that the stacked data has the correct state next to each count of senators in the second column.

The optional selection of Use variable names in subscript column uses the name of the first column in each row to identify where the data has come from in the new worksheet and places this in the first column. Hence, in the stacked worksheet, column 1 lists Democrats, then Republicans, and finally, Independents.

The preceding screenshot shows the worksheet generated up to step 5 in the instructions. The column names are stored in the subscript column, `C1`. If we do not want the name of the parties to appear in column 1, then unselecting this option will create the subscript column with numbers instead.

As the stack commands do not name the columns, we should use step 6 to ensure that we can identify each column when we use these later in dialogs.

• The Stacking several columns together recipe

• The Transposing the columns of a worksheet recipe

## Transposing the columns of a worksheet

Data is often structured with categories denoted in a row and multiple observations of these dimensions in subsequent columns. This can be a popular format used in Excel worksheets. Here, we will transpose this into the column format that is preferred by Minitab in the following manner:

In the preceding screenshot, we can see that different individuals are listed starting from the second column. Their resting pulse is in first row, and after activity in second row, it is preferable for this data to be transposed.

The data used in this example is in the `Pulse workbook.xlx` file. To get this data into Minitab, follow the Opening an Excel file in Minitab recipe.

### How to do it…

The following instructions will transpose the numeric columns of the `Pulse workbook` file:

1. Click on the Data menu and then click on Transpose Columns.

2. Select the subject columns in the left-hand column panel by selecting C2 and dragging down through the columns till the end of the column list. Then click on Select.

### Tip

The left-hand pane in the following screenshot will show the available columns that can be selected. We could double-click on each column across, but here it is faster by dragging and selecting to highlight the columns we want. We could also use Shift or Ctrl to allow multiple column selection.

3. In the Create variable names using column section, enter `C1` and click on OK.

### How it works…

Columns of different data types cannot be transposed with this command. We cannot transpose column 1 from the original data into the numeric data held by the subject columns. Row one of the original worksheet becomes column one of the transposed worksheet. Mixing data types will generate an error. Instead, variable names should be optionally identified by the row categories if they are given in a column of the original worksheet. Here, we use column one to identify the column names of the transposed data.

• The Opening an Excel file in Minitab recipe

• The Stacking several columns together recipe

• The Stacking blocks of columns at the same time recipes

## Splitting a worksheet by categorical column

In this recipe, we will create several new worksheets from a grouping column. Split worksheet is a great way of quickly separating data out into separate worksheets.

We will use the file `pulse.mtw` from the sample Minitab data folder. Here, we will split the worksheet into two new worksheets: one for those who ran and the other for those who didn't run.

To open the pulse file, go to File and then click on Open Worksheet. Click on the button Look in Minitab Sample Data folder. Then find the file `pulse.mtw`. If the file does not show up, make sure the file type is set to Minitab (*.mtw; *.mpj).

### How to do it…

The following instructions will split a worksheet by a categorical column:

1. Go to the Data menu and click on Split Worksheet.

2. In the By variables section, select the `Ran` column, and click on OK.

### How it works…

Split worksheet is a very quick command to generate new worksheets for each grouping level of a column. If we had data on temperature measurements over time from different weather stations, the Split worksheet could generate a new worksheet for each weather station to enable us to analyze the data separately.

• The Creating a subset of data in a new worksheet recipe

## Creating a subset of data in a new worksheet

Here, we will want to obtain a smaller set of data from a large worksheet. This is useful when we do not want to analyze all the data, or maybe only a small portion is of interest to us. We will create a subset of this worksheet to look only at the most recent years, that is, from 2000 onwards.

The data shown in the following screenshot is from the Met office (http://www.metoffice.gov.uk/climate/uk/stationdata/) and shows weather details for the Oxford weather station on a monthly basis from 1853. We will subset the data to view the results from 2000 onwards. Select the Oxford station data and copy the data into Minitab.

### Note

When copying the data into Minitab from the website, only select the information for the year and weather. Copying the column header information will create text columns. Minitab only allows one header for column names, and the dataset here has a header for names and a second header for units. It is advised that we only grab the information for now and rename the columns afterwards.

The `Oxford weather (cleaned).mtw` file is provided in the code bundle.

### How to do it...

The following instructions will generate a new worksheet of weather data 2000 onwards:

1. Go to the Data menu and click on Subset Worksheet.

2. Rename the new worksheet as `Temperature for 2000 onwards` and click on the Condition button.

3. In the Condition section, double-click on the Year column to move this into the Condition section, and then add `>= 2000`, as shown in the following screenshot. Then click on OK twice.

### How it works…

Subset worksheet can create a new worksheet based on an entered condition, row numbers, or selected data that we have brushed on a chart.

This gives a bit more control than split worksheet, and we only generate one worksheet.

### There's more…

With this command, data can be excluded or included very quickly.

Row numbers can be specified in the Row numbers field, either singly, with a space between each value (for example, `2 4 8 9 10`), or a range of row numbers can be entered by using a colon (for example, `100:150`).

The Condition section works in the same way as the calculator. Multiple conditions can be added with the use of And, Or, and Not. Text values should be referred to within double quotes (for example, `"April"`) and date formatted data should be used as `DATE("05/11/2000")`.

• The Splitting a worksheet by categorical column recipe

## Extracting values from a date/time column

Often, it can be useful to find values such as day, month, or hour of the day from a date column. This can help later on while trying to find out about the variation in our results, finding effects based on day of the week or month of the year.

The data column displayed in the following screenshot shows dates in the mm/dd/yyyy format. To run this recipe, type the following data into a new column and label the column as `Date`. We will then create a new column displaying the months from the dates.

### How to do it…

The following steps will create a column for the month from the dates in the worksheet:

1. Navigate to Data | Extract from Date/Time | Extract to Text.

2. In the dialog box, enter the date column in the Extract from date/time column section. Enter `Month` into the Store text column in section.

3. Tick the Month selection and click on OK.

### How it works…

If the worksheet does not contain a column called `Month` already, it will create a new column for us. The selected fields will then populate this column in the worksheet. The Extract to Text option will create months and days of the week by their names, where the Extract from date/time to Numeric option will create a column of numbers.

## Calculator – basic functions

The e-calculator tool is analogous to formulas in Excel. In Minitab, we will run calculations on the columns one at a time. Using the weather data from the Oxford station, we will create a new column for the hours of sunlight in a month divided bythe amount of rainfall (in millimeters) for that month.

We will be using data from Oxford weather station; it can be obtained from http://www.metoffice.gov.uk/climate/uk/stationdata/. Select Oxford weather station data and copy the data into Minitab. Only copy the data and not the column headers. Then, rename the columns as `Year`, `Month`, `T Max`, `T Min`, `Airfrost(days)`, `Rain(mm)`, and `Sun(Hrs)`.

### Note

Inspect the data that we copy to ensure that the columns have the right data type. The column header `C1-T` will indicate text, `C1-D` will indicate data, and `C1` without the hyphen is a numeric column.

Also, note that a few of the values in the data will be recorded as missing; this is where text values are supplied next to the number. A cleaned worksheet is provided in the `Oxford weather (cleaned).mtw` file.

### How to do it…

The following instructions will create a new column for the hours of sunlight to amount of rainfall in a month ratio:

1. Go to the Calc menu and click on Calculator.

2. In the dialog box, enter a name for the new column into the Store result in Variable section. We will name it as `'Sun(Hrs)/Rain(mm)'`.

3. Select the section labeled Expression, and then double-click on `Sun (Hours)` to move it across, enter `/`, and then move the `Rain(mm)` column over.

4. Tick the box labeled Assign as formula and click on OK.

### How it works…

The results of this function are generated in the selected column. The checkbox Assign as a formula will create a formula in the worksheet that will keep updating it. By default, this is not selected to avoid problems with circular or self-referencing formulas. When a formula is stored in a column, a green cross is placed at the top of the column to indicate the presence of a formula. The green cross indicates that this is up-to-date, while a red cross would indicate a problem in updating the formula.

### There's more…

The calculator in Minitab has a plethora of functions available. We can choose from the function list on the right in the calculator screen. These include several statistical, mathematical, and text functions. They can be filtered with the drop-down list at the top. The help function for the calculator includes a list of functions and how to use them.

• The Calculator – using an if statement recipe

• The Cleaning up a text column with the calculator recipe

## Calculator – using an if statement

If statements can be useful tools in extracting information from a column or in reformatting data. Here, we will recode the values in the pulse worksheet.

In the pulse dataset, the `C4 smokes` column indicates if an individual is a smoker. 1 indicates a regular smoker and 2 stands for those who do not smoke. We will use an if statement to replace the numeric values for smokers and nonsmokers.

The data used here is one of the example files that come with Minitab. Open the `pulse.mtw` worksheet from the sample Minitab data folder by going to the File menu and clicking on Open Worksheet. Select the icon Look in Minitab Sample Data folder.

### How to do it…

The following instructions will convert the values of `1` and `2` in the `Smokes` column to `Smokes` and `Non-Smoker`

1. Go to the Calc menu and select Calculator.

2. In Store result in variable, enter the column `Smokes`.

3. In the Expression text box, enter `IF('Smokes' = 1, "Smoker", "Non-Smoker")`.

4. Click on OK.

### How it works…

An if statement is entered in the following format: `IF(Logical statement, TRUE, FALSE)`; the value entered after the first comma being the true for the statement. Numeric values for the response are entered purely as numbers without any double quotes. Text is always entered inside double quotes, for example, `"True"`. Column names are used with single quotes, for example, `'Pulse1'`.

### Note

Column names can be typed, double-clicked, or selected in the column from the available list on the left of the dialog box. While single quotes are used to denote columns, they are not always necessary. A column name, that is, a single name without special characters or spaces in the name does not need to be referred to with single quotes. The column `Ran` can be used as `Ran` or `'Ran'`. Selecting a column from the list will automatically enter single quotes where necessary.

### There's more…

The `IF` statements can be generalized to multiple `IF` conditions very simply. A general `IF` statement would have the following syntax:

````IF(Statement1, True1, Statement2, True2… StatementN, TrueN, False)`
```

When identifying missing values in the worksheet, the code must be used as the first statement. These values can be identified with the code `Miss()` or `'*'`.

For example, the following will identify a missing result in the `Smokes` column as `Unknown`, `1` as `Smoker`, and all others as `Non-Smoker`:

````IF('Smokes' = '*', "Unknown", 'Smokes' = 1, "Smoker", "Non-Smoker")`
```

• The Calculator – basic functions recipe

• The Coding a numeric column to text values recipe

## Coding a numeric column to text values

In the pulse worksheet, the `Activity` column lists the usual level of activity of the students in the study, where `1` denotes slight activity, `2` is moderate activity, and `3` stands for a lot of activity.

We will change the numeric values of `Activity` from `1`, `2`, and `3` into `Slight`, `Moderate`, and `High`.

Use File and Open Worksheet to open the `pulse.mtw` dataset worksheet from the sample Minitab data folder.

### How to do it…

The following instructions will convert the values of `1`, `2`, and `3` in the `Activity` column to `Slight`, `Moderate`, and `High`:

1. In the Data menu, navigate to Code | Numeric to Text.

2. Enter the `Activity` column into Code data from columns.

3. Enter the `Activity` column into the section Store coded data in columns.

4. In the Original values section, enter `1` in the first line, `2` in the second line, and `3` in the third line, as shown in the following screenshot:

5. In the New section, enter `Slight` in the first line, `Moderate` in the second line, and `High` in the third line.

6. Click on OK.

### How it works…

The code tools work like an `IF` statement. The code tools cannot be set as an updating formula in the worksheet like the `IF` statements can be set from the calculator.

In this example, we are replacing the numeric values in the `Activity` column and putting the new results back into the same column. This will change the data type of the `Activity` column from `Numeric` to `Text`.

### There's more…

Ranges of numbers can be entered using colons. If we wish to change the values from `0` to `30` as low, `30` to `60` as medium, and `60` and above as high, then we would enter these as shown in the following screenshot:

The code tool does not accept symbols such as `>`, `<`, or `=`, and therefore, we can use a large number, such as 100000, to create a greater than statement.

• The Calculator – using an if statement recipe

## Cleaning up a text column with the calculator

In the dataset shown in the following screenshot, volunteers are listed with their first and last names:

Throughout the column, there are errors in the capitalization of names. We will apply the correct case to the names and separate the names into new columns for first and last name.

Enter the names in the screenshot into a new column in Minitab. Name the column `Volunteer`.

### How to do it…

The following instructions will use the proper word commands in the calculator to create a column of correctly capitalized surnames:

1. In the Calc menu, click on Calculator.

2. In the section Store result in variable, create a new column called `Surname`.

3. Filter the function list by selecting the text functions from the drop-down list. Then look for the function `Proper`. Double-click on the function to move it into the calculator.

4. Next, find the function `Word` in the text function list. Double-click on this to highlight and replace the section `text` in the calculator. This should look like the following screenshot:

5. Double-click on the column `Volunteer` to bring it into the expression and replace the section labeled text.

6. Replace `word_num` with `2` and delete the delimiters and brackets. The expression should look like the following screenshot:

7. Click on OK to create a column of surnames capitalized correctly.

### How it works…

The text function `Proper` returns a capital letter at the start of each word. The rest are set to lowercase. The `Word` function is used to locate a word in a text string. The `Word 2` function finds the second word in a cell, using spaces as delimiters by default.

Nesting the `Word` function inside the `Proper` command tells Minitab to find the second word for the text in the column `Volunteer` and then set the first letter in each word as a capital and all others as lowercase.

### There's more…

There is a list of calculator functions within the Help menu for the calculator. The text functions include a number of useful tools. Note, `Item` is similar to the `Word` function but allows the separator to be specified. For example, `A_123_tx` could be used with an `Item` function such as `Item(c1,2,"_")` to define the separator of the items as an underscore.

It is also worth noting that Find and Replace work in the worksheet as they do in other applications. Press Ctrl + F, and Ctrl + H or they can be found in the Editor menu.