In this chapter, we'll be covering the following recipes:
Opening an Excel file in Minitab
Opening data from Access using ODBC
Stacking several columns together
Stacking blocks of columns at the same time
Transposing the columns of a worksheet
Splitting a worksheet by categorical column
Creating a subset of data in a new worksheet
Extracting values from a date/time column
Calculator – basic functions
Calculator – using an if statement
Coding a numeric column to text values
Cleaning up a text column with the calculator
This first chapter illustrates the use of a worksheet, data menu, and calculator tools. We start by bringing data in from other applications and then working on it using other data tools within the calculator menu. The emphasis is on getting data into Minitab and reformatting into a preferred structure for later studies.
As you can see in the preceding screenshot, Minitab prefers working in columns. This can be structurally a different approach to Excel, where the data is held in cells. We will start by bringing our data in from Excel and Access, and then move into reformatting the data. It should be noted that copying and pasting into Minitab can be a perfectly acceptable method of moving data from one application to another.
Preparation for this task is very simple. We only need a set of data saved in an Excel workbook. Any worksheet is suitable, but be aware that the Open Worksheet… command will open every worksheet in an Excel workbook at the same time. The formatting options that we will use here are applied across the entire workbook and cannot have separate format options for every worksheet. Minitab worksheets have a maximum limit of 4000 columns and a practical limit of 10 million rows.
We follow an example here using the
Pulse workbook.xlx worksheet.
The following instructions detail the steps for importing data from Excel by using the open worksheet command:
Change the Files of type field to Excel, and navigate to the folder containing the Excel file.
Select the Excel file by clicking on the workbook.
Click on Preview to see the file structure, make a note of the row in which column names appear, and the data appears. In the following example, the column names are in the third row and the data starts from the fourth row.
Click on OK and then select Options.
Select Variable Names to indicate the row of the column names. In this example we will use row 3. The first row of data can be set to row 4. The automatic setting will pick the next row with any data in it for the first row of data. Click on OK.
Click on Preview. We can check if this has helped with identifying the type of data. Further alterations to data type can be made. If further alterations need to be made, either change the data type from the drop-down list under each column name or return to Options to see what further changes need to be made.
Click on Open.
The Preview screen will display the first 100 rows in the dataset. This can be a useful tool in seeing how the file is going to be opened and then deciding what needs to be changed in options.
Excel files can come in many different formats, and while options cannot correct everything, it is an important first step.
If a dataset contains summarized data rows such as means or standard deviations at the end of the worksheet, it is best to exclude them. This can be performed by limiting the number of rows that Minitab will open.
Text files, CSV files, XML files, and more can be opened using the Open Worksheet option. While opening text files, column separators can be identified by using the field definition.
Here, we will show the instructions to pull data from a table within Access.
The following instructions detail the steps for importing data from a database into Minitab:
Within Minitab, go to the File menu and click on Query Database(ODBC).
Click on Machine Data Source, and select MS Access Database from the data source list. Then click on OK.
We can select the drive at the bottom of the next screen, and navigate to the folder containing the database.
Select the database and click on OK; fill in the username and password as required.
We will pick a table in the database from the drop-down control and then select the columns required from the selection on the left-hand side.
ODBC stands for Open Database Connectivity and is aimed to allow transfer of data between databases independent of operating systems or database systems. Most databases supporting ODBC can be queried this way using Minitab.
Here, Minitab is constructing an ODBC command to ask Access for the data. The command that was sent from Minitab to Access can be seen by going into the
Minitab will prefer data set up as columns, and often, it is better to stack data together, using one column for all the results, and a second or third column to group the information together. Here, we will stack several numeric columns together into one column.
We want to stack several columns together to give a column of results and an indicator column to identify the group they belong to.
We require a worksheet with several columns of the same data type. In this example, we use data from the
Party Membership of US Senators file. This data can be found at
This data is in the Excel file format. Download the file
senators.xls. If you are copying and pasting data into Minitab, copy lines 3 to line 53. Alternatively, when following the instructions in the Opening an Excel file in Minitab recipe, set row 3 for the variable names.
The first column should be named
State. Ensure that in Minitab the data appears as shown in the following screenshot:
Go to the Data menu and select Stack; then click on Columns.
Select the numeric columns, Democrats, Republicans, and Independents from the left-hand selection. They can be double-clicked on across into the section Stack the following columns:, alternatively select column 2 and drag down to column 4. Then click on the Select button to move the columns as a group.
Name the worksheet
Stacked Data, from the New Worksheet section and click on OK.
The worksheet will appear as in the following screenshot. As the columns will not be named from the stack command we need to rename them in the new worksheet. Column 1 Subscripts should be renamed to
Party, and column 2 should be renamed to
The preceding screenshot shows the stacked results that are created in a new worksheet. A useful tip here to avoid confusion while generating new worksheets, is to name the worksheet that will be generated in the stack column's dialog box.
By default, the columns are created in a new worksheet, and an additional subscript column is created to differentiate the groups in the data. This can be deselected and will identify the stacked data by numeric values: 1 for the first column, 2 for the second, and so on. Alternatively, the data can be stacked into the same worksheet.
In the previous recipe, the data for senators was stacked, but we didn't keep the information about the states in the new worksheet. Stacking blocks of columns can keep the information about the states with the stacking command.
We will use the senators data from the previous recipe. This can be found from the following location:
The Getting ready section of the Stacking blocks of columns at the same time recipe has details on how to import the data.
The following instructions will stack the party membership details and retain information about the state at the same time:
Navigate to Data | Stack and then click on Blocks of Columns.
In the first field, enter
Democratsand select the
In the second field, enter
Republicansand select the
In the third field, enter
Independentsand select the
As with the stack command, the columns created in the new worksheet are not named. We will need to rename these columns, go to the section above the data in column one and name it
Party, column two as
Senators, and column three as
The stack blocks of columns commands will stack columns vertically by order in each row. The number of senators for Democrats will be placed in the second column at the top, those for Republicans will placed below it, and finally, the Independent senators will appear. The
State column is placed in each row to ensure that the stacked data has the correct state next to each count of senators in the second column.
The optional selection of Use variable names in subscript column uses the name of the first column in each row to identify where the data has come from in the new worksheet and places this in the first column. Hence, in the stacked worksheet, column 1 lists Democrats, then Republicans, and finally, Independents.
The preceding screenshot shows the worksheet generated up to step 5 in the instructions. The column names are stored in the subscript column,
C1. If we do not want the name of the parties to appear in column 1, then unselecting this option will create the subscript column with numbers instead.
Data is often structured with categories denoted in a row and multiple observations of these dimensions in subsequent columns. This can be a popular format used in Excel worksheets. Here, we will transpose this into the column format that is preferred by Minitab in the following manner:
In the preceding screenshot, we can see that different individuals are listed starting from the second column. Their resting pulse is in first row, and after activity in second row, it is preferable for this data to be transposed.
The data used in this example is in the
Pulse workbook.xlx file. To get this data into Minitab, follow the Opening an Excel file in Minitab recipe.
The following instructions will transpose the numeric columns of the
Pulse workbook file:
Click on the Data menu and then click on Transpose Columns.
Select the subject columns in the left-hand column panel by selecting C2 and dragging down through the columns till the end of the column list. Then click on Select.
In the Create variable names using column section, enter
C1and click on OK.
Columns of different data types cannot be transposed with this command. We cannot transpose column 1 from the original data into the numeric data held by the subject columns. Row one of the original worksheet becomes column one of the transposed worksheet. Mixing data types will generate an error. Instead, variable names should be optionally identified by the row categories if they are given in a column of the original worksheet. Here, we use column one to identify the column names of the transposed data.
We will use the file
pulse.mtw from the sample Minitab data folder. Here, we will split the worksheet into two new worksheets: one for those who ran and the other for those who didn't run.
To open the pulse file, go to File and then click on Open Worksheet. Click on the button Look in Minitab Sample Data folder. Then find the file
pulse.mtw. If the file does not show up, make sure the file type is set to Minitab (*.mtw; *.mpj).
Go to the Data menu and click on Split Worksheet.
In the By variables section, select the
Rancolumn, and click on OK.
Split worksheet is a very quick command to generate new worksheets for each grouping level of a column. If we had data on temperature measurements over time from different weather stations, the Split worksheet could generate a new worksheet for each weather station to enable us to analyze the data separately.
Here, we will want to obtain a smaller set of data from a large worksheet. This is useful when we do not want to analyze all the data, or maybe only a small portion is of interest to us. We will create a subset of this worksheet to look only at the most recent years, that is, from 2000 onwards.
The data shown in the following screenshot is from the Met office (http://www.metoffice.gov.uk/climate/uk/stationdata/) and shows weather details for the Oxford weather station on a monthly basis from 1853. We will subset the data to view the results from 2000 onwards. Select the Oxford station data and copy the data into Minitab.
When copying the data into Minitab from the website, only select the information for the year and weather. Copying the column header information will create text columns. Minitab only allows one header for column names, and the dataset here has a header for names and a second header for units. It is advised that we only grab the information for now and rename the columns afterwards.
Oxford weather (cleaned).mtw file is provided in the code bundle.
Go to the Data menu and click on Subset Worksheet.
Rename the new worksheet as
Temperature for 2000 onwardsand click on the Condition button.
Subset worksheet can create a new worksheet based on an entered condition, row numbers, or selected data that we have brushed on a chart.
This gives a bit more control than split worksheet, and we only generate one worksheet.
With this command, data can be excluded or included very quickly.
Row numbers can be specified in the Row numbers field, either singly, with a space between each value (for example,
2 4 8 9 10), or a range of row numbers can be entered by using a colon (for example,
The Condition section works in the same way as the calculator. Multiple conditions can be added with the use of And, Or, and Not. Text values should be referred to within double quotes (for example,
"April") and date formatted data should be used as
Often, it can be useful to find values such as day, month, or hour of the day from a date column. This can help later on while trying to find out about the variation in our results, finding effects based on day of the week or month of the year.
The data column displayed in the following screenshot shows dates in the mm/dd/yyyy format. To run this recipe, type the following data into a new column and label the column as
Date. We will then create a new column displaying the months from the dates.
Navigate to Data | Extract from Date/Time | Extract to Text.
In the dialog box, enter the date column in the Extract from date/time column section. Enter
Monthinto the Store text column in section.
Tick the Month selection and click on OK.
If the worksheet does not contain a column called
Month already, it will create a new column for us. The selected fields will then populate this column in the worksheet. The Extract to Text option will create months and days of the week by their names, where the Extract from date/time to Numeric option will create a column of numbers.
The e-calculator tool is analogous to formulas in Excel. In Minitab, we will run calculations on the columns one at a time. Using the weather data from the Oxford station, we will create a new column for the hours of sunlight in a month divided bythe amount of rainfall (in millimeters) for that month.
We will be using data from Oxford weather station; it can be obtained from http://www.metoffice.gov.uk/climate/uk/stationdata/. Select Oxford weather station data and copy the data into Minitab. Only copy the data and not the column headers. Then, rename the columns as
Inspect the data that we copy to ensure that the columns have the right data type. The column header
C1-T will indicate text,
C1-D will indicate data, and
C1 without the hyphen is a numeric column.
Also, note that a few of the values in the data will be recorded as missing; this is where text values are supplied next to the number. A cleaned worksheet is provided in the
Oxford weather (cleaned).mtw file.
Go to the Calc menu and click on Calculator.
In the dialog box, enter a name for the new column into the Store result in Variable section. We will name it as
Select the section labeled Expression, and then double-click on
Sun (Hours)to move it across, enter
/, and then move the
Tick the box labeled Assign as formula and click on OK.
The results of this function are generated in the selected column. The checkbox Assign as a formula will create a formula in the worksheet that will keep updating it. By default, this is not selected to avoid problems with circular or self-referencing formulas. When a formula is stored in a column, a green cross is placed at the top of the column to indicate the presence of a formula. The green cross indicates that this is up-to-date, while a red cross would indicate a problem in updating the formula.
The calculator in Minitab has a plethora of functions available. We can choose from the function list on the right in the calculator screen. These include several statistical, mathematical, and text functions. They can be filtered with the drop-down list at the top. The help function for the calculator includes a list of functions and how to use them.
In the pulse dataset, the
C4 smokes column indicates if an individual is a smoker. 1 indicates a regular smoker and 2 stands for those who do not smoke. We will use an if statement to replace the numeric values for smokers and nonsmokers.
The data used here is one of the example files that come with Minitab. Open the
pulse.mtw worksheet from the sample Minitab data folder by going to the File menu and clicking on Open Worksheet. Select the icon Look in Minitab Sample Data folder.
Go to the Calc menu and select Calculator.
In Store result in variable, enter the column
In the Expression text box, enter
IF('Smokes' = 1, "Smoker", "Non-Smoker").
Click on OK.
An if statement is entered in the following format:
IF(Logical statement, TRUE, FALSE); the value entered after the first comma being the true for the statement. Numeric values for the response are entered purely as numbers without any double quotes. Text is always entered inside double quotes, for example,
"True". Column names are used with single quotes, for example,
Column names can be typed, double-clicked, or selected in the column from the available list on the left of the dialog box. While single quotes are used to denote columns, they are not always necessary. A column name, that is, a single name without special characters or spaces in the name does not need to be referred to with single quotes. The column
Ran can be used as
'Ran'. Selecting a column from the list will automatically enter single quotes where necessary.
IF statements can be generalized to multiple
IF conditions very simply. A general
IF statement would have the following syntax:
IF(Statement1, True1, Statement2, True2… StatementN, TrueN, False)
For example, the following will identify a missing result in the
Smokes column as
Smoker, and all others as
IF('Smokes' = '*', "Unknown", 'Smokes' = 1, "Smoker", "Non-Smoker")
In the pulse worksheet, the
Activity column lists the usual level of activity of the students in the study, where
1 denotes slight activity,
2 is moderate activity, and
3 stands for a lot of activity.
Use File and Open Worksheet to open the
pulse.mtw dataset worksheet from the sample Minitab data folder.
The following instructions will convert the values of
3 in the
Activity column to
In the Data menu, navigate to Code | Numeric to Text.
Activitycolumn into Code data from columns.
Activitycolumn into the section Store coded data in columns.
In the Original values section, enter
1in the first line,
2in the second line, and
3in the third line, as shown in the following screenshot:
In the New section, enter
Slightin the first line,
Moderatein the second line, and
Highin the third line.
Click on OK.
In this example, we are replacing the numeric values in the
Activity column and putting the new results back into the same column. This will change the data type of the
Activity column from
Ranges of numbers can be entered using colons. If we wish to change the values from
30 as low,
60 as medium, and
60 and above as high, then we would enter these as shown in the following screenshot:
The code tool does not accept symbols such as
=, and therefore, we can use a large number, such as 100000, to create a greater than statement.
Throughout the column, there are errors in the capitalization of names. We will apply the correct case to the names and separate the names into new columns for first and last name.
Enter the names in the screenshot into a new column in Minitab. Name the column
In the Calc menu, click on Calculator.
In the section Store result in variable, create a new column called
Filter the function list by selecting the text functions from the drop-down list. Then look for the function
Proper. Double-click on the function to move it into the calculator.
Next, find the function
Wordin the text function list. Double-click on this to highlight and replace the section
textin the calculator. This should look like the following screenshot:
Double-click on the column
Volunteerto bring it into the expression and replace the section labeled text.
2and delete the delimiters and brackets. The expression should look like the following screenshot:
Click on OK to create a column of surnames capitalized correctly.
The text function
Proper returns a capital letter at the start of each word. The rest are set to lowercase. The
Word function is used to locate a word in a text string. The
Word 2 function finds the second word in a cell, using spaces as delimiters by default.
Word function inside the
Proper command tells Minitab to find the second word for the text in the column
Volunteer and then set the first letter in each word as a capital and all others as lowercase.
There is a list of calculator functions within the Help menu for the calculator. The text functions include a number of useful tools. Note,
Item is similar to the
Word function but allows the separator to be specified. For example,
A_123_tx could be used with an
Item function such as
Item(c1,2,"_") to define the separator of the items as an underscore.
It is also worth noting that Find and Replace work in the worksheet as they do in other applications. Press Ctrl + F, and Ctrl + H or they can be found in the Editor menu.