Section 1: Overview
Introduction to SPSS
Overview of SPSS for Windows
Section 2: Entering Data in SPSS
The Data Editor
The Syntax Editor
The Output Viewer
Importing Data from Excel Files
Importing Data from ASCII Files
Section 3: Creating and Modifying Data in SPSS
Creating and Defining Variables
Inserting and Deleting Cases and Variables
Computing New Variables
This document is the first of a series of four modules intended for beginning SPSS users, providing an overview of SPSS for Windows. This first module introduces readers to the SPSS for Windows environment and discusses how to create or import a dataset, transform variables, manipulate data, and compute descriptive statistics. The second module describes some commonly used inferential statistics, the third module discusses graphical display of output, and the fourth module covers other advanced topics. Throughout these modules, a single dataset, Employee data.sav, is used for all examples. This example dataset is provided with recent versions of SPSS, so you will have access to it and can use SPSS to test your knowledge by replicating the examples in this document. Although the present documentation assumes SPSS Version 10.0, it will also be useful to users of SPSS on the Macintosh platform and of many earlier, similar versions of SPSS. If you are a University of Texas affiliate and do not have access to SPSS or would like the software for your personal computer, visit the Software Distribution Services Web page at http://www.utexas.edu/cc/sds/ for information about obtaining the latest version of SPSS.
This document also contains some information about using keystrokes (or "accelerator keys") in SPSS 10. For more information, see our SPSS 10 for Windows Keystroke Manual.
Section 1: Overview
Introduction to SPSS
SPSS is a software package used for conducting statistical analyses, manipulating data, and generating tables and graphs that summarize data. Statistical analyses range from basic descriptive statistics, such as averages and frequencies, to advanced inferential statistics, such as regression models, analysis of variance, and factor analysis. SPSS also contains several tools for manipulating data, including functions for recoding data and computing new variables as well as for merging and aggregating datasets. Finally, it offers a number of ways to summarize and display data in the form of tables and graphs.
Overview of SPSS for Windows
SPSS for Windows consists of five different windows, each of which is associated with a particular SPSS file type. This document discusses the two windows most frequently used in analyzing data in SPSS, the Data Editor and the Output Viewer. In addition, the Syntax Editor and the use of SPSS command syntax are discussed briefly. The Data Editor is the window that is open at start-up; it is used to enter and store data in a spreadsheet format. The Output Viewer opens automatically when you execute an analysis or create a graph, whether you run the procedure from a dialog box or with command syntax. The Output Viewer contains the results of all statistical analyses and graphical displays of data. The Syntax Editor is a text editor in which you compose SPSS commands and submit them to the SPSS processor; all output from these commands appears in the Output Viewer. This document focuses on the methods necessary for inputting, defining, and organizing data in SPSS.
Section 2: Entering Data in SPSS
To start SPSS under Windows 95, Windows 98, Windows 2000, or Windows NT, click the Start button; you should find an SPSS icon under the Programs menu item. You can also start SPSS by double-clicking on an SPSS file.
The Data Editor
The Data Editor window displays the contents of the working dataset. It is arranged in a spreadsheet format that contains variables in columns and cases in rows. There are two sheets in the window. The Data View is the sheet that is visible when you first open the Data Editor, and it contains the data. You can access the second sheet by clicking on the tab labeled Variable View. Although the second sheet is similar in appearance to the first, it does not actually contain data. Instead, it contains information about the variables (such as their names, types, and labels), which is stored with the dataset. Unlike most spreadsheets, the Data Editor can have only one dataset open at a time. However, beginning with version 10.0, you can open multiple Data Editors at one time, each of which contains a separate dataset. Datasets that are currently open are called working datasets, and all data manipulations, statistical functions, and other SPSS procedures operate on these datasets. The Data Editor contains several menu items that are useful for performing various operations on your data. Here is the Data Editor, containing the Employee data.sav dataset:
Data can be entered directly in SPSS, or a file containing data can be opened in the Data Editor. To open a file, choose File, then Open, then Data from the menus in the Data Editor window (or type Alt+F+O+A).
If the file you want to open is not an SPSS data file, you can often use the Open menu item to import that file directly into the Data Editor. If a data file is not in a format that SPSS recognizes, try using the software package in which the file was originally created to save it in a format that SPSS can import (e.g., tab-delimited text).
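Opening a data file also has a command-syntax counterpart. The following is a minimal sketch, assuming the sample file has been copied to a folder named C:\data (a hypothetical path; substitute the location of Employee data.sav on your own computer):

```spss
* Open an SPSS data file in the Data Editor.
* The folder C:\data is a hypothetical location.
GET FILE='C:\data\Employee data.sav'.
```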
The Syntax Editor
Another important window in the SPSS environment is the Syntax Editor. In earlier versions of SPSS, all of the procedures performed by SPSS were submitted as syntax, that is, as written commands instructing SPSS how to process your data. More recent versions contain pull-down menus with dialog boxes that allow you to submit commands to SPSS without ever writing syntax. These SPSS for Windows tutorials focus on using the dialog boxes to execute procedures; however, there are two important reasons why you should be aware of SPSS syntax even if you plan to use the dialog boxes primarily. First, not all procedures are available through the dialog boxes, so you may occasionally have to submit commands from the Syntax Editor. Second, the Syntax Editor lets you save procedures as syntax so that they can be rerun at a later date. The dialog boxes available through the pull-down menus have a button labeled Paste, which writes the syntax for the procedure you have set up in the dialog box to the Syntax Editor. Thus, you can easily generate SPSS syntax without typing in the Syntax Editor. This process is illustrated below.
The following dialog box is used to generate descriptive statistics. Here, only the Paste button in the dialog box is relevant. The process used for generating descriptive statistics is described later.
When you click the Paste button (or press Alt+P), the procedure that the above dialog box is prepared to run is written to the Syntax Editor as SPSS syntax. In the above example, clicking Paste would produce the following syntax:
/STATISTICS=MEAN STDDEV MIN MAX .
This syntax produces exactly the same output as clicking the OK button in the above dialog box. The syntax in the Syntax Editor can then be saved and run at a later time, as long as the same dataset, or at least a dataset containing variables with the same names, is active in the Data Editor window. Saving syntax is useful if you think you may want to rerun your analysis after you add more data, or if you want to run the same analysis on another dataset that contains the same variables.
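For reference, a complete DESCRIPTIVES command of the kind Paste writes to the Syntax Editor might look like the following. The variable list (salary and jobtime) is an assumption for illustration, since the variables selected in the dialog box are not given here; substitute any numeric variables from your dataset.

```spss
* Descriptive statistics for two numeric variables.
* salary and jobtime are assumed example variables.
DESCRIPTIVES
  VARIABLES=salary jobtime
  /STATISTICS=MEAN STDDEV MIN MAX .
```

Saved in a syntax file, this block can be rerun later against any dataset that contains variables with these names.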
The Output Viewer
All output from statistical analyses, as well as other useful information, is printed to the Output Viewer window. When you execute a command for a statistical analysis, regardless of whether you used syntax or dialog boxes, the output is printed in the Output Viewer. Other items that you may want printed to the Output Viewer include command syntax, titles, and error messages. The Output Viewer is shown below:
The left frame of the Output Viewer contains an outline of the objects contained in the window. For example, the icon labeled Log represents the command syntax shown at the top of the figure. Everything under Descriptives in the outline refers to objects associated with the descriptive statistics. The Title object refers to the bold title Descriptives in the output while the highlighted icon labeled Descriptive Statistics refers to the table containing descriptive statistics. The Notes icon has no referent in the above example, but it would refer to any notes that appeared between the title and the table. This outline can be useful for navigating in your Output Viewer when you have large amounts of output. By clicking on an icon, you can move to the location of the output represented by that icon in the Output Viewer. You can also copy, paste, or delete objects by first highlighting them in the outline and then performing the operation you want.
You can control what is displayed in your output by using the Options menu item (or Alt+E+N) on the Edit menu:
Selecting this option will produce the following dialog box:
This figure shows the Options dialog box with the Draft Viewer tab selected; this tab lets you choose which items appear in the Output Viewer. Most items are selected by default. Here, the Display commands in log option, which is normally unselected, has been selected so that command syntax will be written to the log in the Output Viewer. This can be useful for keeping track of which procedures you have executed.
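A similar setting can be made from syntax. This is a sketch under an assumption: that the Display commands in log option corresponds to the PRINTBACK keyword of the SET command, which controls whether submitted commands are echoed to the log. Check the SET documentation for your version before relying on it.

```spss
* Echo submitted command syntax to the log in the Output Viewer.
SET PRINTBACK=ON.
```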
Importing Data From Excel Files
Data can be imported into SPSS from Microsoft Excel and several other applications with relative ease. This document describes a method for importing an Excel spreadsheet into SPSS. If you are working with a spreadsheet in another software package, you may want to save your data as an Excel file and then import it into SPSS. If your spreadsheet is arranged in a database format (e.g., you have several tables in your Workbook that are related through identification fields), you might consider another method for importing Excel files, one that merges the tables within your database as part of the import procedure. It is described in the fourth module of this tutorial series, Data Manipulation and Advanced Topics, in the Database Capture section.
To open an Excel file, choose File, then Open, then Data from the menus in the Data Editor window in SPSS (or Alt+F+O+A).
First, select the desired location on disk using the Look in option. Next, select Excel from the Files of type drop-down menu. Your Excel file should now appear in the main box of the Open File dialog box; open it by double-clicking on it. You will then be presented with one more dialog box:
This dialog box allows you to select a spreadsheet from within the Excel Workbook. The drop-down menu in the example shown above offers two sheets from which to choose. As SPSS operates on only one spreadsheet at a time, you can select only one sheet from this menu. This box also gives you the option of reading variable names from the Excel Workbook directly into SPSS. Check the Read variable names box to read in the first row of your spreadsheet as the variable names. If the first row of your spreadsheet does contain the names of your variables and you want to import them into SPSS, these variable names should conform to SPSS variable naming conventions (eight characters or fewer, not beginning with a special character). If you do not have variable names, use the procedure described below in Creating and Defining Variables to add variable names to your dataset after you have imported your data into SPSS. You should now see data in the Data Editor window. Check to make sure that all variables and cases were read correctly. Next, save your dataset in SPSS format by choosing the Save option in the File menu.
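The import can also be scripted. The sketch below uses the GET DATA command found in later SPSS releases; the syntax that SPSS 10.0 pastes from the Open dialog may look different, so treat the exact form as an assumption. The folder C:\data, the file employees.xls, and the sheet name Sheet1 are all hypothetical.

```spss
* Read one sheet from a hypothetical Excel workbook.
* READNAMES=ON takes variable names from the first row.
GET DATA
  /TYPE=XLS
  /FILE='C:\data\employees.xls'
  /SHEET=NAME 'Sheet1'
  /CELLRANGE=FULL
  /READNAMES=ON.
* Save the imported data in SPSS format.
SAVE OUTFILE='C:\data\employees.sav'.
```

As with the dialog boxes, check the Data Editor afterward to confirm that all variables and cases were read correctly.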
If you are using a version of SPSS released prior to SPSS 10.0, a few additional steps are necessary to open an Excel spreadsheet directly in SPSS. You will need to save the file as an Excel version 4.0 (or earlier) Worksheet, which is a file containing a single spreadsheet. More recent versions of Excel use the Excel Workbook format, which can contain several spreadsheets. Because SPSS allows only one dataset to be active at any given time, these older versions can read an Excel Worksheet, which holds a single spreadsheet, but not an Excel Workbook, which holds several. To save an Excel file as an Excel Worksheet, choose File, then Save As from the menus in Excel (or Alt+F+A).
After you assign the file a new name in the File name box and choose a location on disk with the Save in box, make sure you select the Microsoft Excel 4.0 Worksheet (*.xls) option from the Save as type pull-down menu. You will receive the following warning: "The selected file type does not support workbooks that contain multiple sheets." This warning lets you know that only the visible worksheet in the Excel Workbook will be saved, not all of the sheets in the workbook. Click the OK button if the spreadsheet you want to import into SPSS is currently the visible sheet in your Workbook. After saving the file in this format, be sure to close the file in Excel, because SPSS cannot open a file that is currently open in Excel. The file can then be opened in SPSS using the procedure for SPSS 10.0 described above, except that you will not be offered a choice of sheets, because the Worksheet contains only a single sheet.
Importing Data from ASCII Files
Data are often stored in an ASCII file format, also known as a text or flat file format. Typically, columns of data in an ASCII file are separated by a space, tab, comma, or some other character. SPSS 9.0 and later versions include a Text Import Wizard that helps you import data in an ASCII file format. The Text Import Wizard opens automatically when an ASCII file (a file with a .txt or .dat extension) is opened using the Open option in the File menu. If the data file you want to open does not have a .txt or .dat extension but you know that it is an ASCII file, you can start the Text Import Wizard by choosing Read Text Data from the File menu (or Alt+F+R).
The Text Import Wizard will first prompt you to select a file to import. After you have selected a file, you will go through a series of dialog boxes that provide several options for importing data. Once you have imported your data and checked it for accuracy, be sure to save a copy of the dataset in SPSS format by choosing Save or Save As from the File menu (or Alt+F+S or Alt+F+A).
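For a simple delimited text file, the DATA LIST command can read the data from syntax instead of the wizard. This is a minimal sketch, assuming a hypothetical file C:\data\employees.txt with no header row and one case per line, with values separated by blanks or commas; the variable names and formats are also assumptions.

```spss
* Read a hypothetical delimited text file with no header row.
* The LIST keyword expects one case per line.
DATA LIST FILE='C:\data\employees.txt' LIST
  /id (F4.0) gender (A1) salary (F8.0).
* Save a copy of the dataset in SPSS format.
SAVE OUTFILE='C:\data\employees.sav'.
```

Unlike the Text Import Wizard, DATA LIST offers no preview, so it is worth checking the result in the Data Editor afterward.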
Section 3: Creating and Modifying Data in SPSS
Creating and Defining Variables
After data are in the Data Editor window, there are several things that you may want to do to describe your data. Before describing the process for defining variables, an important distinction should be made between two often-confused terms: variable and value. A variable is a measure or classification scheme that can have several values. Values are the numbers or categories representing individual instances of the variable being measured. For example, a variable could be created for job classification. Each individual in the dataset would be assigned a value representing his or her job classification. For instance, we could assign clerks the value 1, custodians the value 2, and managers the value 3.
One reason to define information about your variables is to help you interpret the output. For example, suppose a variable representing employment category is coded 1, 2, or 3 for clerical, custodial, and managerial employees. Reading the output would be unwieldy if you had to keep remembering which number represents which category. Defining variables lets you assign labels to these values, and the labels then appear in your output, making it much easier to interpret. Defining variable information also tells SPSS what type of data your dataset contains, which is often critical for SPSS to process analyses correctly.
You can define information about your variables by clicking the Variable View tab, which brings the Variable View sheet to the foreground. You can also access this sheet by double-clicking one of the gray boxes at the top of the columns in the Data View. The advantage of this second method is that it takes you to the row for the variable whose column heading you clicked. Finally, you can use the keystroke Ctrl+T to toggle between the two sheets. Regardless of the method you use, you will see a spreadsheet organized like the one below:
Many of the cells in the spreadsheet contain hidden dialog boxes that can be activated by clicking on a cell. If you see a gray box appear on the right side of the cell when you first click on the cell, this indicates that there is a hidden dialog box which can be accessed by clicking on that box. For example, clicking on the box in the cell for the Type column for the variable jobcat produces the following dialog box:
This box allows you to define the type of data for a variable; you will be presented with Numeric, String, and Date options, among others. Thus, if you want to define jobcat, the variable representing employment category, as a string variable rather than the default type, numeric, click on the cell in the jobcat row and the Type column, then click the gray box to produce the Variable Type dialog box. Here, you would choose the String option.
The Missing Values column allows you to define which values of a variable should be treated as missing data. The Label column is used to define labels for variables, and the Values column is used to assign labels to particular values of a variable. For example, the following dialog box shows a variable whose values 1, 2, and 3 have been assigned the labels Clerical, Custodial, and Manager.
To define value labels as shown above, first enter the value (e.g., 1) in the box labeled Value, then enter the label associated with that value (e.g., Clerical), and click the Add button. Repeat this process for each value you want to label.
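The same variable information can be defined with syntax. Here is a minimal sketch for jobcat using the value labels described above; the variable label text and the use of 0 as a missing-value code are assumptions for illustration.

```spss
* Variable label, value labels, and a user-missing code for jobcat.
VARIABLE LABELS jobcat 'Employment Category'.
VALUE LABELS jobcat 1 'Clerical' 2 'Custodial' 3 'Manager'.
MISSING VALUES jobcat (0).
```

Keeping these commands in a syntax file makes it easy to reapply the same labels after importing a fresh copy of the data.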
Inserting and Deleting Cases and Variables
You may want to add new variables or cases to an existing dataset; for example, you may want to add data about participants' ages. The Data Editor provides menu options for doing this. To insert a new variable, click on a variable name to select the column in which the new variable is to be inserted. To insert a case, select the row in which the case is to be added by clicking on the row's number. Clicking on either a row's number or a column's name highlights that row or column. Next, use the insert options on the Data menu in the Data Editor (or Alt+D+V for variables and Alt+D+I for cases).
If a row has been selected, choose Insert Case from the Data menu; if a column has been selected, choose Insert Variable. This will produce an empty row or column in the highlighted area of the Data Editor. Existing cases and variables will be shifted down or to the right, respectively.
To delete cases or variables from a dataset, select a row or column by highlighting it as described above, then press the Delete key. You can also use the Delete option in the Edit menu.
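A variable can also be added with syntax, although it is appended as the last column rather than at a selected position. A sketch, assuming a hypothetical new variable named age:

```spss
* Create an empty numeric variable; every case starts as system-missing.
NUMERIC age (F3.0).
VARIABLE LABELS age 'Age in years'.
EXECUTE.
```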
Computing New Variables
You may want to modify the values of the variables in your datasets. For example, if a dataset contained employees' salaries in terms of their beginning and current salaries but you wanted the difference between starting salary and present salary, a new variable could be computed by subtracting the starting salary from the present salary. In other situations, you may also want to transform an existing variable. For example, if data were entered as months of experience and you wanted to analyze data in terms of years on the job, then you could recompute that variable to represent experience on the job in numbers of years by dividing number of months on the job by 12.
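In syntax, the months-to-years conversion described above is a single COMPUTE statement. The sketch assumes experience in months is stored in jobtime and writes the result to a hypothetical new variable, jobyrs, so the original values are kept.

```spss
* Convert months on the job to years in a new variable.
COMPUTE jobyrs = jobtime / 12.
EXECUTE.
```

Using jobtime itself as the target would overwrite the original values, which corresponds to modifying an existing variable.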
Both new variables computed from a numeric expression of existing variables and existing variables whose values are modified by an operation can be created with the Compute option on the Transform menu in the Data Editor (or Alt+T+C).
This will result in the following dialog box:
To create a new variable, or to modify an existing one, type the variable's name in the box labeled Target Variable. In both cases, the expression defining the variable being computed goes in the box labeled Numeric Expression. You can type this expression into the box directly, or use the buttons located below the Numeric Expression box to enter values and operators. The example shown above demonstrates the computation of a new variable, salchng, which will be the difference between an employee's current salary and the employee's beginning salary. The new variable will appear in the rightmost column of the working dataset.
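Clicking Paste in the Compute Variable dialog box should produce syntax along these lines. This sketch assumes the beginning salary is stored in a variable named salbegin (the name used in the Employee data.sav sample file, but check your copy).

```spss
* Difference between current and beginning salary.
* salbegin is assumed to hold the beginning salary.
COMPUTE salchng = salary - salbegin.
EXECUTE.
```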
Variables can also be computed conditionally. For instance, if, in the above example, you were only interested in the change in salaries for people who began working for the company within the last five years, you could create a condition that would compute a new variable only if an employee had begun employment within the last five years. To do this, first click on the button labeled If, which will produce the following dialog box:
First, click the option labeled Include if case satisfies condition to activate the gray areas of the dialog box. Then, specify the condition for computing the new variable in the input box at the top right of this dialog box. You can either type in the condition, or select variables from the list on the left side of the dialog box and use the buttons at the bottom of the dialog box. To move a variable to the condition box, click on the variable's name, then click the arrow button between the two boxes. Clicking a button at the bottom of the dialog box inserts the character shown on that button at the cursor's location in the input box.
The above example defines a condition requiring cases to have less than five years (60 months) of experience in order to be included in the computation of the new variable; the variable jobtime represents the number of months since an employee was hired. Thus, only cases with fewer than 60 months since they were hired will be included. Click the Continue button to return to the previous dialog box.
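A conditional computation corresponds to an IF statement in syntax. Here is a sketch of the example above, again assuming the beginning-salary variable is named salbegin.

```spss
* Compute salchng only for employees hired fewer than 60 months ago.
* If salchng is a new variable, all other cases are system-missing.
IF (jobtime < 60) salchng = salary - salbegin.
EXECUTE.
```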
Recoding Values
You can also modify the values of existing variables in your dataset. For example, if a dataset contains a variable that classifies an employee's status in three categories, but for a particular analysis you want to combine two of these classifications into a single category, then two of the values would need to be recoded into a single value so that there are two groups in total.
The Recode option is available on the Transform menu in the Data Editor (or Alt+T+R).
There are two options for recoding variables in the Recode submenu. The Into Same Variables option (Alt+T+R+S) changes the values of the existing variables, whereas the Into Different Variables option (Alt+T+R+D) creates a new variable containing the recoded values. The two options work essentially the same way, except that recoding into a different variable requires you to supply a new variable name. It is generally better to use the Into Different Variables option, because you may change your mind about your recoding scheme at a later date; if you do, you will still have the original values.
The following example illustrates the use of the Recode option to recode values into a new variable. When that option is selected from the menu, the following dialog box will appear:
First, select a variable from the existing dataset by clicking on it, then click the arrow button in the middle of the dialog box. The selected variable will be displayed in the box labeled Numeric Variable -> Output Variable. Next, supply the name of the new variable in the Output Variable area, and optionally a label for it, then click the Change button. After the new variable name has been supplied, click the button labeled Old and New Values. This will result in the following dialog box:
The above dialog box is the same regardless of whether you are recoding values into the same variable or into a new variable. The original value of the variable being recoded is entered in the box labeled Old Value, and the new value is entered in the box labeled New Value. After entering values in these boxes, click the Add button to add the pair to the list of recodes; when the list is complete, click Continue and then OK to carry out the recode.
Continuing with the above example, a variable with three values, such as jobcat, could be recoded into a variable with two values by recoding one of the values. In the example dataset, jobcat has three values: 1, 2, and 3. If the goal were to combine cases with the values 2 and 3, you could recode cases with the value 3 into 2's: enter 3 in the box labeled Old Value, enter 2 in the box labeled New Value, and click Add, and all of the cases coded 3 will take on the value 2. This can be repeated for as many values as necessary. When recoding into a different variable, values you do not list are set to system-missing in the new variable; to carry them over unchanged, select All other values under Old Value and Copy old value(s) under New Value, then click Add.
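The equivalent RECODE command is sketched below. The new variable name jobcat2 is hypothetical; ELSE=COPY plays the role of All other values combined with Copy old value(s), so the values 1 and 2 are carried over unchanged.

```spss
* Combine job categories 2 and 3 in a new variable, jobcat2.
RECODE jobcat (3=2) (ELSE=COPY) INTO jobcat2.
EXECUTE.
```

Writing RECODE jobcat (3=2). without the INTO keyword would instead change jobcat itself, as the Into Same Variables option does.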
Values can also be recoded conditionally. The process for recoding values on the basis of a condition is essentially identical to the process for conditionally computing new variables discussed in the previous section: when you click on the If button in the main Recode dialog box, the same dialog box that was obtained by clicking If in the Compute dialog box will appear, with the same options.
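In syntax, a conditional recode can be wrapped in a DO IF structure. Here is a sketch, assuming the hypothetical new variable jobcat2 and reusing the condition from the Compute example.

```spss
* Recode only for employees hired fewer than 60 months ago.
* Cases outside the condition are system-missing in jobcat2.
DO IF (jobtime < 60).
RECODE jobcat (3=2) (ELSE=COPY) INTO jobcat2.
END IF.
EXECUTE.
```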
Sorting Cases
Sorting cases allows you to organize rows of data in ascending or descending order on the basis of one or more variables. For example, the data could be sorted by job category so that all of the cases coded as job category 1 appear first in the dataset, followed by the cases coded 2 and then those coded 3. The data could also be sorted by more than one variable; for example, within each job category, cases could be listed in order of their salary. The Sort Cases option (or Alt+D+O) is available on the Data menu in the Data Editor.
The dialog box that results from selecting Sort Cases presents only a few options:
To choose whether the data are sorted in ascending or descending order, select the appropriate button. You must also specify the variables on which the data are to be sorted. The hierarchy of the sort is determined by the order in which variables are entered in the Sort by box: cases are sorted first by the first variable entered, then by the next variable within each value of the first, and so on. For example, if jobcat was the first variable entered, followed by salary, the data would first be sorted by jobcat and then, within each job category, by salary.
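The SORT CASES command expresses the same hierarchy: variables are listed in order of priority, each followed by (A) for ascending or (D) for descending order. Here is a sketch of the jobcat and salary example.

```spss
* Sort by job category, then by salary within each category.
SORT CASES BY jobcat (A) salary (A).
```

Changing salary (A) to salary (D) would list the highest salaries first within each job category.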
Selecting Cases
You can analyze a specific subset of your data by selecting only the cases in which you are interested. For example, you may want to do a particular analysis only on employees who have been with the company for more than six years. This can be done with the Select Cases menu option, which either temporarily or permanently removes the cases you do not want from the dataset. The Select Cases option (or Alt+D+C) is available on the Data menu.
Selecting this menu item will produce the following dialog box. This box contains a list of the variables in the active data file on the left and several options for selecting cases on the right.
Selecting one of these options produces a second dialog box that prompts you for the particular specifications in which you are interested. For example, selecting the If condition is satisfied option and clicking the If button (as was done in the example) produces a second dialog box, as shown below. The portion of the Select Cases dialog box labeled Unselected Cases Are gives you the option of temporarily or permanently removing data from the dataset. The Filtered option excludes the unselected cases from subsequent analyses until you reset the selection to All cases, at which time all cases will again be active and used in further analyses. If the Deleted option is selected, the unselected cases are removed from the working dataset, and if the dataset is subsequently saved, these cases are permanently deleted.
The above example selects all of the cases in the dataset that meet a specific criterion: employees who have worked at the company for more than six years (72 months). After this selection has been made, subsequent analyses will use only this subset of the data. If you chose the Filtered option in the previous dialog box, SPSS indicates the inactive cases in the Data Editor by placing a slash over their row numbers. To select the entire dataset again, return to the Select Cases dialog box and choose the All cases option.
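Both choices have syntax equivalents. The sketch below filters on the six-year criterion and then restores all cases; filter_$ is the name SPSS typically gives a filter variable, but any name would work. The SELECT IF line, left commented out, is the Deleted alternative.

```spss
* Filtered: set aside employees with 72 months or less.
USE ALL.
COMPUTE filter_$ = (jobtime > 72).
FILTER BY filter_$.
EXECUTE.
* Analyses run here use only the selected cases.
* Restore all cases, like choosing All cases in the dialog.
FILTER OFF.
USE ALL.
EXECUTE.
* Deleted alternative (removes unselected cases), left commented out.
* SELECT IF (jobtime > 72).
```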
Listing Cases
You may sometimes want to print a list of your cases and the values of the variables associated with each case, or perhaps a list of only some of the cases and variables. For example, if you want to examine two variables visually but they are far apart in your dataset, you could generate a list of only these variables in the Output Viewer. This procedure cannot be performed using dialog boxes and is available only through command syntax. The syntax for generating a list of cases is shown in the Syntax Editor window below. The variable names shown in lower case tell SPSS which variables to list in the output; alternatively, you can type the keyword ALL in place of variable names to list all of the variables in the file. The subcommand /CASES FROM 1 TO 10 instructs SPSS to print only the first ten cases; if it were omitted, all cases would be listed in the output.
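Reconstructed from this description, a LIST command that prints gender and minority (the variables named in the next paragraph) for the first ten cases would read roughly as follows. It is written here with the equals sign shown in the LIST syntax chart.

```spss
* List gender and minority for the first ten cases.
LIST VARIABLES=gender minority
  /CASES=FROM 1 TO 10.
```

Replacing gender minority with ALL lists every variable in the file.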
To execute this command, first highlight it by holding down the mouse button while dragging the pointer across the command or commands that you want to execute. Next, click the icon with the black, right-facing arrow on it, or choose an option from the Run menu. Executing the command prints the list of variables, gender and minority in the above example, to the Output Viewer. As described earlier, the Output Viewer is the window in which all output is printed. The Output Viewer is shown below, containing the text that would be generated by the above syntax.
To learn more about SPSS, proceed to the next SPSS tutorial.
13 September 2001
Statistical Support, a division of Research Consulting at ITS
Copyright 2003, UT Austin