Importing Excel Spreadsheets into SAS 9.2

I was using SAS 9.2 in a remote environment.  That is, I was working on my home computer, where I did not have SAS.  I had my data files on my home computer.  To use SAS, I was connecting with a university computer via some uncertain combination of software referred to as Citrix, vDesk, and VLab.  The situation would probably have been simpler if I had been attempting the import on campus, but I wasn’t.

Getting the Excel Data into SAS

As initial preparation, someone advised that it might be best to save that data in Excel 97-2003 format, because SAS was not yet up to the challenge of importing from Excel 2010.  That advice was more than a year old at this point, and may not have been accurate anymore.  SAS did seem willing to try to import an .xlsx file.  I decided to stick with the older format, just in case.

I didn’t keep detailed notes in the early phases, so some of my comments here are a bit jumbled.  I flailed around for several hours, looking at random sources of potential guidance.  These included the built-in SAS help system, many webpages, and SAS guides by Evans, someone at Cornell, and others.  They were all well-written and completely useless.  The problem was that I just needed straightforward instructions on getting that Excel data into SAS and working with it, and they were all assuming that I understood half of what they were saying before I started, and really only needed the other half that they happened to be providing.

Fortunately, even a completely clueless individual is apt to learn something, if only by accident, if s/he stays around long enough.  In this case, I learned that SAS for Windows 9.2 would allow me to import that Excel file in a place and form that I could actually use, once I mastered a bit of the SAS lingo.  My first step was informed by a completely intimidating call to a very nice man at SAS tech support.  He was patiently explaining things about libraries and commands and after about 30 seconds all I wanted was to get off the phone.  The primary reason was that I felt like I was bothering him.  No fault of his; I just felt stupid.  There were just too many things that I didn’t understand.  The secondary reason for wanting to get off the phone was that I had just encountered a plate of food that seemed much more interesting than whatever was happening onscreen.

The nice SAS man explained that I needed to start with the SAS File > Import Data process.  He didn’t put it in so many words, but I was able to infer as much from the half-hour I spent trying to figure out what he had said.  Later, I found what looked like a very helpful explanation of how to import an Excel file into SAS (and also another).  Fortunately, by then I had spent hours screwing around and had figured out part of the answer for myself (with a little help from my friends).

The gist of it was as follows:  if possible, eliminate empty rows and columns from the spreadsheet; save it as an .xls; close Excel to prevent conflicts; in SAS, go to File > Import Data and navigate to the .xls; complete the process of importing the file; and then save it.

At about this point, I made my first discovery.  It seemed that the university computer would quietly log me off after a while (apparently just a short time, for some reason — around 15 minutes), without any kind of pop-up warning.  To avoid that, I would have to be watching its webpage while working in SAS and Word and wherever else I might be researching, analyzing, or discussing material related to the data.

It appeared that, once the university logged me off, things would no longer function the same.  SAS would continue to perform calculations, but now it seemed that it could not find the data file that I had imported from Excel.  In addition, I could no longer copy and paste between SAS and other programs running on my home computer — such as Internet Explorer, in which I was writing these notes.  At those times, to get material between SAS and the other programs running on my home computer, I would have to copy and paste it into an email message from one environment (e.g., my ordinary home computer), and then send it to myself, and open that email in the other environment (e.g., inside the VLab virtual environment, which did include a web browser where I could access Hotmail).  One time, the fix for this seemed to be to log out of SAS, close down the virtual machine, and restart the VLab session.  But another time, my notes tell me, that did not work; I was thinking I might have to uninstall and reinstall SAS to get it to behave normally.  When the system was not functioning normally, sometimes I just used manual entry or cut-and-paste to get the raw data into the SAS code itself, and would use the CARDS command shown in my accompanying post to do calculations.

When I was able to import from an Excel spreadsheet, as noted above, I would start with the SAS menu pick for File > Import Data.  Within that process, “Workbook” was SAS-speak for the Excel file I wanted to import.  If I got a “Connection failed” error, that was probably because the file was still open in Excel.  Also, “Library” was SAS-speak for “folder,” and the “Member” was a code word meaning “filename.”  So when SAS presented me with blanks, I had to indicate the folder and filename that I wanted to use, to save that imported Excel file.

That led to another point of confusion.  It was not clear where SAS was saving these libraries and members (i.e., folders and files).  When it was functioning properly, the dialog that opened during the import process would allow me to see the files and folders on my home computer’s hard drive.  Sometimes it seemed that I had to Browse there using the Computer location; sometimes it seemed that I first had to get there using the Network location, perhaps with some additional exploration (perhaps through a Client location?), and that, after going through those steps, the program would be able to see stuff on my hard drive in the Computer location.  At other times, unfortunately, the remote SAS session behaved as though it were in its own world, completely divorced from my computer.  At those times, the best I could do was to shut it down and start over, as described above.

The WORK library was the default; it seemed to be a temporary folder whose contents would evaporate as soon as I shut down SAS.  An advantage of leaving the imported Excel data in the default WORK library, it seemed, was that SAS would let me refer to the filename without having to specify where it was.  If I wanted to keep the SAS file for more than one session, sometimes I tried creating a library (Tools > New Library) at a location of my own choosing, and then saving the Excel file into that; but at other times I tried saving it into the SASUSER library, wherever that was.

So that was what happened during most of the file import process.  The last step in the import process was that SAS would prompt me for the filename where I would save my PROC IMPORT statements.  I ignored that and just clicked Finish.  That gave me a NOTE in the Log window, telling me that my chosen filename data set was successfully created.  Also, in the Explorer window at the left side, I could click into Libraries > WORK (or whatever destination I had chosen) and see the newly imported Excel file sitting there.

Referring to the Imported Data

If I was manually copying and pasting (or typing) data directly into the SAS code, there was no need to refer to any other data file.  As noted above, I could just use the CARDS command.  But putting data into code seemed like a primitive and inflexible way of working.  I wanted to do better than that.  So I needed to learn how to make SAS use the data in my imported Excel file.  We weren’t using that approach in the class I was taking; it was just something I wanted to learn.  This urge struck at a time when the professor was involved in another project, so it was pretty much up to me to figure it out.  I did come to a solution, shown at the end; the following paragraphs describe the process.

I started by inserting a reference to my imported file into each PROC command in SAS.  So, for example, at one point, when I had saved the file in the SASUSER library, my PROC SORT command looked like this:

proc sort data=sasuser.Q1

where Q1 was the “Member” (i.e., file) name that I had given to the Excel file when importing it.  That worked, but it had a side effect.  The problem, according to an error message in the SAS log window, was that I needed to use an INFILE statement.  So I figured out how to include an INFILE statement, but that resulted in a different error:

ERROR: Physical file does not exist, C:\Users\rlwoodco\sasuser.Q1.
NOTE: The SAS System stopped processing this step because of errors.

As shown in the accompanying post’s description of an ANOVA, I was running several procedures, and for some reason SAS would go ahead and run most of them despite this error.  That is, I still got lots of output that looked basically OK, at least at a glance.

The “Physical file” error was puzzling.  SAS seemed to be looking for Q1 in a folder called C:\Users\[username].  If that’s what it wanted, that’s what it should have created when I imported Q1 from Excel.  But my computer did not have any such folder.  I guessed that it must be referring to drive C on the university computer, and I had no idea what to do about that.  A search led to a thread in which this message baffled seemingly experienced SAS users.

I got a sense that the dot in my “sasuser.Q1” location, used in an INFILE command, might be confusing SAS.  I thought it might help to just store Q1 in the WORK library, dispensing with the need to specify the SASUSER library.  (As noted above, WORK was the default and thus did not need to be specified.)  But no, that still produced the “Physical file does not exist” error if I used an INFILE line, and a “No DATALINES or INFILE statement” error if I didn’t.

I guessed the solution was to fix the INFILE statement somehow.  A webpage led me to understand that I would use INFILE to indicate the file being imported, and perhaps its delimiter (e.g., tab, space), and would then use INPUT to identify the format of the incoming data.  But that webpage didn’t specify the right approach for an Excel import.  I started another search for guidance, but then took another look at my Excel file.  Suddenly I saw something that might have made a difference.  The file didn’t consist solely of numbers.  On the first row, I had column headings.  Column A, containing zeroes for males and ones for females, also contained, on its first row, the word “Gender,” and similarly for the other columns.  Was SAS objecting because that first row was non-numerical data and my INPUT statement was failing to provide proper warning of that?  I removed that first row of headings, saved and closed Excel, and tried importing again.  Now I had “Physical file does not exist” plus new errors:  “Variable GENDER not found.”

I tried opening that Excel file and exporting its contents to one of the formats discussed in the webpage cited above.  For purposes of importing from a file, the options there included importing from a space-, comma-, and tab-delimited text files.  Excel offered the option of saving the spreadsheet in any of those formats.  I tried its Text (Tab delimited) (*.txt) option.  This gave me a text file that, when opened in Word, did seem to be using tabs to divide its data columns.  I imported that into the WORK library in SAS.  It seemed I should choose the “Tab Delimited File (*.txt)” option (i.e., not Excel) during the import process.  SAS would not let me delete the WORK > Q1 file — it believed that file was still in use — so I killed and restarted SAS.  The import process gave me a bunch of error messages, and said it had imported only 242 (not the full 243) records.  I was not able to figure out why.

The tab-delimited approach did not appear to be adding anything to the prior attempt to import an Excel file, so I reverted to Excel.  The Excel import process did not seem to have benefited from the deletion of that first row, in which I had provided names for the three columns of data in the spreadsheet, so I restored those labels in that first row.  To check these reversions, I tried again to import the Excel file with column headings.  There were no error messages during the import process.  I ran my SAS code on that newly created WORK > Q1 file.  I was back to having just the “Physical file does not exist” error, with all 243 records.  Importing an Excel file continued to seem to be a valid way to proceed.

The “Physical file does not exist” error seemed to stem from just these lines of code:

data anova1 ;
infile ‘Q1’ ;
input gender generation motivation ;

As before, the SAS Explorer window did show that WORK > Q1 existed.  I tried again with a variation:  infile ‘.work.Q1’.  This time, that produced no error messages.  Was it possible that the previous problem had involved my reference to the SASUSER library?  I tried re-importing Q1 into a new library, TEMPLIB.  To create the new library, I right-clicked in the SAS Explorer pane and chose New > Name as TEMPLIB.  I set the path to be a folder on my hard drive.  I left the other options at their defaults.  SAS said, “ERROR:  User does not have appropriate authorization level for library TEMPLIB.”  So, OK, I deleted that TEMPLIB from SAS Explorar and tried creating the new TEMPLIB library instead in the default location (named with my own name), apparently somewhere on the university computer (which, I now saw, was considered to be the very C:\Users\rlwoodco referred to in one of the foregoing error messages.  Possibly this was the only place other than WORK where I was allowed to create new libraries and members.  That gave me “TEMPLIB.Q1 data set was successfully created.”  I then proceeded to run the foregoing code with that change:  infile ‘.templib.Q1’.  Once again, the error:  “Physical file does not exist.”  Using Windows Explorer within the virtual environment, I navigated to drive C (specifically, vLab-VDI) and, there, to C:\Users\rlwoodco.  I saw that, indeed, there was no TEMPLIB file or folder there.  There was, however, a file called q1.sas7bdat.  I really had no idea what to make of that.  It did not appear that I had the option of using libraries other than WORK.

I killed SAS, deleted that q1.sas7bdat file from C:\Users\rlwoodco, and reimported the Excel file, this time to WORK.  I ran the three lines of code quoted above, with the simple reference to infile ‘Q1’.  That worked without error.  I noticed, this time, that the editor seemed to continue to run.  I clicked on the running man icon at the top of the screen to stop it.  Then I added three more lines.  The full set of commands being executed was now as follows:

data anova1 ;
infile‘Q1’ ;
input gender generation motivation ;
proc sort data=Q1 ;
by gender ;
run ;

When I ran that, I saw that I now had the “Physical file does not exist” error.  So possibly I had been misunderstanding that error message.  Apparently it was not the first three lines that triggered it, after all; apparently it was when I attempted to actually do something with the imported data (in this case, SORT) that I would encounter problems.  If so, that would be really confusing, because the SAS log proceeded to tell me that, in its view, there was no need for the SORT procedure, because “Input data set is already sorted.”

These adventures seemed to establish that SAS was still very much a work in progress — that if you wanted to use it, apparently it was your job to figure out what was wrong with it.  It did appear reasonable to continue to try to perform calculations in Excel to the extent possible.  A preliminary look suggested that Excel had, or could be tweaked to include, ANOVA options.  Excel and free alternatives were obviously a lot more available and less expensive than SAS, and were more likely to be something with which most students would be familiar.  I was not sure whether my assignment allowed me to use Excel, however, so I decided to continue to try SAS a bit more.

So, OK, SAS said that there was no such thing as C:\Users\rlwoodco\Q1.  I checked that in Windows Explorer.  SAS was correct.  The import process had seemed to say that such a file had been created, but apparently that was not true.  And yet the right-click Properties for Q1 in the SAS Explorer pane indicated that Q1 existed and had been created when I thought it was.  But now I had a new insight.  The Details tab, in that Properties dialog, indicated (when the dialog was maximized and the column widened) that Q1 had actually been created in C:\Users\rlwoodco\AppData\Local\Temp\SAS Temporary Files\_TD3564\q1.sas7bdat.  I think I speak for all baffled users when I ask, Why?  A look at that location in Windows Explorer (again, within the VLab virtual environment) indicated that there were also several other files at that location.

I realized that some of these difficulties might stem from the combination of SAS and VLab remote computing.  But it was not as though I were living an era in which remote computing had gone out of fashion, and it was reasonable to expect that many SAS users would prefer to work with the university’s copy of the program, rather than purchase their own.  It did appear incumbent upon SAS programmers to tell me where it was *really* putting the files, and to make it possible for me to use them at that location.

My instinct, at this point, was to stop using SAS.  Experience with other programs suggested that problems with coding in one area can be a hint of problems in other areas.  The black-box approach to computing — stick in the numbers and unquestioningly accept the output — had never been my preference.  I realized that one must trust computers to some extent, in a variety of situations; the question was just whether one could fairly assist on maximal assurances that the black box was trustworthy.

I decided to determine whether these problems were indeed due to remote computing.  The next time I was on campus, I tried again to run the same SAS procedures.  I got the same “Physical file” error.  At this point, fortunately, a knowledgeable individual suggested that the INFILE command was actually not necessary.  I don’t know why SAS had said that I needed that.  But he was right:  when I ran my full ANOVA code without those initial lines, there were no errors.  The solution came, then, not from manuals or help files, but from word of mouth.

This entry was posted in Uncategorized and tagged , , , , , , , , , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s