SQL Sas
The two observations above were the only two situations where each data set in the MERGE statement contained common BY variables, ZIP=10000 and ZIP=19000. Specifying IF A AND B is telling SAS to keep observations that have by variables in both the STATES (a.k.a. A) and CITYS (a.k.a. B) files. Similarly you could include observations with ZIPs in one file but not in the other by saying either

IF A AND NOT B; or
IF B AND NOT A;

Introduction to PROC SQL

A similar result is possible using PROC SQL. Before showing an example of how this is done, a brief description of SQL and how it works is necessary. SQL (Structured Query Language) is a very popular language used to create, update and retrieve data in relational databases. Although this is a standardized language, there are many different flavors of SQL. While most of the

PROC SQL NOPRINT;
CREATE TABLE newtable AS
SELECT col1, col2, col3
FROM table1, table2
WHERE some condition is satisfied;

The first line executes PROC SQL. The default for PROC SQL is to automatically print the entire query result. Use the NOPRINT option to suppress the printing. If you need the data from the query result in a data set for future processing you should use the CREATE TABLE option as shown on line 2 and specify the name of the new data set. Otherwise, the CREATE TABLE statement is not necessary as long as you want the output to print automatically.

Lines 2 through 5 are all considered part of the SELECT statement. Notice that the semicolon only appears at the very end. In the SELECT statement we define the variables we wish to extract separated by commas. If all variables are desired simply put an asterisk (*) after SELECT. Be careful
if the data sets you are joining contain the same variable name. This will be explained more in the example to follow. The FROM clause tells which data sets to use in your query. The WHERE clause contains the conditions that determine which records are kept. The WHERE clause is not necessary for PROC SQL to work in general, but as we are using PROC SQL to replicate a merge it will always be necessary. Also note that the RUN statement is not needed for PROC SQL.

The results from the merge in example 1B can be replicated with PROC SQL by eliminating the two PROC SORT steps, the MERGE data set step, and the PROC PRINT step, and replacing them with the following code:

Example 2A:
PROC SQL;
SELECT A.ZIP, A.STATE, B.CITY
FROM STATES AS A, CITYS AS B
WHERE A.ZIP=B.ZIP;

keyword is optional). Aliases are useful as a shorthand way of selecting variables in your query when several data sets contain the same variable name. In our example both data sets contain the variable ZIP. If we simply asked for the variable ZIP without specifying which data set to use, the printout would show both ZIP variables with no indication of which data set either one came from. If we used a CREATE clause without specifying which ZIP to use, an error would occur because SAS will not allow duplicate variable names on the same data set.

We could also code the SELECT statement several different ways, with and without two-level names and aliases, as shown below.

• Two-level without aliases:
SELECT STATES.ZIP, STATES.STATE, CITYS.CITY

• Two-level where necessary:
SELECT STATES.ZIP, STATE, CITY
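
For readers who want to try Example 2A end to end, the following is a minimal sketch rather than the paper's own program. The sample rows and the output table name MATCHED are invented for illustration; only the two matching ZIP values, 10000 and 19000, come from the text. It assumes STATES holds ZIP and STATE, and CITYS holds ZIP and CITY.

```sas
/* Hypothetical sample data: the state and city values are invented.
   Only ZIPs 10000 and 19000 occur in both data sets. */
DATA STATES;
   INPUT ZIP STATE $;
   DATALINES;
10000 NY
19000 PA
30000 GA
;
RUN;

DATA CITYS;
   INPUT ZIP CITY :$12.;
   DATALINES;
10000 NEWYORK
19000 PHILADELPHIA
40000 LOUISVILLE
;
RUN;

/* Same join as Example 2A, but stored in a data set instead of printed.
   MATCHED is a made-up name. No PROC SORT is required beforehand. */
PROC SQL;
   CREATE TABLE MATCHED AS
   SELECT A.ZIP, A.STATE, B.CITY
   FROM STATES AS A, CITYS AS B
   WHERE A.ZIP = B.ZIP;
QUIT;
```

With these sample rows, only the observations for ZIP 10000 and ZIP 19000 should satisfy the WHERE clause. PROC SQL needs no RUN statement; the QUIT statement ends the procedure.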
Using PROC FORMAT in this manner is fine if you are using a static definition for your conversion. Imagine now that you have data from another survey but this time the answer codes are totally different where 1 is “Excellent”, 2 is “Good”, 3 is “Fair” and 4 is “Poor”. You would have to re-code your PROC FORMAT statement to reflect the new conversion. Although it is not a lot of work to do here, it can become cumbersome if you have to do it every day. There is a convenient way to lessen the maintenance.

Using CNTLIN= option

There is an option in PROC FORMAT called CNTLIN= that allows you to create a format from an input control data set rather than a VALUE statement. The data set must contain three specially named variables that are recognized by PROC FORMAT: START, LABEL and FMTNAME. The variable named START would be the actual data value you are converting from, e.g., 1, 2 or 3. The LABEL variable would be the value you are converting to, e.g., YES, NO, MAYBE. The FMTNAME variable is a character string variable that contains the name of the format, e.g., ‘ANSWER’. You would simply read in the data set and appropriately assign the variable names. Then run PROC

Replicating a MERGE with PROC FORMAT using the PUT function.

Sometimes it may become necessary to create an entirely new variable based on the values in your format. It may not be enough to simply mask the real answer value with a temporary value such as showing the word “YES” whenever the number 1 appears. You may need to create a new variable whose permanent value is actually “YES”. This can be done using the PUT function along with a FORMAT. To show how the PUT function is used, we’ll stay with our survey example. Our data sets look like this:

Survey Data: ANSWERS
ID   Q1
111  1
222  2
333  3

Format Data: ANSFMT
START  LABEL  FMTNAME
1      YES    ANSWER
2      NO     ANSWER
3      MAYBE  ANSWER
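
As a hedged sketch of the CNTLIN= approach, the control data set ANSFMT could be built and loaded roughly as follows. Reading the three rows with DATALINES is an assumption made to keep the example self-contained; in practice the values would more likely come from an external file or an existing data set.

```sas
/* Control data set with the three variables PROC FORMAT looks for.
   Values are those of the ANSFMT listing; reading them in-stream
   with DATALINES is an assumption for illustration. */
DATA ANSFMT;
   INPUT START LABEL $ FMTNAME $;
   DATALINES;
1 YES ANSWER
2 NO ANSWER
3 MAYBE ANSWER
;
RUN;

/* Build the format named in FMTNAME from the control data set
   instead of coding a VALUE statement by hand. */
PROC FORMAT CNTLIN=ANSFMT;
RUN;
```

Once this has run, the format can be referenced as ANSWER. (with the trailing period) in a FORMAT statement or in the PUT function. When the answer codes change, only the control data set needs updating, not the program.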
We will create a new variable using the PUT function that will be the value of the transformed Q1 variable. The format of the PUT function is: PUT(variable name,format). An example of the program code would look like:

DATA ANSWERS;
SET ANSWERS;
Q1_A = PUT(Q1,ANSWER.);
RUN;

The output data set would look like this:

Conclusions

As you can see, there are other alternatives in SAS to using the traditional MERGE when combining data set information. It is not too difficult to learn a few beginner lines of PROC SQL code to execute what most merges do. The PROC FORMAT / CNTLIN / PUT method is also a quick way to get limited information from other data sets. Hopefully you will try out these new methods and expand your SAS knowledge in the process.
References