Base SAS - Intrvw - Keys
step?
INFILE statement.
• Are you familiar with special input delimiters? How are they used?
DLM= and DSD are the INFILE statement options that I've used for delimited input.
DLM= names the delimiter character(s). DSD changes the default delimiter to a comma,
treats two delimiters in a row as a MISSING value, and strips quotation marks from
quoted values, so it is the usual choice for reading comma-separated values (CSV)
files.
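As a minimal sketch of the DSD behavior described above, the step below reads in-stream comma-separated lines. The data set name, variable names, and sample values are made up for illustration.

```sas
/* Hypothetical sample data: name, age, city separated by commas */
data people;
    infile datalines dsd;          /* DSD: comma delimiter, ",," means missing */
    length name $ 10 city $ 10;
    input name $ age city $;
    datalines;
Ann,34,Paris
Bob,,Rome
"Lee, Jr.",51,Oslo
;
run;

proc print data=people;
run;
```

In this sketch Bob's age is read as missing because of the two consecutive commas, and the quoted value keeps its embedded comma while the quotation marks are removed.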
• If reading a variable length file with fixed input, how would you prevent SAS
from reading the next record if the last variable didn't have a value?
By using the MISSOVER option in the INFILE statement. If some data lines are
shorter than others, so that a field may end partway through, use the TRUNCOVER
option in the INFILE statement instead.
• What is the difference between an informat and a format? Name three informats or
formats.
• Name and describe three SAS functions that you have used, if any?
LENGTH: returns the length of an argument, not counting trailing blanks (missing
values have a length of 1).
Ex:
a='my cat';
x=LENGTH(a);
Result: x=6
SUBSTR: extracts a substring from a character value, given a starting position and
a length.
Ex:
data dsn;
A='(916)734-6241';
X=SUBSTR(a,2,3);
run;
Result: x='916'
TRIM: removes trailing blanks from a character expression.
Ex: a='my '; b='cat'; x=TRIM(a)||b; Result: x='mycat'.
• How would you code the criteria to restrict the output to be produced?
Use the NOPRINT option.
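• What is the purpose of the trailing @ and the @@? How would you use them?
A single trailing @ holds the current input record for the next INPUT statement
within the same iteration of the DATA step. A double trailing @@ holds the record
across iterations of the DATA step, until the end of the line is reached.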
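A minimal sketch of the single trailing @, assuming a made-up file layout in which the first field is a record type and only 'S' records carry an amount. The data set and variable names are hypothetical.

```sas
/* Read the record type first, hold the line, then decide how to read the rest */
data sales;
    input type $ @;                /* trailing @ holds the line for the next INPUT */
    if type = 'S' then do;
        input amount;              /* continues on the same held line */
        output;
    end;
    datalines;
S 100
H header
S 250
;
run;
```

Under these assumptions only the two 'S' records are written to the data set; the held line is released automatically at the end of each iteration.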
Double trailing @@: When there are multiple observations per line of raw data, use
the double trailing at sign (@@) at the end of the INPUT statement. The line-hold
specifier acts like a stop sign telling SAS, "stop, hold that line of raw data".
ex:
data dsn;
The above program can be changed to make the program shorter using @@ ....
data dsn;
input sex $ days @@;
cards;
F 53 F 56 F 60 F 60 F 78 F 87 F 102 F 117 F 134 F 160 F 277 M 46 M 52 M 58 M 59 M 77
M 78 M 80 M 81 M 84 M 103 M 114 M 115 M 133 M 134 M 175 M 175
;
run;
SELECT GROUP:
SELECT: begins a SELECT group.
WHEN: identifies SAS statements that are executed when a particular condition is
true.
OTHERWISE (optional): specifies a statement to be executed if no WHEN condition is
met.
END: ends a SELECT group.
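The four statements of a SELECT group fit together as in the following sketch. The data set name, the score thresholds, and the sample values are illustrative assumptions only.

```sas
/* Assign a grade from a score using a SELECT group without a select-expression */
data graded;
    input score;
    length grade $ 1;
    select;
        when (score >= 90) grade = 'A';   /* first true WHEN wins */
        when (score >= 80) grade = 'B';
        otherwise grade = 'C';            /* runs when no WHEN condition is met */
    end;
    datalines;
95
84
70
;
run;
```

Including OTHERWISE is a common precaution: without it, SAS reports an error when no WHEN condition is met.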
• What statement do you code to tell SAS that it is to write to an external file?
• If you don't want any SAS data set output from a DATA step, how would you code
the DATA statement to prevent SAS from producing a data set?
DATA _NULL_;
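A short sketch of DATA _NULL_ in use, assuming the SASHELP.CLASS sample data set that ships with SAS is available. The step runs through every observation but creates no output data set.

```sas
/* Process observations without creating a data set; PUT writes to the SAS log */
data _null_;
    set sashelp.class;
    put name= age=;
run;
```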
• What is the one statement to set the criteria of data that can be coded in any
step?
OPTIONS statement: this is part of a SAS program and affects all steps that follow
it.
• Have you ever linked SAS code? If so, describe the link and any required
statements used to either process the code or the step itself.
• How would you include common or reuse code to be processed along with your
statements?
By using SAS macros.
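• When looking for data contained in a character string of 150 bytes, which
function is the best to locate that data: scan, index, or indexc?
INDEX. It returns the position of the first occurrence of a substring. INDEXC
searches for any single character from a list, and SCAN extracts the nth word
rather than locating a position.
• If you have a data set that contains 100 variables, but you need only five
of those,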
• Code a PROC SORT on a data set containing State, District and County as the
primary variables, along with several numeric variables.
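One possible answer, as a sketch: the data set names, the numeric variables, and the sample rows below are hypothetical and exist only to make the step runnable.

```sas
/* Hypothetical input with three key variables and two numeric variables */
data regions;
    input State $ District $ County $ population income;
    datalines;
TX D2 Travis 100 50
CA D1 Marin 200 70
CA D1 Kern 150 40
;
run;

/* Sort by the primary variables; OUT= leaves the original data set unchanged */
proc sort data=regions out=regions_sorted;
    by State District County;
run;
```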
• How would you code a merge that will keep only the observations that have matches
from both sets.
Use the IN= data set option on each data set in the MERGE statement, then test
those variables with a subsetting IF statement.
• How would you code a merge that will write the matches of both to one data set,
the non-matches from the left-most data.
Ex (keeping only the matches):
data xxx;
merge yyy(in = inyyy) zzz(in = inzzz);
by aaa;
if inyyy = 1 and inzzz = 1;
run;
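The question above is cut off, so the following is only one reading of it: matches go to one data set and non-matches from the left-most data set go to another. All data set names, variables, and values here are hypothetical.

```sas
/* Two small inputs, already in BY-variable order */
data left;
    input aaa x;
    datalines;
1 10
2 20
3 30
;
run;

data right;
    input aaa y;
    datalines;
2 200
3 300
4 400
;
run;

/* Route each observation by its IN= flags */
data matches left_only;
    merge left(in = inleft) right(in = inright);
    by aaa;
    if inleft and inright then output matches;
    else if inleft then output left_only;
run;
```

With this sample data, keys 2 and 3 land in matches and key 1 lands in left_only; key 4 exists only in the right-hand data set and is written nowhere.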
• What is the Program Data Vector (PDV)? What are its functions?
Function: to store the current observation. The PDV (Program Data Vector) is a
logical area in memory where SAS builds a data set one observation at a time. When
SAS processes a DATA step it has two phases: the compilation phase and the
execution phase. During the compilation phase the input buffer is created to hold a
record from an external file. After the input buffer is created, the PDV is
created. The PDV also contains two automatic variables, _N_ and _ERROR_.
The Logical Program Data Vector (PDV) is a set of buffers that includes all
variables referenced either explicitly or implicitly in the DATA step. It is
created at compile time, then used at execution time as the location where the
working values of variables are stored as they are processed by the DATA step
program (source: http://www2.sas.com/proceedings/sugi24/Posters/p235-24.pdf).
• In the flow of DATA step processing, what is the first action in a typical DATA
Step?
The DATA step begins with a DATA statement. Each time the DATA statement executes,
a new iteration of the DATA step begins, and the _N_ automatic variable is
incremented by 1.
• What is _n_?
It is a DATA step iteration counter variable in SAS.
Note: Both _N_ and _ERROR_ are always available to you in the DATA step. _N_
indicates the number of times SAS has looped through the DATA step. This is not
necessarily equal to the observation number, since a simple subsetting IF
statement can change the relationship between the observation number and the number
of iterations of the DATA step. The _ERROR_ variable has a value of 1 if there is
an error in the data for that observation and 0 if there is not.
_N_ is an implicit (automatic) variable created by SAS during data processing. It
gives the number of records SAS has iterated over in a data set. It is available
only in the DATA step and not in PROCs. E.g., to select every third record in a
data set we can use _N_ as follows:
data new;
set old;
if mod(_n_,3) = 1;   /* subsetting IF: keeps observations 1, 4, 7, ... */
run;
Note: If a WHERE clause is used to subset the input, _N_ counts only the
observations that pass the WHERE condition, so it will not yield the required
result.
How can I determine the position of the nth word within a character string?
Use a combination of the YEAR. and MMDDYY. formats to simply display the value:
put sasdate year4. sasdate mmddyy4.;
or use a combination of the PUT and COMPRESS functions to store the value:
newvar = compress(put(sasdate,yymmdd10.),'-');
How can I print my SAS time variable with a leading zero for hours 1-9?
Use a combination of the Z. and MMSS. formats:
hrprint = hour(sastime);
put hrprint z2. ':' sastime mmss5.;
INFILE OPTIONS
Prepared by Sreeja E V(sreeja@kreara.com) source: kreara.blogspot.com.
Infile has a number of options available.
FLOWOVER
FLOWOVER is the default option on INFILE statement. Here, when the INPUT statement
reaches the end of non-blank characters without having filled all variables, a new
line is read into the Input Buffer and INPUT attempts to fill the rest of the
variables starting from column one. The next time an INPUT statement is executed, a
new line is brought into the Input Buffer.
Consider the following text file containing three variables id, type and amount.
11101 A
11102 A 100
11103 B 43
11104 C
11105 C 67
The following SAS code uses the FLOWOVER option: when a line has no value for
amount, INPUT takes the first value on the next line instead.
data B;
infile "External file" flowover;
input id $ type $ amount;
run;
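To see the effect without an external file, the same step can be sketched with the five records supplied in-stream (using DATALINES here is an assumption made for a self-contained example).

```sas
/* FLOWOVER with the sample records supplied in-stream */
data B;
    infile datalines flowover;
    input id $ type $ amount;
    datalines;
11101 A
11102 A 100
11103 B 43
11104 C
11105 C 67
;
run;

proc print data=B;
run;
/* Result: 3 observations
   id=11101 type=A amount=11102
   id=11103 type=B amount=43
   id=11104 type=C amount=11105 */
```

Records 11102 and 11105 are consumed while filling amount for the short lines before them, so they never become observations of their own. The SAS log notes that INPUT went to a new line.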
MISSOVER
When INPUT reads a short line, the MISSOVER option on the INFILE statement does not
allow it to move to the next line. MISSOVER sets all the variables without values
to missing.
data B;
infile "External file" missover;
input id $ type $ amount;
run;
TRUNCOVER
Both MISSOVER and TRUNCOVER assign missing values to variables if the data line
ends before the variable's field starts. But when the data line ends in the middle
of a variable field, TRUNCOVER will take as much as is there, whereas MISSOVER will
assign the variable a missing value.
Consider the text file below containing a character variable chr.
a
bb
ccc
dddd
eeeee
ffffff
Consider the following SAS code
data trun;
infile "External file" truncover;
input chr $3. ;
run;
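The difference can be sketched side by side. Because in-stream DATALINES records are padded with blanks, this sketch first writes the six lines to a temporary file; the fileref raw and the data set name miss are assumptions introduced for the example.

```sas
filename raw temp;                 /* temporary external file (hypothetical fileref) */

/* Write the six variable-length records shown above */
data _null_;
    file raw;
    put 'a';
    put 'bb';
    put 'ccc';
    put 'dddd';
    put 'eeeee';
    put 'ffffff';
run;

/* TRUNCOVER: short fields are read as far as they go */
data trun;
    infile raw truncover;
    input chr $3.;
run;
/* chr: a, bb, ccc, ddd, eee, fff */

/* MISSOVER: a field that ends early is set to missing */
data miss;
    infile raw missover;
    input chr $3.;
run;
/* chr: (blank), (blank), ccc, ddd, eee, fff */
```

The first two records are shorter than the 3-character field, so only TRUNCOVER keeps their values.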