SAS Interview Questions


1. What SAS statements would you code to read an external raw data
file into a DATA step?
Ans: The INFILE and INPUT statements are used to read an
external raw data file into a DATA step.
2. How do you read in the variables that you need?
Ans: If we want to read only a particular set of variables, we
can mention just the variables we want in the INPUT
statement.
3. Are you familiar with special input delimiters? How are they used?
Ans: Yes, we have special delimiter options like DLM= and DSD in SAS.
Both of these options are used in the INFILE statement.
The DLM= option names the characters, such as commas or spaces,
that are read as data delimiters. You may choose any delimiter you
wish with this option. You can list more than one character, such
as DLM='XX'; each listed character is then treated as a delimiter.
The DSD option treats two consecutive delimiters as containing a
missing value. It also uses a comma as the default delimiter and
removes quotation marks from quoted values.
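E.G: A minimal sketch of the DSD option on inline data. The data set name, variable names and values are made up for illustration.

```sas
/* Hypothetical data: the second record has two commas in a row. */
data work.delim_demo;
    /* DSD: comma is the default delimiter, consecutive commas
       mean a missing value, and quotation marks are removed. */
    infile datalines dsd;
    input name :$12. age city :$12.;
    datalines;
Ravi,34,Pune
Meena,,Delhi
"Kumar, A",41,Chennai
;
run;

proc print data=work.delim_demo;
run;
```

Under these assumptions AGE is missing for the second record, and the quoted value containing a comma is read as a single NAME.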

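4. If reading a variable-length file with fixed input, how would
you prevent SAS from reading the next record if the last
variable didn't have a value?
Ans: We can use the MISSOVER option in the INFILE statement.
(TRUNCOVER is a related option that also keeps a value that is
shorter than the field instead of setting it to missing.)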
5. What is the difference between an informat and a format?
Name three informats or formats.
Ans: An informat is an instruction that SAS uses to read data
values into a variable.
A format is an instruction that SAS uses to write data
values.
The three informats are:
A) Date informat
B) Character informat
C) Numeric informat
The three formats are:
A) Date format
B) Character format
C) Numeric format
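E.G: A short sketch of the difference, assuming made-up variable names and one inline record. DATE9. and COMMA8. act as informats in the INPUT statement; DDMMYY10. and DOLLAR10.2 act as formats in the FORMAT statement.

```sas
data work.fmt_demo;
    /* Informats: how the raw text is read into numeric values. */
    input hired :date9. pay :comma8.;
    /* Formats: how the stored values are written out. */
    format hired ddmmyy10. pay dollar10.2;
    datalines;
15JAN2020 12,500
;
run;

proc print data=work.fmt_demo;
run;
```

The stored values are plain numbers (a SAS date and 12500); only their display changes with the format.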

6. Name and describe three SAS functions that you have used, if any.
Ans:
A) SUM function: It adds the variables together, ignoring
any missing values.
E.G: Var=SUM(var1, var2, ..., varn);
Var1=SUM(1, ., 3) = 4
B) MEAN function: This function returns the arithmetic mean
(average) and ignores missing values.
E.G: Var=MEAN(var1, var2, var3, ..., varn);
C) SUBSTR function: The SUBSTR function extracts a portion
of a character value, starting at a given position, based on how
many characters are designated for retrieval.
E.G: Var=SUBSTR(var, start<, number-of-characters>);
Var1=SUBSTR('ASHOK', 1, 3);
In the above example the SUBSTR function takes the string
'ASHOK', cuts from the start point (1) for the number of characters (3),
and stores 'ASH' in Var1.
7. How would you code the criteria to restrict the output to be
produced?
Ans: ods output close;
8. What is the purpose of the trailing @? The @@? How would you use
them?
Ans: The trailing @ is a line-hold specifier (the @n column pointer
is a separate control). Using the trailing @ in the INPUT statement
gives you the ability to read a part of your raw data line, test it,
and then decide how to read additional data from the same record.
The single trailing @ tells SAS to hold the line for the next
INPUT statement in the same iteration of the DATA step.
The double trailing @@ tells SAS to hold the line more strongly.
NOTE: An INPUT statement ending with @@ instructs the program to
release the current raw data line only when there are no
data values left to be read from that line. The @@, therefore, holds the
input record even across multiple iterations of the DATA step.
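E.G: A minimal sketch of both line-hold specifiers. The record layouts, data set names and variable names are assumptions chosen for illustration.

```sas
/* Single trailing @: read the record type, test it, then read the rest. */
data work.sales;
    input type $1. @;                        /* hold the line */
    if type = 'A' then input @3 amount 5.;
    else if type = 'B' then input @3 amount 5. @9 region $2.;
    datalines;
A   120
B   300 NE
;
run;

/* Double trailing @@: several observations on one raw data line. */
data work.scores;
    input id score @@;                       /* hold across iterations */
    datalines;
1 90 2 85 3 77
4 68
;
run;
```

The second step creates one observation per ID/SCORE pair rather than one per line.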
9. Under what circumstances would you code a SELECT construct
instead of IF statements?
Ans: Especially if you are recoding a variable into a large
number of categories.
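E.G: A small sketch of recoding with SELECT. The cut-off values and names are hypothetical.

```sas
data work.graded;
    input id score;
    length grade $1;
    /* The first WHEN condition that is true wins;
       OTHERWISE catches everything else. */
    select;
        when (score >= 90) grade = 'A';
        when (score >= 80) grade = 'B';
        when (score >= 70) grade = 'C';
        otherwise          grade = 'F';
    end;
    datalines;
1 95
2 82
3 40
;
run;
```

The same logic written with IF/ELSE IF works too; SELECT tends to read more clearly as the number of categories grows.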
10. What statements do you code to tell SAS that it is to write to
an external file?
Ans: FILENAME fileref 'path';
FILE fileref;
PUT _ALL_; /* will write all the variables */
Or PUT only the variables which you require.
11. If reading an external file to produce an external file, what is the
shortcut to write the record without coding every single
variable on the record?
Ans: PUT _ALL_; writes every variable in name=value form. To copy
the input record unchanged, PUT _INFILE_; can be used instead.
12. If you do not want any SAS output from a DATA step, how would
you code the DATA statement to prevent SAS from producing a
data set?
Ans: By using DATA _NULL_; the desired output is a file and
not a SAS data set.
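E.G: A sketch that combines FILENAME, FILE, PUT and DATA _NULL_. The output path is hypothetical and must be changed to a writable location; SASHELP.CLASS is the sample data set shipped with SAS.

```sas
filename outfile 'C:\temp\class_report.txt';   /* hypothetical path */

data _null_;                  /* no output data set is created */
    set sashelp.class;
    file outfile;             /* route PUT output to the external file */
    put name $10. age 3. height 6.1;
run;
```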
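13. What is the one statement to set the criteria of data that can be
coded in any step?
Ans: The OPTIONS statement, a global statement that can appear in
or between steps. (If "criteria" means a row-selection condition,
the WHERE statement is the one that works in both DATA and PROC
steps.)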
14. Have you ever linked SAS code? If so, describe the link and any
required statements used to either process the code or the step
itself.
Ans: The LINK statement tells SAS to jump immediately to the
statement label that is indicated in the
LINK statement and to continue executing
statements from that point until a RETURN
statement is executed. The RETURN statement
sends program control to the statement immediately
following the LINK statement.
Note: The LINK statement and the destination must be in
the same DATA step. The destination is identified
by a statement label in the LINK statement.
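E.G: A minimal sketch of LINK and RETURN. The label CALC and the variable names are made up for illustration.

```sas
data work.link_demo;
    input x y;
    link calc;      /* jump to the CALC label */
    output;         /* execution resumes here after RETURN in CALC */
    return;         /* end this iteration so CALC is not run again */
calc:
    total = x + y;
    return;         /* go back to the statement after LINK */
    datalines;
1 2
3 4
;
run;
```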
15. How would you include common or reusable code to be
processed along with your statements?
Ans: By using the %INCLUDE statement.
16. When looking for data contained in a character string of
150 bytes, which function is the best to locate that data:
SCAN, INDEX or INDEXC?
Ans: INDEX, which returns the position of the first occurrence of
a substring. (SCAN extracts the n-th word, and INDEXC finds the
first occurrence of any one of a set of characters.)
17. If you have a data set that contains 100 variables, but you
need only five of those, what is the code to use to force SAS to use
only those variables?
Ans: Use the KEEP= option.
18. Code a PROC SORT on a data set containing state, district
and country as the primary variables, along with several
numeric variables.
Ans:
PROC SORT DATA=data-set-name;
BY state district country;
RUN;
19. How would you delete duplicate observations?
Ans: There are three ways to delete duplicate
observations in a dataset.
1) Proc sort data=SAS-data-set nodups;
by var;
run;
2) Proc sql;
Create table new_sas_data_set as
select distinct * from old_sas_data_set;
quit;
3) Data clean;
Set temp;
By group;
If first.group then output; /* TEMP must be sorted by GROUP */
Run;

20. How would you code a merge that will keep only the
observations that have matches from both sets?
Ans: By using the IN= internal variable in the MERGE
statement.
DATA NEW;
MERGE ONE_TEMP (IN=ONE) TWO_TEMP (IN=TWO);
BY NAME;
IF ONE=1 AND TWO=1;
RUN;

21. What is the Program Data Vector (PDV)? What are its
functions?
Ans:
The Program Data Vector is the temporary holding area in which SAS
builds one observation at a time. For example, the WHERE statement
may be more efficient than the subsetting IF (especially if you are
taking a very small subset from a large file) because it checks the
validity of the condition, to see if the observation is to be kept
or not, before the observation is brought into this temporary
holding area, which is called the program data vector (PDV).
22. Does SAS Translate (compile) or does it Interpret? Explain.
Ans: When you submit a DATA step for execution, SAS checks the
syntax of the SAS statements and compiles them, that is,
automatically translates the statements into machine code. In
this phase, SAS identifies the type and length of each new
variable, and determines whether a type conversion is
necessary for each subsequent reference to a variable.

23. At compile time when a SAS data set is read, what items
are created?
Ans: At compile time SAS creates the following:
A) Input buffer (only when raw data is read with an INPUT statement)
B) Program Data Vector (PDV)
C) Descriptor information
24. Name statements that are recognized at compile time
only.
Ans: DROP, KEEP, etc.
25. Identify a statement whose placement in the DATA step is critical.
Ans: INPUT statement.
26. Name statements that function at both compile and
execution time.
27. Name statements that are execution only.
28. In the flow of the DATA step processing, what is the first action
in a typical DATA step?
Ans: SAS first performs Syntax check.
29. What is _n_?
Ans: This is an implicit (automatic) variable created by SAS during
DATA step processing. It counts the number of times the DATA step
has iterated, which usually equals the number of records read so
far. It is available only in the DATA step and not in procs.
E.G: If we want to select every third record in a dataset
then we can use _n_ as follows:
Data new-sas-data-set;
Set old;
If mod(_n_, 3) = 1 then output;
Run;
Note: A WHERE clause cannot be used to subset on _n_; it will not
yield the required result.

BASE SAS:

30. What is the effect of the OPTIONS statement ERRORS=1?
Ans: The ERRORS= system option sets the maximum number of
observations for which SAS prints complete data-error messages in
the log. With ERRORS=1, full messages are printed only for the
first observation that has a data error; processing does not stop.
31. What's the difference between VAR A1-A4 and VAR A1--A4?

32. What does the SAS log message 'numeric values have been
converted to character' mean?
Ans: If we use a character function on numeric values,
SAS will automatically convert the numeric values
into character values.
33. Why is a STOP statement needed for the POINT= option on a SET
statement?
Ans: Because POINT= reads only the specified observations, SAS
cannot detect an end-of-file condition as it would if the file were being read
sequentially. Because detecting an end-of-file condition terminates a DATA
step automatically, failure to substitute another means of terminating the
DATA step when you use POINT= can cause the DATA step to go into a
continuous loop.
NOTE:
You cannot use the POINT= option with any of the
following:

BY statement
WHERE statement
WHERE= data set option
transport format data sets
sequential data sets (on tape or disk)
a table from another vendor's relational database
management system.
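E.G: A sketch of direct access with POINT= and the STOP statement that ends the step. It reads every fifth observation of SASHELP.CLASS, the sample data set shipped with SAS; the names I and TOTAL are arbitrary.

```sas
data work.every_fifth;
    /* NOBS= is set at compile time, so TOTAL is usable in the DO bounds. */
    do i = 5 to total by 5;
        set sashelp.class point=i nobs=total;
        output;
    end;
    stop;   /* no end-of-file is detected with POINT=, so stop explicitly */
run;
```

The POINT= variable is temporary and is not written to the output data set.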

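34. How do you control the number of observations and/or variables
read or written?
Ans: By specifying the OBS= option (with FIRSTOBS=) for
observations, and the KEEP= or DROP= options for variables.
35. Approximately what date is represented by the SAS date value of
730?
Ans: 1 January 1962. (SAS date 0 is 1 January 1960 and 1960 was a
leap year, so the exact date is 31 December 1961.)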
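E.G: The value can be checked by writing it to the log with a date format. The variable name D is arbitrary.

```sas
data _null_;
    d = 730;          /* days since 1 January 1960 */
    put d= date9.;    /* writes d=31DEC1961 to the log */
run;
```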
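36. How would you remove a format that has been permanently
associated with a variable?
Ans: By using PROC DATASETS with a FORMAT statement that names the
variable but no format:
proc datasets library=somelibrary;
modify sasdataset;
format variable-name;
run;
quit;
37. What does the RUN statement do?
Ans: The RUN statement marks the end of a step and tells SAS to
execute the preceding statements of that step.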


38. Why is SAS considered self-documenting?
Ans: When a SAS data set is created, SAS creates the descriptor
portion and the data portion of the data set. The
descriptor portion contains details like when the
dataset was created, the number of observations, the number of
variables, etc. Hence SAS is considered self-documenting.

39. Briefly describe 5 ways to do a table lookup in SAS.
Ans:
1) Simple table lookup by merging (MERGE statement, including the
IN= option, and a subsetting IF statement).
2) Simple table lookup with formats (PROC FORMAT and the PUT
function).
3) Looking up with two variables by merging (MERGE statement,
including the IN= option, and a subsetting IF statement).
4) Looking up with two variables with formats (PROC
FORMAT, PUT and INPUT functions).
5) A two-way table lookup (MERGE statement using two
variables).
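E.G: A minimal sketch of the format-based lookup (PROC FORMAT plus the PUT function). The format name $STATEFMT, the codes and the labels are hypothetical.

```sas
proc format;
    value $statefmt
        'MH'  = 'Maharashtra'
        'KA'  = 'Karnataka'
        other = 'Unknown';
run;

data work.lookup;
    input code $;
    /* PUT applies the format, so the label becomes the looked-up value. */
    state_name = put(code, $statefmt.);
    datalines;
MH
KA
TN
;
run;
```

No sorting or merging is needed with this approach, which is why it is often preferred for small lookup tables.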

40. What are some good SAS programming practices for
processing very large data sets?
Ans: For very large data sets with many variables we can
make use of arrays in the SAS system.
41. How would you create a data set with 1 observation and 30
variables from a data set with 30 observations and 1
variable?
Ans: Using PROC TRANSPOSE; it can also be done with SAS arrays.
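E.G: A sketch of both approaches on a made-up data set WORK.TALL with 30 observations and one variable VALUE. All names here are assumptions for illustration.

```sas
/* Build the hypothetical input: 30 observations, 1 variable. */
data work.tall;
    do i = 1 to 30;
        value = i * 10;
        output;
    end;
    drop i;
run;

/* Approach 1: PROC TRANSPOSE gives 1 observation with V1-V30. */
proc transpose data=work.tall out=work.wide(drop=_name_) prefix=v;
    var value;
run;

/* Approach 2: an array filled inside a DO loop around SET. */
data work.wide2;
    array v{30};
    do i = 1 to 30;
        set work.tall;
        v{i} = value;
    end;
    drop i value;
run;
```

In the second step all 30 observations are read within a single DATA step iteration, so only one observation is written.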

44. What are _numeric_ and _character_ and what do they do?
Ans: If we want to do a particular task for all the numeric variables
we can use _numeric_, and likewise if we want to do a
particular task for all the character variables we can use
_character_.
46. What is the order of application for output data set options,
input data set options and SAS statements?
Ans: INPUT data set options, SAS statements and then
OUTPUT data set options.
47. What is the order of evaluation of the comparison
operators: + - * / ** ( )?
Missing Value:
56. How many missing values are available? When might you use them?
Ans: Two types of missing values are available in SAS: they are
numeric and character. (Numeric variables also have the special
missing values ._ and .A through .Z.)
57. How do you test for missing values?
Ans: We can test for missing values with the NMISS function, which
counts the missing numeric arguments, with the MISSING function, or
by comparing directly, e.g. IF x = . or IF name = ' '.
58. How are numeric and character missing values represented
internally?
Ans: Numeric missing values are represented as dots (.) and
character missing values are represented as blanks.
FUNCTIONS:
59. What is the significance of the OF in X=SUM (OF a1-a4, a6, a9);?
60. What do the PUT and INPUT functions do?
Ans: The PUT statement (as distinct from the PUT function) is used
to identify logic problems: which piece of code is executed and
not executed, what the current value of a particular variable is,
and what the current values of all variables are.

INPUT function:
The traditional use is to reread a character variable with a
numeric informat to execute a character-to-numeric conversion.
The character-to-numeric conversion function:
INPUT(variable, informat-name)
The INPUT function converts the character variable to numeric:
Salary=input(EMP_SALARY, dollar7.);

Character value EMP_SALARY: $85,000
Numeric value SALARY: 85000

The result must be assigned to a variable with a new name; we
cannot have the same name, like:
EMP_SALARY=input(EMP_SALARY, dollar7.);

PUT function:
The numeric-to-character conversion function:
PUT(variable, format-name);
newphone=put(phone, 7.);

Numeric value PHONE: 6778000
Character value NEWPHONE: 6778000
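E.G: A runnable sketch of both conversions, using the same variable names and values as the example above.

```sas
data work.convert;
    emp_salary = '$85,000';                    /* character value */
    salary     = input(emp_salary, dollar7.);  /* numeric 85000 */

    phone      = 6778000;                      /* numeric value */
    newphone   = put(phone, 7.);               /* character '6778000' */

    put salary= newphone=;                     /* show both in the log */
run;
```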

61. Which date function advances a date, time or date/time value by a
given interval?
62. What do the MOD and INT functions do?
Ans: The MOD function returns the remainder of a division. It is
very useful if, suppose, you want to select every third
observation from a SAS data set. Example:
data third;
Set old;
If mod(_N_,3)=1;
Run;
The INT function returns the integer portion of an
argument. To truncate a number (drop off the fractional
part), you use the INT function.

63. In ARRAY processing, what does the DIM function do?
Ans: DIM is the dimension function. It returns the number of
elements in the array (i.e. the number of variables in the list).
64. How would you determine the number of missing or non-missing
values in a computation?
Ans: We can use the N option for the number of NON-MISSING
values and the NMISS option for the number of
MISSING values.
65. What is the difference between: X=a+b+c+d; and X=SUM(a, b, c,
d);?
Ans: If we use SUM(a, b, c, d) it will ignore the missing
values, if any, and compute the sum.
For E.G SUM(1,.,2,3)=6, whereas X=1+.+2+3 = MISSING.
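E.G: The same comparison as a small DATA step. The variable names are arbitrary.

```sas
data _null_;
    a = 1; b = .; c = 2; d = 3;
    x_plus = a + b + c + d;      /* missing, because B is missing */
    x_sum  = sum(a, b, c, d);    /* 6, the missing value is ignored */
    put x_plus= x_sum=;
run;
```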
66. There is a field containing a date. It needs to be displayed in the
format 'ddmonyy' if it's before 1975, 'dd mon ccyy' if it's after
1985, and as 'disco years' if it's between 1975 and 1985. How would
you accomplish this in DATA step code? Using only PROC FORMAT?
67. In the following DATA step, what is needed for 'fraction'
to print to the log?
data _null_;
X=1/3;
if X=.333 then put 'fraction';
run;
Ans: 1/3 is not exactly equal to .333, so the comparison is never
true as written. Rounding X before the comparison, for example
IF ROUND(X, .001) = .333, makes the condition true.

68. What is the difference between calculating the mean using the
MEAN function and PROC MEANS?
Ans: The MEAN function returns the mean of the non-missing values
in the variable list, across variables within one observation,
whereas PROC MEANS computes the mean of a variable across
observations. The way the MEAN function deals with
missing values is quite important. If you calculate
SCORE by simply adding up all the items and dividing by 50 as follows:
SCORE=(item1+item2+item3+...+item50)/50;
you would be in big trouble if any of the items had missing
values. When a SAS statement tries to do an arithmetic operation
on missing values, the result is always missing.

PROCs:
69. If you were given several SAS data sets you were
unfamiliar with, how would you find out the variable names and
formats of each dataset?
Ans: I can use the CONTENTS procedure with _ALL_ for the
libname and see the variable names and formats of each
data set.
EG:
PROC CONTENTS DATA=LIBREF._ALL_;
RUN;
70. How would you keep SAS from overlaying a SAS data set with its
sorted version?
Ans: By creating a new dataset when sorting, by specifying
OUT=new-sas-dataset in PROC SORT.
71. In PROC PRINT, can you print only variables that begin with the
letter A?
Ans: Yes. To print only the variables whose names begin with the
letter A, use a name-prefix list in the VAR statement:
VAR A:;
To print only the observations in which a variable's value begins
with the letter A, use the WHERE statement in PROC PRINT:
WHERE variable-name LIKE 'A%';
Or
WHERE variable-name =: 'A';
72. What are some differences between PROC SUMMARY and PROC
MEANS?
Ans:
1) PROC MEANS produces subgroup statistics when a BY
statement is used and the input data has been previously
sorted (use PROC SORT) by the BY variables. PROC
SUMMARY automatically produces
statistics for all subgroups, giving you all the information in
one run that you would get by repeatedly sorting a data set
by the variables that define each subgroup and running
PROC MEANS. (PROC MEANS also accepts a CLASS statement in current
releases, so this difference is largely historical.)
2) PROC SUMMARY does not produce any printed output by
default, so you will need to use the OUTPUT
statement to create a new data set and use PROC PRINT
to see the computed statistics.

PROC FREQ:
73. Code the table statement for a single-level (most common)
frequency.
Ans:
The statement for single-level:
DATA MAR.FREQTEST;
SET BAS.AMPERS;
RUN;
PROC FREQ DATA=MAR.FREQTEST;
TABLE AGE;
RUN;
74. Code the table statement to produce a multi-level
frequency.
Ans:
The statement for multilevel:
DATA MAR.FREQTEST;
SET BAS.AMPERS;
RUN;
PROC FREQ DATA=MAR.FREQTEST;
TABLE AGE * GENDER;
RUN;
75. Name the option to produce a frequency as line items rather than a table.
76. Produce output from a frequency. Restrict the printing of the table.

PROC MEANS:

77. Code a PROC MEANS that shows both summed and averaged
output of the data.
78. Code the option that will allow MEANS to include missing
numeric data to be included in the report.
79. Code the MEANS to produce output to be used later.
80. Do you use PROC REPORT or PROC TABULATE? Which do you
prefer? Explain.

MERGING/UPDATING:
81. What happens in a one-on-one merge? When would you use one?
Ans: If you want to merge two data sets that have different variables
and only one variable in common, we can merge the data sets on
that unique variable. (Strictly, a one-to-one merge is a MERGE
with no BY statement, which pairs observations by position.)
82. How would you combine 3 or more tables with different
structures?
83. What is the problem with merging two data sets that have
variables with the same name but different data?
Ans: The second data set's value will overwrite the value of the
first data set.
84. When would you choose to MERGE two data sets together and
when would you SET two data sets?
Ans: If we want to create a dataset as an exact copy of the old
datasets without bothering about which dataset is going
to contribute to the new dataset, then we will use the SET
statement.
If we want to control the contribution of the old datasets to
the new dataset then we will use the MERGE statement.
85. Which data set is the controlling data set in the MERGE
statement?
Ans: The last dataset named in the MERGE statement, because its
values overwrite same-named variables from the datasets listed
before it.

86. How do the IN= variables improve the capability of a MERGE?
Ans: IN= is an implicit variable in SAS which helps in
controlling which dataset needs to contribute to the
new dataset.
87. Explain the message 'MERGE HAS ONE OR MORE DATASETS WITH
REPEATS OF BY VARIABLE'.

CUSTOMIZED REPORT WRITING:
88. What is the purpose of the statement DATA _NULL_?
Ans: Use the keyword _NULL_, which allows the power of the
DATA step without creating a data set.
89. What is the pound sign used for in the DATA _NULL_?
90. What is the purpose of using the N=PS option?
Ans: Specifying N=PS in the FILE statement allows the output
pointer to write on any line of the current output page.

MACRO:
91. What system options would you use to help debug a macro?
Ans: SYMBOLGEN, MLOGIC, MPRINT
92. Describe how you would create a macro variable.
Ans: %let var=value;
93. How do you identify a macro variable?
94. How do you define the end of a macro?
Ans: %MEND
95. How do you assign a macro variable to a SAS variable?
Ans: Using CALL SYMPUT
96. What is the difference between %LOCAL and %GLOBAL?
Ans: A %LOCAL variable can be used only within the particular
block in which it is declared, but a %GLOBAL variable can be
used till the end of the SAS session.
97. How long can a macro variable be? A token?
Ans: Till it passes to the word scanner.
98. If you use a SYMPUT in a DATA step, when and where can you use the
macro variable?
Ans: It can be used outside the scope of the DATA step, once that
step has executed, and will be globally available.
100. How would you code a macro statement to produce
information on the SAS log?
Ans: %PUT statement
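E.G: A short sketch that ties the macro answers together. It uses SASHELP.CLASS, the sample data set shipped with SAS; the macro names LISTFIRST, CUTOFF and N_OLDER are made up for illustration.

```sas
options symbolgen mprint mlogic;     /* debugging options */

%let cutoff = 13;                    /* create a macro variable */

/* Pass a DATA step value to a macro variable with CALL SYMPUT. */
data _null_;
    set sashelp.class end=last;
    if age > &cutoff then n_older + 1;
    if last then call symput('n_older', strip(put(n_older, 8.)));
run;

/* The macro variable can be used once the DATA step has executed. */
%put NOTE: &n_older students are older than &cutoff;

/* A macro definition starts with %MACRO and ends with %MEND. */
%macro listfirst(ds, n);
    proc print data=&ds(obs=&n);
    run;
%mend listfirst;

%listfirst(sashelp.class, 5)
```

A macro variable is referenced with a leading ampersand, as in &cutoff, and %PUT writes the resolved text to the log.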
