
Frequently Asked Interview Questions (SAS)

Part I: Base SAS


Q1) What will be the respective values for A, B and C from the code given
below?

Data Test;
A = '25jan1960'd;
B = INTNX('month',A,0);
C = A+B;
Format A yymmdds10.
B yymmddd10.;
Run;
A) 1960/01/25, 1960.01.25, 24
B) 1960/01/25, 1960.01.01, 24
C) 1960/01/25, 1960-01-01, 24

D) 1960/01/25, 1960-01-25, 24
E) 1960/01/25, 1960-01-25, 25
F) None of the Above

Q2) Which of the options is correct for the code given below?

Data staff;
Jobcategory = 'FA';
Joblevel = 1;
Jobcategory = Jobcategory||Joblevel;
Run;
A) The code will fail because we are concatenating the same variable used to
store the result.
B) The code will work as expected.
C) The code will not fail but produce incorrect results.
D) The code will fail because we are concatenating numeric values with
character.
E) The code will concatenate and add spaces equal to the length of variable
Jobcategory.
F) None of the Above

Q3) Unless specified, which variables are used to calculate statistics in
the Means procedure?
A) All Variables in the data set
B) Non Missing Character and Numeric Variables
C) Missing and Non Missing Numeric Variables
D) Non Missing Numeric Variables
E) All of the above
F) None of the Above

Q4) Given the dataset GROUPS below, when we execute the Data Step as given,
what value does sum1 resolve to?

GROUP  SUM
A      100
B      200
C      300

Data _null_;
Set groups;
Call symput('sum1',sum);
Run;

A) 100
B) 200
C) 300
D) 100 200 300
E) 600
F) None of the Above

Q5) Consider the given code. Which of the statements is false?

Data Test;
Input a @@;
If mod (a, 2) = 0 then b = lag (a);
Datalines;
1 2 3 4 5
;
Run;
A) Test has 5 observations and incorrectly populated values for variable B.
B) Test has 5 observations, and B has 3 missing values and the last value of A
for obs 2 and 4.
C) Lag Function doesn't work as expected when used with an If.
D) Lag Function will populate value from the last iteration.
E) Test has 5 observations
F) None of the Above

Q6) When the given data step is executed on the datasets A and B, what
is the final outcome?

Data List;
Set A;
Set B;
Run;

Dataset A      Dataset B
Items          Items
Books          Scissors
Magazines      Pens
Pencils        Pencils
Highlighter    Paper pads
Scissors       Notebooks
               Magazines
               Highlighter
               Books

A) List has 8 observations with first 5 from A and last 3 from B
B) List has 5 observations with all values from A
C) List has 8 observations all values from B
D) List has 5 observations with all values from B
E) List has 13 observations with values from B appended below A
F) None of the Above

Q7) There are two Datasets available A and B with common variable X.

We need to merge the two datasets and keep values that do NOT exist in
both the datasets (Values that either exists in A or in B but not in both).
Please write the code to merge these two datasets.

Q8) Write the code to read a flat file so that only every fifth observation
is read into the dataset and total observations in the dataset should be
limited to 50 (assume the path and variable names)?

Q9) A data set has duplicate observations. How do you select the duplicate
observations into an output dataset using proc sql (give code)?

Q10) A data set needs to be sorted and de-duped on records by two
variables A and B, with A in Descending and B in Ascending Order, and any
duplicate records need to be saved in a separate dataset. Write the code.

Part II: SQL


Q11) Given: Table TBLORDER

CustID  OrderID  Amount  Paid  Balance
C0001   2290     500     500   0
C0001   2736     700     500   200
C0001   2458     700     700   0
C0001   2290     500     500   0
C0001   2291     700     700   0

If the following query is executed, how many records will be returned?
SELECT DISTINCT CUSTID, ORDERID, AMOUNT, PAID, BALANCE
FROM TBLORDER
WHERE AMOUNT NOT IN(100,200,300,400);
A) 1
B) 2
C) 3
D) 4
E) 5
F) None of the Above

Q12) How will you join the tables given below to produce the given
results?

Table A    Table B    Result
Value      Value      Value
100        300        100
200        400        200
300        500        500
400        600        600

A) Left Outer Join
B) Right Outer Join
C) Left Right Outer Join
D) Left Right Inner Join
E) Inner Join
F) None of the Above

Q13) What will be Outcome of the given Query?

SELECT A.*
FROM A
WHERE EXISTS (SELECT * FROM B WHERE A.id = B.id)
A) The query will fail because the where clause has two tables without a join.
B) The query will return only the rows of A whose id also exists in B
C) The main query will select everything from A if the sub query returns at least
1 record
D) The main query will only run if the sub query returns values
E) The query will not fail but produce 0 records
F) None of the Above
Q14) A Table MasterBalance has 5 Mil Records with two columns
CustID and Balance. We have received a small table with 5000 records
with CustID and Balance that we have to use to update Balance in
MasterBalance for given CustIDs.
How can we achieve it; Give the code.
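
One possible answer, as a minimal sketch: a correlated UPDATE in PROC SQL. The table name SmallBalance is hypothetical (the question does not name the small table), and the sketch assumes CustID is unique in it.

```sas
proc sql;
    /* Update only the customers that appear in the small table */
    update MasterBalance as m
        set Balance = (select s.Balance
                       from SmallBalance as s
                       where s.CustID = m.CustID)
        where m.CustID in (select CustID from SmallBalance);
quit;
```

The WHERE clause matters: without it, every CustID missing from the small table would have its Balance set to missing.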

Q15) How will you join the tables given below to produce the given
results? Give Code.

Table A    Table B    Result
Value      Value      Value
100        300        100
200        400        200
300        500        500
400        600        600

Part III: Macro SAS


Q16) What will be the output of the given code?
%LET A=B;
%LET B=C;
%LET C=&A;
%PUT &&&&&&A;
%PUT &&&&&&&A;
A) C, B
B) B, C
C) A, C

D) A, B
E) C, A
F) None of the Above

Q17) Which of the following is false for %Include?
A) It is used to add a piece of sas code or text to current program from another
sas/text file
B) It is executed immediately
C) sas code is compiled in Macro Processor and the code remains in memory and
can be called later when needed in the program
D) It is a macro available with sas and can be used directly
E) All Data Steps/Procs are executed that are in the referenced external file.
F) None of the Above

Q18) Write a macro to print a Fibonacci series of n numbers.
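
A minimal sketch of one way to answer this, assuming the series starts at 0 and that the terms are written to the log with %PUT. The macro name fibonacci and its local variables are illustrative choices.

```sas
%macro fibonacci(n);
    %local i prev curr next;
    %let prev = 0;
    %let curr = 1;
    %do i = 1 %to &n;
        %put &prev;                          /* write the current term to the log */
        %let next = %eval(&prev + &curr);    /* integer arithmetic in macro code */
        %let prev = &curr;
        %let curr = &next;
    %end;
%mend fibonacci;

/* Writes the first 10 terms: 0 1 1 2 3 5 8 13 21 34 */
%fibonacci(10)
```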

Q19) How can you use sas functions within a macro?

A) We cannot use sas functions in macros. We have to use Macro functions.
B) By Using %SYSTEMFUNCTIONS
C) By Using %SASFUNC
D) By Using %SYSFUNC
E) By Using %NONMACFUNC
F) None of the Above

Q20) How can you run a macro from within a datastep?

A) It is Impossible
B) By Using %RUNMACRO
C) By Using CALL SYMPUTX
D) By Using CALL EXECUTE
E) By Using CALL RUNMACRO
F) None of the Above

What is the difference between the IF and WHERE clauses?
What is the difference between Set and merge statements?
What is the difference between proc Summary and means?
What are the various Infile delimiter options?
Explain Scan function with examples.
What is the difference between Proc means and proc compare?
Explain First. and Last. and their use.

data a;
input name $;
datalines;
123
233
5678
ass
fdgfg
3543
erxddfdf
;
run;
/* Select only numeric values from the dataset */
/* INPUT returns missing for text that is not a number, so keep the non-missing rows */
proc sql;
create table x as
select input(name,5.) as name1 from a
where calculated name1 is not null;
quit;
/* Select only character values from the dataset */
proc sql;
create table y as
select name, input(name,7.) as name1 from a where calculated name1 is null;
quit;

21) Difference between run and quit statement in sas.

Ans: RUN tells SAS to execute the statements of the preceding DATA or PROC step. QUIT
ends a procedure that stays active between statements or RUN groups, such as PROC SQL
or PROC DATASETS.

22) What is the SQLOBS automatic macro variable?

SQLOBS is an automatic macro variable set by PROC SQL. It contains the number of
rows processed by the most recent statement, such as a SELECT or DELETE. The log
below shows its value after each statement:

proc sql ;
%put >>> SQLOBS:&sqlobs ;
>>> SQLOBS:0
create table demo as
select * from sashelp.class
where sex EQ 'F' ;
NOTE: Table WORK.DEMO created, with 9 rows and 5 columns.
%put >>> SQLOBS:&sqlobs ;
>>> SQLOBS:9
delete from demo
where age EQ 11 ;
NOTE: 1 row was deleted from WORK.DEMO.
%put >>> SQLOBS:&sqlobs ;
>>> SQLOBS:1
reset nodouble ;
%put >>> SQLOBS:&sqlobs ;
>>> SQLOBS:0
delete from demo
where age EQ 12 ;
NOTE: 2 rows were deleted from WORK.DEMO.
%put >>> SQLOBS:&sqlobs ;
>>> SQLOBS:2
options ls=max ;
%put >>> SQLOBS:&sqlobs ;
>>> SQLOBS:2
quit ;
.................................................................................................

2) DATA A;
INPUT ID NAME $;
DATALINES;
1 A
2 B
3 C
;
RUN;

DATA B;
INPUT ID SALARY;
DATALINES;
1 10
2 20
3 30
3 30
;
RUN;

DATA C;
MERGE A(IN=A)
B(IN=B);
BY ID;
/* WITHOUT GIVING (IN) CONDITIONS, BY DEFAULT IT WILL GIVE A=1 OR B=1 */
RUN;

OUTPUT ??

.................................................................................................

3) DATA A;
INPUT ID NAME $;
DATALINES;
1 A
1 A1
1 A2
2 B1
2 B2
2 B3
;
RUN;

DATA B;
SET A;
BY ID;
VAR1= FIRST.ID;
VAR2 = LAST.ID;
RUN;

OUTPUT?? FIRST. AND LAST. GIVES BOOLEAN VALUES (1 AND 0)

..................................................................................................

4) %LET A= 2;
%LET VAR2= BARCLAYS;
%LET B = VAR;

Use a and b variables to print value Barclays.

&&&b.&a

............................................................................................

5) %let a=      barclays;
%let b = &a;
%put &b;

23) Will variable b give value with leading space or without leading
space ?

Ans. Without leading space. %LET removes leading and trailing blanks from the value by default.

........................................................................................

6) DATA A;
INPUT ID NAME $;
DATALINES;
1 A
2 B
3 C
;
RUN;

DATA B;
INPUT ID SALARY;
DATALINES;
1 10
2 20
4 40
;
RUN;

24) IN THE OUTPUT SHOW OBS WHOSE ID VAR VALUES ARE ONLY IN DATA
SET A, NOT IN DATA SET B. (DON'T USE EXCEPT. USE JOINS) ?

ANS.

PROC SQL;
/* After the left join, rows of A without a match in B have a missing B.ID */
SELECT A.* FROM A LEFT JOIN B ON A.ID = B.ID WHERE B.ID IS NULL;
QUIT;
.....................................................................................

25) Using proc sql, read only the first two obs from a data set.

Ans. Use the INOBS= option.
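
As an illustration, a short sketch using the SASHELP.CLASS sample table (chosen here only for the example). INOBS= limits the rows read from each source table, while the related OUTOBS= option limits the rows written to the result.

```sas
/* Read only the first two rows of the source table */
proc sql inobs=2;
    select * from sashelp.class;
quit;

/* Read everything, but return only the first two rows of the ordered result */
proc sql outobs=2;
    select * from sashelp.class
    order by age desc;
quit;
```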

8) data a;
input branch $ revenue dollar5. month $;
datalines;
a 1567 jan
a 4563 feb
a 2311 march
a 5555 april
a 6234 may
b 7123 jan
b 1890 feb
b 9044 march
b 1888 april
b 6120 may

;
run;
Ans
proc sql;
select 'Max revenue for branch',branch,'is', revenue ,'for the month of', month from
a
group by branch
having revenue= max(revenue);
quit;
9) DATA STUDS;
INFILE DATALINES DLM=',';
LENGTH NAME $ 20;
INPUT NAME $ ;
DATALINES;
ROHIT GUPTA
SHIV SHANKAR KUMAR
ADITI MATHUR
SUNIL KUMAR GUPTA
;
RUN;

DATA STUDS;
SET STUDS;
FIRST_NAME= SCAN(NAME,1,' ');
IF SCAN(NAME,3,' ') = ' ' THEN
LAST_NAME= SCAN(NAME,2,' ');
ELSE DO;
MIDDLE_NAME= SCAN(NAME,2,' ');

LAST_NAME= SCAN(NAME,3,' ');


END;
RUN;
What will be the output?
10) data a;
input account_holder $ status $;
datalines;
a

run;

data b;
set a;
length cone $9.;
retain cone;
if first.account_holder then cone=status;

else
cone= compress(cone||status);
by account_holder;
run;

proc print data=b;


run;

OUTPUT

Obs account_holder status cone

3c

3cc

3cc6

c5

c56

c56c

10

56

DATA A;
INPUT JOBID DATES DATETIME18. MONTH $;
DATALINES;
1 01MAR2013:12:23:12 march
1 01MAR2013:12:40:23 march
2 10MAR2013:10:12:34 march
2 10MAR2013:10:21:31 march
3 02APR2013:12:23:12 april
3 02APR2013:12:55:11 april
4 12APR2013:06:34:12 april
4 12APR2013:06:40:11 april
;
RUN;

/* in this dataset find job taking maximum execution time for each month
*/

Ans. Using Self Join

PROC SQL;
select * from (
select l.jobid, l.month as month,

r.dates-l.dates as Duration format=time10.


from a l , a r
where
l.jobid=r.jobid and
l.dates<r.dates) group by month having duration= max(duration);

QUIT;

/* METHOD WITHOUT SQL SELF JOIN */

proc sql;
create table bigJobs as
select month, jobId, jobHours
from
( select month, jobId, range(dates)/3600 as jobHours /* RANGE FUNCTION WILL
GIVE THE DIFFERENCE BETWEEN THE LARGEST AND THE SMALLEST VALUE AMONG
NON MISSING INTEGERS */
from a group by month, jobId )
group by month
having jobHours=max(jobHours);
select * from bigJobs;
quit;

........................................................................................

/* Find the freq using proc freq only for character variables in a data set */

proc freq data = sasuser.admit;


tables _character_ / list nocum nopercent;
output out= freq_test;
run;

......................................................................................

DATA ROHIT;
INPUT ID MONTH $ SALARY;
DATALINES;
1 JAN 500
1 FEB 700
1 MAR 1000
1 APR 1200
1 OCT 1100
2 FEB 250
2 DEC 500
;
RUN;

/* In above data set calculate sum of each id for each quarter */

PROC SQL;
SELECT ID,SUM(SALARY), CASE
WHEN MONTH IN('JAN','FEB','MAR') THEN 'QTR1'
WHEN MONTH IN('APR','MAY','JUN') THEN 'QTR2'
WHEN MONTH IN('JUL','AUG','SEP') THEN 'QTR3'
ELSE 'QTR4'
END AS QUART
FROM ROHIT GROUP BY ID,CALCULATED QUART;

QUIT;

...................................................................................

Q. Proc Summary options Missing and NWAY. Use of _type_ variable.

...................................................................

Q. Concepts of one to one merge and Match merge.


1).

The following program is submitted.

data test;
input name $ age;
cards;
John +35
;
run;
Which values are stored in the output data set?
A. name  age
   ----------
   John  35
B. name  age
   ----------
   John  (missing value)
C. name             age
   --------------------------------
   (missing value)  (missing value)
D. The DATA step fails execution due to data errors.

2).
The following observation is stored in a SAS data set named
EMPLOYEES:

LNAME    FNAME  JOBCODE
-----------------------
Whitley  Sam    na1

If the DATA step below is executed, what will be the value of the variable
JOBDESC in the output SAS data set when this observation is processed:
data navigate;
set employees;
if jobcode = 'NA1' then jobdesc = 'Navigator';
run;
A. navigator
B. Navigator
C. NAVIGATOR
D. a missing value

3).

The following SAS program is submitted:

proc format;
value score 1 - 50 = 'Fail'
51 - 100 = 'Pass';
run;
Which one of the following PRINT procedure steps correctly applies the
format?
A. proc print data = sasuser.class;
var test;
format test score;
run;
B. proc print data = sasuser.class;
var test;
format test score.;
run;
C. proc print data = sasuser.class format = score;
var test;
run;
D. proc print data = sasuser.class format = score.;
var test;
run;

4).

Given the following DATA step:

data loop;
x = 0;

do index = 1 to 5 by 2;
x = index ;
end;
run;
Upon completion of execution, what are the values of the variables X and
INDEX in the SAS data set named LOOP?
A. x = 3, index = 4
B. x = 3, index = 5
C. x = 5, index = 6
D. x = 5, index =7

5).
Given that the data set named ONE contains 10 observations and the
data set named TWO contains 10 observations, how many observations
will be contained in the data set named COMBINE that is created in the
following DATA step?
data combine;
set one two;
run;
A. 10
B. 20
C. 0, the DATA step will fail due to syntax errors
D. 10 to 20, depending on how many observations match

6).

What is the default length for the numeric variable Balance?

Name      Balance
Adams     105.73
Geller    107.89
Martinez  97.45
Noble     182.50

a. 5
b. 6
c. 7
d. 8

7).
In order for the date values 05May1955 and 04Mar2046 to be read
correctly, what value must the YEARCUTOFF= option have?
A. a value between 1947 and 1954, inclusive
B. 1955 or higher
C. 1946 or higher
D. any value

8).
Assuming you are using SAS code and not special SAS windows,
which one of the following statements is false?
A. LIBNAME statements can be stored with a SAS program to reference the
SAS library automatically when you submit the program.
B. When you delete a libref, SAS no longer has access to the files in the
library. However, the contents of the library still exist on your operating system.
C. Librefs can last from one SAS session to another.
D. You can access files that were created with other vendors' software by
submitting a LIBNAME statement.

9).

What usually happens when a syntax error is detected?


A. SAS continues processing the step.

B. SAS continues to process the step, and the SAS log displays messages
about the error.
C. SAS stops processing the step in which the error occurred, and the SAS log
displays messages about the error.
D. SAS stops processing the step in which the error occurred, and the Output
window displays messages about the error.

10). How can you tell whether you have specified an invalid option in a
SAS program?
a. A log message indicates an error in a statement that seems to be valid.
b. A log message indicates that an option is not valid or not recognized.
c. The message "PROC running" or "DATA step running" appears at the top of
the active window.
d. You can't tell until you view the output from the program.

11). Which of the following statements selects from a data set only those
observations for which the value of the variable Style is RANCH, SPLIT, or
TWOSTORY?
A. where style='RANCH' or 'SPLIT' or 'TWOSTORY';
B. where style in 'RANCH' or 'SPLIT' or 'TWOSTORY';
C. where style in (RANCH, SPLIT, TWOSTORY);
D. where style in ('RANCH','SPLIT','TWOSTORY');

12). If you want to sort your data and create a temporary data set
named Calc to store the sorted data, which of the following steps should
you submit?
A. proc sort data=work.calc out=finance.dividend; run;
B. proc sort dividend out=calc; by account; run;
C. proc sort data=finance.dividend out=work.calc; by account; run;
D. proc sort from finance.dividend to calc; by account; run;

13). Which statement identifies the name of a raw data file to be read
with the fileref Products and specifies that the DATA step read only
records 1-15?
A. infile products obs 15;
B. infile products obs=15;

C. input products obs=15;


D. input products 1-15;

14). Which statement correctly re-defines the values of the variable
Income as 100 percent higher?
A. income=income*1.00;
B. income=income+(income*2.00);
C. income=income*2;
D. income= *2;

15). Suppose you run a program that causes three DATA step errors.
What is the value of the automatic variable _ERROR_ when the observation
that contains the third error is processed?
a. 0
b. 1
c. 2
d. 3

16).

Which of the following actions occurs at the end of the DATA step?
A. The automatic variables _N_ and _ERROR_ are incremented by one.
B. The DATA step stops execution.
C. The descriptor portion of the data set is written.

D. The values of variables created in programming statements are re-set to
missing in the program data vector.

17).

How many characters can be used in a label?


a. 40
b. 96
c. 200

d. 256

18). The default statistics produced by the MEANS procedure are n-count, mean, minimum, maximum, and
A. Median.
B. Range.
C. Standard deviation.
D. Standard error of the mean.

19). Consider the IF-THEN statement shown below. When the statement is
executed, which expression is evaluated first?
if finlexam>=95 and (research='A' or (project='A' and present='A')) then
Grade='A+';
A. finlexam>=95
B. research='A'
C. project='A' and present='A'
D. research='A' or (project='A' and present='A')

20). Which of the following statements is false about BY-group
processing? When you use the BY statement with the SET statement,
A. the data sets that are listed in the SET statement must be indexed or
sorted by the values of the BY variable(s).
B. the DATA step automatically creates two variables, FIRST. and LAST., for
each variable in the BY statement.
C. FIRST. and LAST. identify the first and last observation in each BY group, in
that order.
D. FIRST. and LAST. are stored in the data set.

21). Which program will combine Brothers.One and Brothers.Two to
produce Brothers.Three?

a. data brothers.three;
set brothers.one;
set brothers.two;
run;
B. data brothers.three;
set brothers.one brothers.two;
run;
C. data brothers.three;
set brothers.one brothers.two;
by varx;
run;
D. data brothers.three;
merge brothers.one brothers.two;
by varx;
run;

22). The data sets Ensemble.Spring and Ensemble.Summer both contain
a variable named Blue. How do you prevent the values of the variable Blue
from being overwritten when you merge the two data sets?
a. data ensemble.merged;
merge ensemble.spring(in=blue)
ensemble.summer;
by fabric;
run;
B. data ensemble.merged;
merge ensemble.spring(out=blue)
ensemble.summer;

by fabric;
run;
C. data ensemble.merged;
merge ensemble.spring(blue=navy)
ensemble.summer;
by fabric;
run;
D. data ensemble.merged;
merge ensemble.spring(rename=(blue=navy))
ensemble.summer;
by fabric;
run;

Ques).
If you merge data sets Sales.Reps, Sales.Close, and
Sales.Bonus by ID, what is the value of Bonus in the third observation in
the new data set?

A. $4,000
B. $3,000
C. missing
D. can't tell from the information given

Ques.A typical value for the character variable Target is 123,456. Which
statement correctly converts the values of Target to numeric values when
creating the variable TargetNo?
A. TargetNo=input(target,comma6.);
B. TargetNo=input(target,comma7.);
C. TargetNo=put(target,comma6.);

D. TargetNo=put(target,comma7.);

Ques. Suppose you need to create the variable FullName by concatenating
the values of FirstName, which contains first names, and LastName, which
contains last names. What's the BEST way to remove extra blanks between
first names and last names?
A. data work.maillist; set retail.maillist; length FullName $ 40;
fullname=trim firstname||' '||lastname; run;
B. data work.maillist; set retail.maillist; length FullName $ 40;
fullname=trim(firstname)||' '||lastname; run;
C. data work.maillist; set retail.maillist; length FullName $ 40;
fullname=trim(firstname)||' '||trim(lastname);run;
D. data work.maillist; set retail.maillist; length FullName $ 40;
fullname=trim(firstname||' '||lastname); run;

Ques.The variable IDCode contains values such as 123FA and 321MB. The
fourth character identifies sex. How do you assign these character codes
to a new variable named Sex?
A. Sex=scan(idcode,4);
B. Sex=scan(idcode,4,1);
C. Sex=substr(idcode,4);
D. Sex=substr(idcode,4,1);
Ques.In the data set Work.Invest, what would be the stored value for
Year?
data work.invest;
do year=1990 to 2004;
Capital+5000;
capital+(capital*.10);
end;
run;

A. missing
B. 1990
C. 2004
D. 2005
Ques.For the program below, select an iterative DO statement to process
all elements in the contrib array.
data work.contrib;
array contrib{4} qtr1-qtr4;
...
contrib{i}=contrib{i}*1.25;
end;
run;
A. do i=4;
B. do i=1 to 4;
C. do until i=4;
D. do while i le 4;
Ques.There are 500 observations in the data set Company.USA. What is
the result of submitting the following program?
data work.getobs5(drop=obsnum);
obsnum=5;
set company.usa(keep=manager payroll) point=obsnum;
stop;
run;
A. an error
B. an empty data set
C. a continuous loop
D. a data set that contains one observation

Ques. Which function calculates the average of the variables Var1, Var2,
Var3, and Var4?
A. mean(var1,var4)
B. mean(var1-var4)
C. mean(of var1,var4)
D. mean(of var1-var4)

Ques. What are Macro Quoting Functions and %sysfunc ?

ANS : There are three uses of macro quoting functions:
Masking macro triggers (& and %), for example the value Johnson &Johnson
Masking special characters, for example the value 50%age
Masking unbalanced parentheses and unbalanced quotes, for example the value Karim's cafe
There are two types of macro quoting functions:
Compile time : %STR , %NRSTR
Execution time : %QUOTE , %NRQUOTE , %BQUOTE , %NRBQUOTE , %SUPERQ

%sysfunc : this macro function lets macro code call most Base SAS (DATA step)
functions.
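
The points above can be sketched as follows. The macro variable names are illustrative only, and the values are the ones used in the answer.

```sas
/* %NRSTR masks & and % at compile time, so &Johnson is not resolved */
%let name = %nrstr(Johnson &Johnson);
%put &name;

/* Inside %STR, an unmatched quote must be marked with a preceding % */
%let raw = %str(Karim%'s cafe);

/* %BQUOTE masks the unmatched quote in a resolved value at execution time */
%let restaurant = %bquote(&raw);
%put &restaurant;

/* %SYSFUNC calls DATA step functions from macro code; the optional */
/* second argument is a format applied to the result                */
%let rundate = %sysfunc(today(), date9.);
%put Run date: &rundate;
%put %sysfunc(upcase(base sas));
```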

Ques. How to create a macro variable with CALL SYMPUT? What is the scope of the
macro variable created by this?
ANS : Assume a dataset Test

Obs  Name     Sex  Age  Height  Weight
  1  Alfred        14   69      112.5
  2  Alice         13   56.5    84
  3  Barbara       13   65.3    98
  4  Carol         14   62.8    102.5
  5  Henry         14   63.5    102.5
  6  James         12   57.3    83
  7  Jane          12   59.8    84.5
  8  Janet         15   62.5    112.5
  9  Jeffery       13   62.5    84
 10  John          12   59      99.5
 11  Joyce         11   51.3    50.5
 12  Judy          14   64.4    90
 13  Louice        12   56.3    77
 14  Mary          15   66.5    112
 15  Philip        16   72      150
 16  Robert        12   64.8    128
 17  Ronald        15   67      133
 18  Thomas        11   57.5    85
 19  William       15   66.5    112

Data _null_;
Set test;
Call symput('m_name',name);
Run;
%PUT &m_name.;
The result will be William (the value from the last observation of the dataset), because
each iteration overwrites the macro variable. Scope: when CALL SYMPUT runs in open
code (outside a macro), the macro variable is created in the global symbol table.

Ques. Use of %SUPERQ function

ANS : This is a macro quoting function. It is used when the value of a macro
variable contains a macro trigger (& or %) that must not be resolved. Its argument
is the name of the macro variable, without the leading ampersand.

%macro test;
Data _null_;
Call symput('Company','Johnson &Johnson');
Run;
%PUT %SUPERQ(company);
%mend;
%test

%let a=6;
%let b=9;
%let c=a+b;

Ques. Then what will be the value for %PUT &a. &b. &c. ?
ANS : a will be 6, b will be 9 and c will be a+b (%LET stores text and does not evaluate arithmetic).

%let a=6;
%let b=9;
%let c=&a.;

Then what will be the value for &&&&&&a. and &&&&&&&b. ?

ANS : The value for &&&&&&a. will be &6 and for &&&&&&&b. will be &9

How to connect SAS server with DB server, or explain PASSTHROUGH ?

ANS :
PASSTHROUGH : It enables you to interact with a DBMS by using SQL syntax without
leaving your SAS session. It cannot be used in a DATA step; it can be used only in
PROC SQL.
There are two types of PASSTHROUGH :
Implicit Passthrough : SAS first tries to translate the PROC SQL query and run it on the
database side. If that works, only the result is pulled into SAS; otherwise SAS
downloads the data from the DB server and runs the query in the SAS environment.
Explicit Passthrough : We force SAS to send the query, written in the database's own
SQL, to run against the DB server. If it runs without any error the query pulls the
data; otherwise the query fails.
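
A minimal sketch of both styles, assuming a SAS/ACCESS Interface to Oracle is licensed. The engine, the macro variables &dbuser and &dbpass, the path "mydb" and the libref dblib are all hypothetical placeholders; the table name is borrowed from Q14.

```sas
/* Implicit pass-through: query a DBMS table through a SAS/ACCESS libref */
libname dblib oracle user=&dbuser password=&dbpass path="mydb";

proc sql;
    select count(*) as n_rows
    from dblib.masterbalance;
quit;

/* Explicit pass-through: the inner query is sent to the database as written */
proc sql;
    connect to oracle as db (user=&dbuser password=&dbpass path="mydb");
    create table work.high_balance as
        select * from connection to db
            ( select custid, balance
              from masterbalance
              where balance > 1000 );
    disconnect from db;
quit;
```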

[Figure: SAS Server, SAS/ACCESS, Database Server, SAS/CONNECT, clients C1 (User 1) and C2 (User 2)]

How to remove duplicate rows from a SAS dataset, or use of nodup,
nodupkey, noduprecs and dupout ?
ANS : Proc sort data=class nodup dupout=class_new;
By sex; run;
Nodup : removes duplicate rows (it compares the complete row with the previous row after sorting).
Nodupkey : removes rows that duplicate the values of the BY variables only.
Noduprecs : the full name of the same option; NODUP is its alias.
Dupout : names a dataset that receives the removed duplicate rows.
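
As a sketch of how these options combine, the step below de-dupes on two key variables and keeps the rejected rows, which is the situation described in Q10. The data set names work.have, work.dedup and work.dups are hypothetical.

```sas
proc sort data=work.have
          out=work.dedup        /* first row of each A/B combination      */
          nodupkey
          dupout=work.dups;     /* later rows with an A/B value seen before */
    by descending a b;          /* A in descending order, B ascending     */
run;
```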

How can we create a variable using %let? What will be the scope of this
variable ?

ANS : %let a=20;
Macro variable a will be created with value 20.
Scope : (i) When %let is used outside a macro definition, the macro variable is
always created in the GST (global symbol table).
(ii) When %let is used inside a macro definition, SAS first checks the LST (local
symbol table); if the variable is found, its value is updated. Otherwise SAS checks
the GST and updates the value there if the variable is found. If it exists in neither
table, the macro variable is created in the LST.
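
The scope rules can be checked with a small experiment. The macro name scope_demo and the variable b are illustrative.

```sas
%let a=20;                 /* open code: created in the global symbol table */

%macro scope_demo;
    %let a=30;             /* a exists in the GST, so the global value is updated */
    %let b=40;             /* b exists nowhere, so it is created in the LST */
    %put Inside macro: a=&a b=&b;
%mend scope_demo;

%scope_demo
%put Outside macro: a=&a;           /* a is now 30 */
%put b exists: %symexist(b);        /* 0, because the local b is gone */
```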

What will be the value of the date 25th March 1960 in SAS?

ANS : 84. SAS dates count days from 01 Jan 1960 = 0, so
30 remaining days in Jan + 29 days in Feb + 25 days in March = 84
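
The count can be confirmed with a date literal, as in this short sketch:

```sas
data _null_;
    d = '25mar1960'd;   /* date literal, stored as days since 01JAN1960 */
    put d=;             /* writes d=84 to the log */
    put d= date9.;      /* writes d=25MAR1960 */
run;
```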

How to replace missing value in a dataset with 0 ?


X1

X2

X3

X4

X5

3.2

4.5

5.3

4.2

6.2

4.5

4.5

7.8

ANS :
Data no_miss;
Set missing;
Array array_num(*) _numeric_;
Do i=1 to dim(array_num);
If array_num(i) eq . then
array_num(i)=0;
end;
drop i;
run;

How to create macro variable with Call Symput. Whats the scope of macro
variable created by this ?
ANS : same as client round 1

What is DSD? What are its uses ?

ANS : DSD (Delimiter Sensitive Data) is an option for the INFILE statement. It does
three things :
It changes the default delimiter from a blank to a comma.
If two delimiters appear in a row, SAS treats it as a missing value.
It does not read quotation marks as part of the data value, and it ignores delimiters
that are enclosed in quotation marks.
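
A small sketch showing the three effects together. The data set and variable names are made up for the illustration.

```sas
data dsd_demo;
    infile datalines dsd;          /* comma is the default delimiter under DSD */
    length name $ 20 city $ 20;
    input name $ age city $;
    datalines;
"Smith, John",34,Boston
Mary,,Chicago
;
run;

/* Row 1: name is Smith, John (quotes removed, embedded comma kept) */
/* Row 2: age is missing because of the two consecutive commas      */
proc print data=dsd_demo;
run;
```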

Difference between set and merge ?

ANS :
Set                                  Merge
No BY variable required.             BY variable required (for a match-merge).
Sorted datasets are not required.    Datasets should be sorted before merging.
No common variable required.         One common variable required.
Set concatenates the datasets.       Merge matches the observations of the datasets.

Difference between proc means and proc summary ?

ANS : (i) PROC MEANS by default produces printed output in the output window,
whereas PROC SUMMARY does not print unless the PRINT option is given. PROC
SUMMARY is normally used with an OUTPUT statement to create a dataset.
(ii) When no VAR statement is used, PROC MEANS ignores character variables and
gives the 5 default statistics for all numeric variables, whereas PROC SUMMARY gives a
simple count of observations.
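
A short sketch of the contrast, using the SASHELP.CLASS sample table. The output data set name and the statistic names are illustrative.

```sas
/* PROC MEANS: prints N, mean, std dev, minimum and maximum by default */
proc means data=sashelp.class;
    var height weight;
run;

/* PROC SUMMARY: prints nothing; results go to the OUTPUT data set */
proc summary data=sashelp.class nway;
    class sex;
    var height weight;
    output out=work.class_stats mean=avg_height avg_weight;
run;
```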

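Use of the SCAN function, and what will be the size of the variable created by the
SCAN function?
ANS : The SCAN function is typically used to extract a word from a string. In older
releases a variable created by SCAN gets a default length of 200 bytes; in SAS 9.4 it
takes the length of the first argument unless a LENGTH statement sets it.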
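
A few illustrative calls follow. The sample text is made up, and a LENGTH statement fixes the size of the result so that it does not depend on the release.

```sas
data _null_;
    length word $ 20;
    text = 'SAS,Base;Macro SQL';
    word = scan(text, 1, ',');     /* first word, comma as delimiter: SAS */
    put word=;
    word = scan(text, -1, ' ');    /* negative count scans from the right: SQL */
    put word=;
    word = scan(text, 2, ',;');    /* several delimiters at once: Base */
    put word=;
run;
```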

How to remove duplicate rows from a SAS dataset or use of nodup


,nodupkey , noduprcs and dupout ?
ANS : same as client round 1
What are the uses of the format modifiers & and : ?
ANS : : modifier - It forces SAS to read from the first non-blank character until a
delimiter or the end of the line occurs (it ignores the width given by the informat in
the input statement).
& modifier - It lets a character value contain single embedded blanks: SAS keeps
reading until it meets two consecutive blanks. We specify & after that particular
variable in the input statement.
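
Both modifiers in one INPUT statement, as a sketch. The data values are invented, and the city is separated from the sales figure by two blanks.

```sas
data modifiers;
    /* & : city may contain single blanks and ends at two consecutive blanks */
    /* : : sales is read with the COMMA informat up to the next blank        */
    input city & $20. sales : comma10.;
    datalines;
New York  1,250
Los Angeles  13,400
;
run;
```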

What is proc append ? What are its uses ?

ANS: Proc append is used when we want to add the observations of a smaller data set
to a large one. It only adds records and does not read the observations of the larger
dataset. We can use the FORCE option if the two datasets do not have the same variables.
Proc Append Base=class /* larger data set */ data=class_new /* smaller data set */;
Run;

Describe the ways through which you can create macro variable ?
ANS :
Macro parameter passing

%let
%local
%global
Call symput & call symputx
Proc sql into clause
%do

What are LST and GST ? How are they created ?

ANS : LST - A Local Symbol Table is created by executing a macro program. A local
symbol table stores the macro variables defined within that program.
GST - The Global Symbol Table is created when any new SAS session is launched.
The GST stores the values of automatic macro variables and of global macro variables
created by the user.

How to control observations in the final dataset while merging two datasets, or
use of IN= while merging?
ANS : We can use the IN= option to control observations in the final dataset while
merging two datasets.
Example :
Data C;
Merge A(in=p) B(in=q);
By X;
If((p=1 and q=0) OR (p=0 and q=1)) then output;
Run;

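Use of missover and truncover while treating missing values ?

ANS : Both options stop SAS from moving to the next line when a record is shorter than
the INPUT statement expects. With MISSOVER, a variable whose field is incomplete is set
to missing; with TRUNCOVER, SAS keeps whatever part of the value is present. MISSOVER
is typically used with delimited files, while TRUNCOVER is used for fixed-layout files.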
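
The difference shows up only with an external file, because in-stream DATALINES records are padded with blanks. The sketch below writes a temporary file first; the fileref raw and the data set names are illustrative.

```sas
/* Write a small fixed-layout file whose second record is short */
filename raw temp;

data _null_;
    file raw;
    put '01 ABCDE';
    put '02 AB';
run;

data with_missover;
    infile raw missover;
    input id 1-2 code $ 4-8;    /* record 2: code is set to missing */
run;

data with_truncover;
    infile raw truncover;
    input id 1-2 code $ 4-8;    /* record 2: code keeps the partial value AB */
run;
```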

How to find the first 10 values for a variable in a dataset where values are
repeated, or count the first 10 values by using BY and a counter?

ANS : use first.id and counter concept
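
A minimal sketch of that idea, assuming a data set work.have with a grouping variable id (both names are hypothetical):

```sas
proc sort data=work.have out=work.sorted;
    by id;
run;

data work.first10;
    set work.sorted;
    by id;
    if first.id then counter = 0;   /* restart the count for each id */
    counter + 1;                    /* sum statement: counter is retained */
    if counter <= 10;               /* keep the first 10 rows per id */
run;
```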

Difference between SQL server and SAS ?
What is RDBMS ?
Difference between proc freq and proc tabulate?
Rate yourself in SAS, Proc SQL, Macros and UNIX ?
Tell me about your project ?
Tell me about unix or some unix command ?

Important Notes on Some topics

SELF JOIN
/* FINDING THE MANAGER OF EMPLOYEES */

DATA SELF;
INPUT EMP_ID EMP_NAME $ MANAGER_ID ;

DATALINES;
1 ROHIT 4
2 PIYUSH 4
3 SAOOD 4
4 SHANTANU 9
5 ANSARI 8
6 RAHUL 8
7 MOHIT 4
8 RAJESH 10
9 PARIDHI 10
;
RUN;

PROC SQL;
SELECT 'MANAGER OF' ,A.EMP_NAME,'IS',B.EMP_NAME FROM SELF A, SELF B
WHERE A.MANAGER_ID = B.EMP_ID;
QUIT;

.................................................................................

/* In this data set select those kid ids which has both yellow and red candy. */

DATA TEST2;

INPUT KIDID $ CANDY $;


DATALINES;
K1 YELLOW
K1 RED
K2 RED
K2 BLUE
K3 WHITE
K3 RED
K3 YELLOW
;
RUN;

/* Method 1 */

proc sql;
select kidid from test2 where candy="YELLOW"
intersect
select kidid from test2 where candy="RED";
quit;

/* Method 2 */

proc sql;
select a.kidid from

test2 as a inner join test2 as b on a.kidid=b.kidid


where a.candy="YELLOW" and b.candy="RED";
quit;

FORMAT STYLES
Choosing an Input Style
The INPUT statement reads raw data from instream data lines or external files into a
SAS data set. You can use the following different input styles, depending on the
layout of data values in the records:
list input
column input
formatted input
named input.
You can also combine styles of input in a single INPUT statement. For details about
the styles of input, see the INPUT statement in SAS Language Reference: Dictionary.

1) List Input

List input uses a scanning method for locating data values. Data values are not
required to be aligned in columns but must be separated by at least one blank (or
other defined delimiter). List input requires only that you specify the variable names
and a dollar sign ($), if defining a character variable. You do not have to specify the
location of the data fields.

An example of list input follows:

data scores;
length name $ 12;
input name $ score1 score2;
datalines;
Riley 1132 1187
Henderson 1015 1102
;
List input has several restrictions on the type of data that it can read:
Input values must be separated by at least one blank (the default delimiter) or by
the delimiter specified with the DLM= or DLMSTR= option in the INFILE statement. If
you want SAS to read consecutive delimiters as if there is a missing value between
them, specify the DSD option in the INFILE statement.
Blanks cannot represent missing values. A real value, such as a period, must be
used instead.
To read and store a character input value longer than 8 bytes, define a variable's
length by using a LENGTH, INFORMAT, or ATTRIB statement before the INPUT
statement, or by using modified list input, which consists of an informat and the
colon modifier in the INPUT statement. See Modified List Input for more information.
Character values cannot contain embedded blanks when the file is delimited by
blanks.
Fields must be read in order.
Data must be in standard numeric or character format.
Note: Nonstandard numeric values, such as packed decimal data, must use the
formatted style of input. See Formatted Input for more information.

2) Modified List Input

A more flexible version of list input, called modified list input, includes format
modifiers. The following format modifiers enable you to use list input to read
nonstandard data by using SAS informats:
The & (ampersand) format modifier enables you to read character values that
contain one or more embedded blanks with list input and to specify a character
informat. SAS reads until it encounters two consecutive blanks, the defined length
of the variable, or the end of the input line, whichever comes first.

The : (colon) format modifier enables you to use list input but also to specify an
informat after a variable name, whether character or numeric. SAS reads until it
encounters a blank column, the defined length of the variable (character only), or
the end of the data line, whichever comes first.
The ~ (tilde) format modifier enables you to read and retain single quotation marks,
double quotation marks, and delimiters within character values.
The following is an example of the : and ~ format modifiers. You must use the DSD
option in the INFILE statement. Otherwise, the INPUT statement ignores the ~
format modifier.
data scores;
infile datalines dsd;
input Name : $9. Score1-Score3 Team ~ $25. Div $;
datalines;
Smith,12,22,46,"Green Hornets, Atlanta",AAA
Mitchel,23,19,25,"High Volts, Portland",AAA
Jones,09,17,54,"Vulcans, Las Vegas",AA
;

proc print data=scores noobs;
run;
Output from Example with Format Modifiers

Name      Score1   Score2   Score3   Team                       Div
Smith         12       22       46   "Green Hornets, Atlanta"   AAA
Mitchel       23       19       25   "High Volts, Portland"     AAA
Jones          9       17       54   "Vulcans, Las Vegas"       AA
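
The example above covers the : and ~ modifiers only. A minimal sketch of the & modifier follows. The data set name TEAMS and the data values are made up for illustration. Each team name contains a single embedded blank and is separated from the score by two blanks.

```sas
/* Sketch: & lets list input read character values with embedded blanks. */
/* SAS stops reading TEAM at two consecutive blanks.                     */
data teams;
   input team & $20. score;
   datalines;
Green Hornets  46
High Volts  25
Vulcans  54
;
run;

proc print data=teams noobs;
run;
```

With only one blank between the name and the score, SAS would treat the score as part of the team name, so the two-blank separator is essential here.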

3) Column Input
Column input enables you to read standard data values that are aligned in columns
in the data records. Specify the variable name, followed by a dollar sign ($) if it is a
character variable, and specify the columns in which the data values are located in
each record:
data scores;
infile datalines truncover;
input name $ 1-12 score2 17-20 score1 27-30;
datalines;
Riley           1132       987
Henderson       1015      1102
;
Note: Use the TRUNCOVER option in the INFILE statement to ensure that SAS
handles data values of varying lengths appropriately.
To use column input, data values must be:
in the same field on all the input lines
in standard numeric or character form.
Note: You cannot use an informat with column input.
Features of column input include the following:
Character values can contain embedded blanks.
Character values can be from 1 to 32,767 characters long.
Placeholders, such as a single period (.), are not required for missing data.
Input values can be read in any order, regardless of their position in the record.
Values or parts of values can be reread.
Both leading and trailing blanks within the field are ignored.
Values do not need to be separated by blanks or other delimiters.

4) Formatted Input

Formatted input combines the flexibility of using informats with many of the
features of column input. By using formatted input, you can read nonstandard data
for which SAS requires additional instructions. Formatted input is typically used with
pointer controls that enable you to control the position of the input pointer in the
input buffer when you read data.
The INPUT statement in the following DATA step uses formatted input and pointer
controls. Note that $12. and COMMA5. are informats and +4 and +6 are column
pointer controls.
data scores;
input name $12. +4 score1 comma5. +6 score2 comma5.;
datalines;
Riley           1,132      1,187
Henderson       1,015      1,102
;
Note: You can also use informats to read data that is not aligned in columns. See
Modified List Input for more information.
Important points about formatted input are:
Character values can contain embedded blanks.
Character values can be from 1 to 32,767 characters long.
Placeholders, such as a single period (.), are not required for missing data.
With the use of pointer controls to position the pointer, input values can be read in
any order, regardless of their positions in the record.
Values or parts of values can be reread.
Formatted input enables you to read data stored in nonstandard form, such as
packed decimal or numbers with commas.

5) Named Input
You can use named input to read records in which data values are preceded by the
name of the variable and an equal sign (=). The following INPUT statement reads
the data lines containing equal signs.
data games;
input name=$ score1= score2=;
datalines;
name=riley score1=1132 score2=1187
;

proc print data=games;
run;

SINGLE TRAILING @

SALES.TXT

USA     10JUL10    $1500
INDIA   10/08/10   1,200

Here the date and sale fields use different informats in rows one and two.
To read records like these, first read the country and hold the record with a
single trailing @, then choose the informats for the date and sale fields
based on the country value.

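data temp;
infile "D:\sales.txt";
/* Read the country and hold the record with a single trailing @. */
/* The file shown above has no id field, so none is read here.    */
input country :$5. @;
if country eq 'USA' then
input date :date7. sale :dollar7.;
else
input date :ddmmyy8. sale :comma7.;
run;

proc print data = temp;
title;
run;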

*************************************************************************************
Interview Questions by ROHIT for (Saptrishi Team) on 26-AUG-2014
*************************************************************************************

1) Overview of work done in previous projects.
2) Table structures used in the previous team at Amex.
3) Ways to create macro variables.
4) Difference between %let and %local.
5) How do you connect the SAS environment to a database?
6) What is ODS?
7) Scenario based example:
DATA AIRPORT;
INPUT TNO SNO DEPART: $10. ARRIVE: $10.;
DATALINES;
1 1 DELHI MUMBAI
1 2 MUMBAI BANGALORE
1 3 BANGALORE KOLKATA
1 4 KOLKATA CHENNAI
1 5 CHENNAI DELHI
2 1 BANGALORE MUMBAI
2 2 MUMBAI DELHI
2 3 DELHI ORISSA
;
RUN;
Output (required) data set:

Tno   Round_Trip
1     Y
2     N

Write a macro to produce the output data set. If the customer returns to the
airport where the journey started, the trip counts as a round trip,
irrespective of how many legs there are in between.
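
One possible solution is sketched below. It is not from the original notes. The macro name ROUND_TRIP, its parameters IN= and OUT=, and the work data set _LEGS are assumptions chosen for illustration. It assumes the AIRPORT data set created above and that SNO gives the order of the legs within each trip.

```sas
/* Sketch only: macro and parameter names are assumptions. */
%macro round_trip(in=airport, out=trips);

   /* Make sure the legs of each trip are in travel order. */
   proc sort data=&in out=work._legs;
      by tno sno;
   run;

   data &out (keep=tno round_trip);
      set work._legs;
      by tno;
      length origin $10 round_trip $1;
      retain origin;

      /* Remember where the trip started. */
      if first.tno then origin = depart;

      /* On the last leg, compare the final arrival with the origin. */
      if last.tno then do;
         if arrive = origin then round_trip = 'Y';
         else round_trip = 'N';
         output;
      end;
   run;

%mend round_trip;

%round_trip(in=airport, out=trips)

proc print data=trips noobs;
run;
```

Only the first departure and the last arrival of each trip are compared, so the number of legs in between does not matter.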

**********************************************************************
Interview Questions by ROHIT for (Sri Ram Team) on 26-AUG-2014
*************************************************************************************

1. How much do you rate yourself in Base SAS and Advanced SAS?
2. Proc Transpose scenario
3. How to create graphs in Excel VBA
4. Ways to create macro variables
5. How to compare two datasets. Syntax of proc compare
6. Description of previous projects done.
7. Proc Import
8. Proc Contents

**********************************************************************
Interview Questions by ROHIT for (Basel Team) on 26-AUG-2014
*************************************************************************************

1. SAS INFILE options: TRUNCOVER, MISSOVER, etc.
2. Scenario-based question on SET (combining different datasets)
3. Scope of macro variables
4. Ways in which we can create a macro variable
5. Program based on proc contents, datasets and macros
6. Details about important projects in the company
7. Details about one of the key projects and the difficulties faced
8. How to create a macro variable using proc sql
9. SQL pass-through connection and LIBNAME connection
10. How I used to connect to SQL in my company
11. Programming questions on macros

*************************************************************************************
Q/A given by AZHAR (on 23-SEP-2014)
************************************************************************************

Q1. What are the different INFILE options, such as FLOWOVER, MISSOVER and
TRUNCOVER?
Answer: There are four main INFILE options that control what SAS does when
the INPUT statement reaches the end of a record before it has found values for
all variables:
a.) FLOWOVER - the default option. SAS moves the pointer to the next line and
continues reading values for the remaining variables from there.
b.) MISSOVER - SAS does not move to the next line. Variables that have no
value in the current record are set to missing. With column or formatted
input, a value that is shorter than the width declared in the INPUT statement
is also set to missing.
c.) TRUNCOVER - like MISSOVER, SAS does not move to the next line and sets
variables without values to missing, but it also reads values that are shorter
than the width they were declared with.
d.) STOPOVER - SAS stops the DATA step and reports an error when a record
does not contain values for all variables.
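
The difference is easiest to see with a file whose records have different lengths. The sketch below first writes such a file and then reads it three times. The fileref TMP, the data set names and the values are made up for illustration. A temporary file is used instead of DATALINES because in-stream data lines are padded with blanks, which would hide the effect.

```sas
/* Write a small test file with one short and one full-length record. */
filename tmp temp;

data _null_;
   file tmp;
   put 'Al';
   put 'Henderson';
run;

/* MISSOVER: "Al" is shorter than the 9 columns requested, */
/* so NAME is expected to be missing for the first record. */
data with_missover;
   infile tmp missover;
   input name $9.;
run;

/* TRUNCOVER: the short value "Al" is read as it is. */
data with_truncover;
   infile tmp truncover;
   input name $9.;
run;

/* FLOWOVER (default): SAS goes to the next line to fill NAME, */
/* so the two records are expected to give one observation.    */
data with_flowover;
   infile tmp flowover;
   input name $9.;
run;

proc print data=with_missover;  title 'MISSOVER';  run;
proc print data=with_truncover; title 'TRUNCOVER'; run;
proc print data=with_flowover;  title 'FLOWOVER';  run;
title;
```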

Q2. We create a new dataset FINAL from dataset A (a single variable VAR with
6 observations) and dataset B (also a single variable VAR, with 8
observations) using the syntax data final;set a;set b;run; What will the
output be, and why?
Answer: The FINAL dataset will have 6 observations, and the values will come
from the second dataset. In each iteration, the first SET statement reads an
observation from A into the PDV, and the second SET statement then reads the
corresponding observation from B and overwrites the value, because both
datasets use the same variable name. After the sixth observation has been
written, the seventh iteration tries to read past the end of dataset A, the
end-of-dataset marker is encountered, and processing stops. The last two
observations of B are never read.
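
A small sketch that reproduces this scenario is shown below. The values 1 to 6 and 101 to 108 are made up so that it is easy to see which dataset each value in FINAL comes from.

```sas
/* Dataset A: 6 observations, VAR = 1..6 */
data a;
   do var = 1 to 6;
      output;
   end;
run;

/* Dataset B: 8 observations, VAR = 101..108 */
data b;
   do var = 101 to 108;
      output;
   end;
run;

/* Two SET statements: each iteration reads one row from A, then one from B. */
/* The step stops when the first SET statement runs out of observations.     */
data final;
   set a;
   set b;
run;

proc print data=final;   /* 6 observations, VAR = 101..106 */
run;
```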

Q3. In how many ways can we create macro variables?
Answer: We can create macro variables in many ways, some of which are given
below:
-%let
-%local
-%global
-%do loop (the index variable of an iterative %do)
-proc SQL (the INTO clause)
-call symput
-macro parameter
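
The ways listed above can be sketched in one short program. All macro and variable names are made up for illustration. The sketch uses CALL SYMPUTX, a variant of CALL SYMPUT that also trims blanks, and it assumes the SASHELP.CLASS sample dataset is available and that the SAS release supports the TRIMMED option of the INTO clause.

```sas
/* 1. %LET in open code creates a global macro variable. */
%let city = DELHI;

/* 2. CALL SYMPUTX in a DATA step. */
data _null_;
   call symputx('n_legs', 5);
run;

/* 3. INTO clause in PROC SQL. */
proc sql noprint;
   select count(*) into :n_students trimmed
   from sashelp.class;
quit;

/* 4. Macro parameter, %LOCAL, %GLOBAL and the index of an iterative %DO. */
%macro demo(team=BASEL);      /* TEAM is a macro parameter (local)  */
   %local i;                  /* declare I as local to this macro   */
   %global last_run;          /* declare LAST_RUN as global         */
   %let last_run = &sysdate9;
   %do i = 1 %to 2;           /* I takes the values 1 and 2         */
      %put Team=&team iteration=&i;
   %end;
%mend demo;

%demo(team=BASEL)

%put city=&city n_legs=&n_legs n_students=&n_students last_run=&last_run;
```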

Q4. What is the difference between pass-through SQL and the LIBNAME statement,
and which type of connection is used in your company?
Answer: Both provide access to the database. Pass-through SQL passes the code
directly to the database, so the query is written in the database's own SQL
syntax. With the LIBNAME statement, SAS generates the database query itself,
and we can choose to use either a DATA step or PROC SQL to process the data.
We use the pass-through type of connection in our company.

Q5. Name a few automatic macro variables.
Answer: SYSDATE9, SYSDAY, SYSTIME, SYSJOBID, etc.
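
As a quick check, these automatic macro variables can be written to the log with %PUT. The values depend on the session, so no output is shown here.

```sas
%put Date:   &sysdate9;
%put Day:    &sysday;
%put Time:   &systime;
%put Job ID: &sysjobid;

/* List every automatic macro variable in the session. */
%put _automatic_;
```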
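
Q6. How can we create a macro variable using PROC SQL, and what will the scope
of that macro variable be?
Answer: A macro variable can be created in PROC SQL with the INTO clause, as
in the code below. When the step runs in open code, the macro variable is
global. Inside a macro definition it follows the same scoping rules as %LET,
so it is local to that macro unless a macro variable with the same name
already exists in an outer scope.
proc sql;
select colname into :macronm
from tablename
;
quit;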

Q7. What is the difference between the IF and WHERE conditions?
Answer: The WHERE condition is applied to the data before an observation
enters the PDV. It therefore cannot be applied to variables newly created in
the DATA step, and it is generally considered to be faster. The subsetting IF
condition is applied after the observation has been read into the PDV, so it
can also use newly created variables.
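
The sketch below illustrates the difference. It assumes the SASHELP.CLASS sample dataset is available. The variable RATIO and the cut-off values are made up for illustration.

```sas
/* WHERE filters on variables that already exist in the input dataset. */
data where_demo;
   set sashelp.class;
   where age >= 13;            /* AGE comes from SASHELP.CLASS */
   ratio = weight / height;
run;

/* A subsetting IF can use a variable created in the same DATA step. */
data if_demo;
   set sashelp.class;
   ratio = weight / height;
   if ratio > 1.6;             /* RATIO does not exist in the input dataset */
run;
```

Writing where ratio > 1.6; in the second step would fail, because RATIO is not a variable of the input dataset.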

Q8. What are the advantages of using DSD?
Answer:
-It changes the default delimiter to a comma.
-If it encounters two consecutive delimiters, it assigns a missing value to
that variable.
-It can read values enclosed in quotation marks. The quotation marks are
removed, and delimiters inside them are kept as part of the value.
