
1. CARDS and DATALINES are the same thing. The lines that follow either statement are treated as data.
2. To add comment: /* This is an example to add comments */
3. The code window is saved as: program.sas
4. Whereas a dataset is saved as: .sas7bdat
5. What is a library? – A library is an alias/reference to a physical location.
6. A statement in SAS which runs without a RUN command – libname test '/folders/myshortcuts/Ashish';
7. I can't create a library named WORK, as it is already created by the system and I can't make changes to it.
8. VAR – passes the variables on which the analysis has to be done.
9. The default dataset for a proc is the last dataset created and run by the user.
10. CLASS is a statement used to separate categorical data such as male, female etc. It can accommodate both char and num.
11. VAR will accommodate num only.
12. Percentiles can be computed only on quantitative data.
13. To convert Continuous data into ordinal ----- segmentation is used.
14. The SET statement is like "save this file as" under another name, e.g. Set State;
15. STRATA works like 'BY'.
16. OUTALL: shows which values were selected and which were not. For example, with probability 0.5 half are selected and half are not; the selected ones are marked 1 and the rest are marked 0.
17. EDA- Exploratory Data Analysis
18. Quantitative – as the percentile changes, the data point also changes; otherwise the data is qualitative.
19. There is a high probability that the data is quantitative if a proc step works on it.
20. Segmentation –
21. Is mutually exclusive – if an observation is in one segment, it will not be in another.
22. Some procs create data, e.g. PROC SURVEYSELECT (it creates a dataset as well as output) and PROC RANK; but PROC RANK produces no printed output.
23. PROC RANK is based on percentiles (p25, p50, p75 etc.), i.e. division into buckets.
24. Starting with quantitative data and ending with ordinal data is the process of segmentation.
25. N and NMISS – to detect missing values (proc means data=A2 n nmiss;), where n = number of populated values and nmiss = number of missing values.
26. STDIZE – used to treat missing values.
27. REPONLY – does not standardize the data, i.e. it doesn't change the data except the missing values.
28. Quantile – a mixture of different percentiles. It is symmetrical.
29. Hypothesis is a statement supported by facts.
30. 1 = Event; 0 = Non-Event.
31. Accuracy Formula = ((A1+A2)/(A1+A2+E1+E2))*100 and
32. Error Formula = ((E1+E2)/(A1+A2+E1+E2))*100
33. Accuracy Rate for Events = A1/(A1+E1); also known as Sensitivity
34. Accuracy Rate for Non-Events = A2/(A2+E2); also known as Specificity. (Blue Stack – app to use)
35. Hypothesis testing has 3 steps: S1 – Formulating the statement, S2 – Experimenting and S3 – Analysing.
36. P > alpha = Acceptance, and P less than alpha = Reject.
37. Never use “AA”, where the first A stands for Alternate and the second A stands for Accept.
38. The two outcomes are: reject the null hypothesis, or fail to reject it.
39. The null hypothesis is the hypothesis of equality and the alternate is the hypothesis of inequality.
40. If the data and alpha change, there is a high chance that LDL and HDL will also change.
41. Algorithms run top to bottom.
42. To check the assumption of normal distribution (P against alpha), run PROC UNIVARIATE.
43. Independence is the first assumption; to check it we use nodupkey.
44. Normality is the second assumption.
45. To identify the confidence level, use PROC MEANS, in which alpha is defined.
46. One sample: when there is only one group in the comparison.
47. One sample, two sided: "sided" refers to the area of rejection, here on both sides, beyond LDL and HDL.
48. H0: U=U0
49. H1: U ne U0 and Alpha = 0.05; this combination is known as one sample, two sided.

H0: U >= 19200

H1: U < 19200. This is one sided because it uses greater than and less than; if it were equal and not equal, it would be two sided.

50. To perform a one-sided test, you first need to perform the two-sided test.

51. "The average salary of an individual is less than 19200" is a one-sample, one-sided claim. To prove it, you first have to prove that the average salary is not equal to 19200, which is a one-sample, two-sided test.

1 sample 1 sided lower tail

H0: U>=50

H1: U<50
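
A lower-tail, one-sample test like the one above could be run with PROC TTEST. This is only a minimal sketch: the dataset sashelp.class, the variable height and the null value 65 are assumptions chosen for illustration and do not come from these notes.

```sas
/* One-sample, one-sided (lower tail) t test.
   H0: mean height >= 65     H1: mean height < 65
   sashelp.class, height and 65 are illustrative assumptions. */
proc ttest data=sashelp.class h0=65 sides=L alpha=0.05;
    var height;
run;
```

If the reported p-value is below alpha, H0 is rejected; otherwise we fail to reject it. Replacing sides=L with sides=2 gives the two-sided version of the same test.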

52. The diagonal is the actual value and the points are the values obtained after standardizing. If the points are distributed along the diagonal, the distribution is normal.
53. P is inversely proportional to alpha.
54. Whenever we talk about y=f(x), the MODEL statement is used, not VAR.
55. When RUN and QUIT are written together, it is called RUN/QUIT group processing.
56. When a MODEL statement is written, RUN and QUIT should be written together. In most proc steps, however, they are not written.
57. For Bulbwt: H0: U1=U2=U3=U4 and H1: At least one is different.
58. Mean square model / mean square error = F value
59. When p is less than alpha, the variable is contributing, and vice versa.
60. Adding a variable adds variability, thereby increasing R^2 (on adding variables, the variance goes down and the chances of error go down).
61. Removing a variable removes variability, thereby decreasing R^2.
62. Lsmeans = least squares means = considers missing values, unlike means.
63. Diffogram = shows the differences of the probability.
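
Items 54 to 63 can be tied together with a one-way ANOVA sketch. The Bulbwt data is not available here, so sashelp.cars is used as a stand-in (response mpg_city, groups origin); that choice, and the Tukey adjustment, are assumptions made for illustration.

```sas
ods graphics on;

/* One-way ANOVA: H0 is that all group means are equal,
   H1 is that at least one is different.
   sashelp.cars, mpg_city and origin are stand-in assumptions. */
proc glm data=sashelp.cars plots=diffplot;
    class origin;                          /* categorical grouping variable */
    model mpg_city = origin;               /* y = f(x), so MODEL rather than VAR */
    lsmeans origin / pdiff adjust=tukey;   /* least squares means and pairwise differences */
run;
quit;                                      /* PROC GLM stays active after RUN, so QUIT ends it */
```

The F value in the resulting ANOVA table is the model mean square divided by the error mean square, and the diffogram is drawn from the LSMEANS differences.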
proc univariate data=sashelp.class;
var height;
output out=mona pctlpts=67 pctlpre=P; /* pctlpre (percentile prefix) is needed to run pctlpts */
run;

proc univariate data=sashelp.class;
var weight;
output out=mona pctlpts=0 to 100 by 5 pctlpre=P; /* Use BY to run in a loop; now it will
give the values P0 P5 P10 P15 etc. By default the step is 1, which is why previously the
results were P1 P2 P3 P4 P5 */
run;

With PROC UNIVARIATE, use the OUTPUT statement together with OUT=; otherwise it will create an output file under a name of its own.
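
The percentile cut points computed above are also what segmentation (items 22 to 24) relies on. A hedged sketch with PROC RANK follows; the choice of four buckets and the names ranked and height_bucket are hypothetical.

```sas
/* Segment a quantitative variable into ordinal buckets.
   groups=4 splits on quartiles; buckets are numbered 0 to 3.
   ranked and height_bucket are hypothetical names. */
proc rank data=sashelp.class out=ranked groups=4;
    var height;
    ranks height_bucket;   /* new ordinal variable; height itself is kept */
run;

/* PROC RANK prints nothing, so inspect the result explicitly */
proc freq data=ranked;
    tables height_bucket;
run;
```

Each observation lands in exactly one bucket, which is the mutually exclusive property the notes mention.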

16-Nov-19

Closed Loop Card – Private brand card.

EDA – Exploratory Data Analysis: this is the first phase.

ETL – Extraction, Transformation, Load: handled by the data transfer team, including the hashing aspect so that the original data is not visible.

Check for information loss – it should be avoided. Identify the total number of observations and variables and mail them to the SME (Subject Matter Expert) to check that the data is correct.

SKU – Stock Keeping Unit – Property of the merchant and not the credit card company.

Project key points –

Sanity check

Static File – based on some demographics, compiled by client; a transaction is not a unique record.

Last Seen – Recency

Transaction – Frequency

Spend – Monetary
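
The three measures above (recency, frequency, monetary) can be derived from a transaction table with one grouped query. This is a sketch under assumptions: the table txns, its columns and its three rows are made up for illustration, and the reference date 16NOV2019 is simply the date of these notes.

```sas
/* Hypothetical transaction data: one row per transaction */
data txns;
    input cust_id txn_date :yymmdd10. amount;
    format txn_date yymmdd10.;
    datalines;
1 2019-10-01 120
1 2019-11-10 80
2 2019-09-15 300
;
run;

/* One row per customer: days since last seen, number of
   transactions and total spend */
proc sql;
    create table rfm as
    select cust_id,
           '16NOV2019'd - max(txn_date) as recency_days,
           count(*)                     as frequency,
           sum(amount)                  as monetary
    from txns
    group by cust_id;
quit;
```

Because the transaction file has several rows per customer, grouping by cust_id is what turns it into one unique record per customer.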

While running PROC MEANS, 'n' denotes the number of populated observations.

While running the percentile procedure, if the percentiles and the values both change, the data is quantitative; otherwise it is qualitative.
Values greater than Q3 + 1.5*IQR or less than Q1 - 1.5*IQR are outliers in the data. IQR = interquartile range.
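
The IQR rule above can be sketched in two steps: compute the quartiles, then flag values outside the fences. The dataset sashelp.class, the variable weight and the names fences, flagged, lower, upper and outlier are assumptions used only for illustration.

```sas
/* Step 1: get Q1, Q3 and the interquartile range */
proc univariate data=sashelp.class noprint;
    var weight;
    output out=fences q1=q1 q3=q3 qrange=iqr;
run;

/* Step 2: attach the one-row fences table to every observation
   and flag values outside Q1 - 1.5*IQR and Q3 + 1.5*IQR */
data flagged;
    if _n_ = 1 then set fences;   /* q1, q3 and iqr are retained for all rows */
    set sashelp.class;
    lower = q1 - 1.5 * iqr;
    upper = q3 + 1.5 * iqr;
    /* missing values compare as smaller than any number in SAS,
       so they are excluded explicitly */
    outlier = (not missing(weight)) and (weight < lower or weight > upper);
run;
```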

Outliers are treated with a threshold value. For example, if the maximum salary in a group is 20 lakh and someone from outside comes in with 20 crore, we bring that value down to 20 lakh.
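
A minimal sketch of this kind of capping, assuming a made-up salaries table and a cap of 20 lakh (2000000) taken from the example above; the dataset and variable names are hypothetical.

```sas
/* Hypothetical data: one extreme salary and one missing salary */
data salaries;
    input id salary;
    datalines;
1 1500000
2 2000000
3 200000000
4 .
;
run;

data capped;
    set salaries;
    cap = 2000000;   /* threshold: 20 lakh */
    /* MIN ignores missing arguments, so without this guard a
       missing salary would be replaced by the cap */
    if not missing(salary) then salary = min(salary, cap);
run;
```

The guard on missing values keeps the capping step from quietly filling in blanks.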

We treat the outlier, first, however we make sure that missing values are not touched or replaced
while treating the outlier.
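
Once outliers are handled, the missing values can be counted and then filled, as in items 25 to 27. The sketch below assumes a made-up salaries table and uses the median as the replacement value; both are illustrative choices, not something these notes specify.

```sas
/* Hypothetical data with one missing salary */
data salaries;
    input id salary;
    datalines;
1 1500000
2 1800000
3 .
4 2000000
;
run;

/* n = populated values, nmiss = missing values */
proc means data=salaries n nmiss;
    var salary;
run;

/* REPONLY replaces only the missing values (here with the median)
   and leaves the populated values unstandardized */
proc stdize data=salaries out=salaries_filled reponly method=median;
    var salary;
run;
```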

IV – Information Value; it tells the association between the variables, i.e. how much potential a variable has to separate your goods from bads.
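
For reference, the usual definition of information value over the buckets of a variable is sketched below. The symbols are notation introduced here, not taken from the notes: $g_i$ and $b_i$ are the shares of all goods and of all bads that fall in bucket $i$, and $k$ is the number of buckets.

```latex
\mathrm{WoE}_i = \ln\frac{g_i}{b_i}, \qquad
\mathrm{IV} = \sum_{i=1}^{k} \left(g_i - b_i\right)\,\mathrm{WoE}_i
```

Every term in the sum is non-negative, so a larger IV means the variable separates goods from bads more strongly.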

Correlation = strength of linear association.

In variable reduction, if p or alpha has not come into the picture, do not use the word "contributing".

PROC CORR tells you about association, but it cannot tell you variable importance.

Informat for a datetime like 2019-12-16 23:59:52 – ANYDTDTM.

yymmn6. – informat when the date is like 201312

yymon. – format
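
A short sketch of the informats and format above in use. The variable names dt and d and the widths 19, 20 and 7 are choices made for this illustration.

```sas
data _null_;
    /* read a datetime string with the ANYDTDTM informat */
    dt = input('2019-12-16 23:59:52', anydtdtm19.);
    /* read a year-month string; the result is a SAS date */
    d  = input('201312', yymmn6.);
    /* write both to the log, showing d with the YYMON format */
    put dt= datetime20. d= yymon7.;
run;
```

An informat controls how text is read into a SAS value, while a format only controls how a stored value is displayed.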

To do: read up on multicollinearity.

Data cleansing --- data prep --- variable reduction --- divide data into training and validation --- detect multicollinearity via logistic and check VIF --- drop any variable whose VIF is greater than 3 --- variable reduction based on threshold IV --- area under the curve should be same --- sin c value should be changed.
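
Two of the steps above, the training/validation split and the VIF check, might look like the sketch below. Several things are assumptions: sashelp.cars and its variables stand in for the project data, the 70/30 split and the seed are arbitrary, and VIF is requested through PROC REG (whose MODEL statement has a VIF option) rather than through the logistic procedure named in the notes.

```sas
/* Split: OUTALL keeps every row and adds Selected = 1 or 0 */
proc surveyselect data=sashelp.cars out=split method=srs
                  samprate=0.7 seed=12345 outall;
run;

data train valid;
    set split;
    if selected = 1 then output train;   /* sampled rows */
    else output valid;                   /* remaining rows */
run;

/* VIF check on the training data; per the notes, a variable
   with VIF above 3 would be dropped */
proc reg data=train;
    model mpg_city = horsepower weight enginesize / vif;
run;
quit;
```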

Local maxima = the value at which the difference is at its maximum. Check the bucket; it will lie in 3.

Revenue of the product = 100 rs

Product costing = 10 rs

Campaign costing = 1 rs
