

Program Name: DATA SCIENCE (CSDS21)

Resource Person :


DESIGNATION : Assistant Professor

Department : Computer Science

College : Arignar Anna Govt. Arts College, Cheyyar

Phone Number : 9489917092

Email Id :

Qualification : M.Sc., M.Phil.

Experience : 12 years

Address : 27, dhrowbathiamman nagar, koda nagar

Cheyyar - 604 407.



Data science syllabus

Module 1: Python

Python is an essential topic that every data scientist should know. In this section, our
instructors will take you through the basics of Python and the areas where it can be used.
You will learn how to use some of the current tools such as NumPy, Pandas, and
Matplotlib. Module 1 therefore includes -
. Environment set-up
. Jupyter overview
. Python NumPy
. Python Pandas
. Python Matplotlib
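
As a small illustration of the environment set-up step, the sketch below checks whether the three libraries named in this module can be imported. It is only one possible check, not the course's own procedure; the helper name `check_environment` is hypothetical, and the `pip install` hint assumes the packages are installed from PyPI.

```python
import importlib.util

# Import names of the libraries listed in Module 1.
REQUIRED = ["numpy", "pandas", "matplotlib"]


def check_environment(packages=REQUIRED):
    """Return a dict mapping each package name to True if it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


# Print a short report for the current Python environment.
for name, available in check_environment().items():
    status = "installed" if available else f"missing (try: pip install {name})"
    print(f"{name}: {status}")
```

Running the same check inside a Jupyter notebook cell confirms that the notebook kernel sees the same packages as the command line.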

Module 2: R

Used for statistical computing and data analysis, the R programming language is one of the
advanced statistical languages used in data science. This module teaches you how to explore
data sets using R. Here you will learn -
. An introduction to R
. Data structures in R
. Data visualization with R
. Data analysis with R

Module 3: Statistics

Knowledge of statistics is a necessary and important skill when working with data.
In this module, you will learn -

. Important statistical concepts used in data science
. Difference between population and sample
. Types of variables
. Measures of central tendency
. Measures of variability
. Coefficient of variation
. Skewness and Kurtosis
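
The measures listed above can be sketched with Python's standard `statistics` module. This is a minimal illustration, assuming a small made-up sample called `marks`; the function name `describe` is hypothetical, and skewness and kurtosis use the simple moment-based (population) formulas, which is one of several conventions.

```python
import statistics


def describe(values):
    """Summarise a sample with the measures listed in Module 3."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)        # sample standard deviation (divides by n - 1)
    pop_sd = statistics.pstdev(values)   # population standard deviation (divides by n)
    # Third and fourth central moments, used for skewness and kurtosis.
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return {
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
        "variance": statistics.variance(values),
        "std_dev": sd,
        "coeff_of_variation": sd / mean,        # assumes a non-zero mean
        "skewness": m3 / pop_sd ** 3,
        "excess_kurtosis": m4 / pop_sd ** 4 - 3,
    }


marks = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample
for name, value in describe(marks).items():
    print(f"{name}: {value}")
```

A positive skewness here means the sample has a longer tail towards the larger values.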

Module 4: Inferential statistics

Inferential statistics is used to make generalizations about populations from the samples
drawn from them. This branch of statistics helps you learn to analyze representative
samples of large data sets. In this module, you will learn -

. Normal distribution
. Test hypotheses
. Central limit theorem
. Confidence interval
. T-test
. Type I and II errors
. Student's T distribution
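
To make the confidence interval and T-test items concrete, here is a minimal sketch using only the standard library. It assumes a made-up sample called `heights`, and it uses the normal approximation for the interval; for small samples a Student's t critical value would normally replace the z value, which the standard library does not provide. The function names are hypothetical.

```python
import math
import statistics
from statistics import NormalDist


def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(sample)
    mean = statistics.mean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)
    # Two-sided critical value, about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * std_err, mean + z * std_err


def t_statistic(sample, hypothesised_mean):
    """One-sample t statistic: (sample mean - hypothesised mean) / standard error."""
    n = len(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - hypothesised_mean) / std_err


heights = [160, 165, 170, 158, 172, 168, 163, 166]  # made-up sample, in cm
low, high = mean_confidence_interval(heights)
print(f"95% CI for the mean: {low:.2f} to {high:.2f}")
print(f"t statistic against a mean of 170: {t_statistic(heights, 170):.3f}")
```

The t statistic would then be compared with a critical value from Student's T distribution with n - 1 degrees of freedom; rejecting a true null hypothesis is a Type I error, and failing to reject a false one is a Type II error.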

Module 5: Regression and Anova

This lesson will help you understand how to establish a relationship between two or more
variables. ANOVA, or analysis of variance, is used to analyze the differences among sample sets.
Here you will learn -

. Regression
. R square
. Correlation and causation
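
The relationship between regression, R square and correlation can be sketched with a simple least-squares line. The example below is illustrative only: the data (`hours`, `scores`) are made up and the function names are hypothetical. For a line fitted to one predictor, R square equals the square of the Pearson correlation coefficient.

```python
import math


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def pearson_r(xs, ys):
    """Pearson correlation coefficient between xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


hours = [1, 2, 3, 4, 5]    # made-up predictor
scores = [2, 4, 5, 4, 5]   # made-up response
slope, intercept = fit_line(hours, scores)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"R square={r_squared(hours, scores, slope, intercept):.2f}")
print(f"correlation={pearson_r(hours, scores):.3f}")
```

A high correlation in such a fit describes association only; it does not by itself show that one variable causes the other.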

Module 6: Exploratory data analysis

In this lesson you will learn -

. Data visualization
. Missing value analysis
. The correlation matrix
. Outlier detection analysis
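
Two of the steps above, missing value analysis and outlier detection, can be sketched without any third-party library. This is a minimal illustration, assuming missing entries are stored as `None` and using the common 1.5 x IQR rule for outliers, which is one convention among several. The data and function names are hypothetical.

```python
import statistics


def missing_count(values):
    """Count entries recorded as None (treated here as missing)."""
    return sum(1 for v in values if v is None)


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    clean = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(clean, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in clean if v < low or v > high]


ages = [23, 25, None, 24, 26, 22, 95, None, 27]  # made-up column
print("missing values:", missing_count(ages))
print("outliers:", iqr_outliers(ages))
```

In a Pandas workflow the same checks are usually done with `DataFrame.isna()` and `DataFrame.quantile()`.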

Module 7: Supervised machine learning

This is a comprehensive module to help you understand how machines or computers
learn from labelled examples to make predictions. You will learn -

. Python Scikit tool

. Neural networks
. Support vector machine
. Logistic and linear regression
. Decision tree classifier
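
As a rough picture of what a supervised classifier does, the sketch below fits a one-feature logistic regression by batch gradient descent. It deliberately avoids the Scikit tool named in this module and uses only the standard library so that the mechanics are visible; in practice a library class such as scikit-learn's `LogisticRegression` would be used instead. The data, learning rate, epoch count and function names are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit p(y=1 | x) = sigmoid(w * (x - mean_x) + b) by batch gradient descent."""
    n = len(xs)
    mean_x = sum(xs) / n
    centred = [x - mean_x for x in xs]  # centring keeps the updates well behaved
    w, b = 0.0, 0.0
    for _ in range(epochs):
        # Prediction error for every labelled example.
        errors = [sigmoid(w * x + b) - y for x, y in zip(centred, ys)]
        # Gradient step on the average log-loss.
        w -= lr * sum(e * x for e, x in zip(errors, centred)) / n
        b -= lr * sum(errors) / n
    return w, b, mean_x


def predict(model, x):
    """Return class 1 if the predicted probability is at least 0.5, else 0."""
    w, b, mean_x = model
    return 1 if sigmoid(w * (x - mean_x) + b) >= 0.5 else 0


hours = [1, 2, 3, 4, 5, 6]   # made-up feature: hours studied
passed = [0, 0, 0, 1, 1, 1]  # made-up labels: 1 = passed
model = fit_logistic(hours, passed)
print([predict(model, h) for h in hours])
```

The labels supplied during fitting are what make this "supervised": the model adjusts its weights until its predictions agree with the known answers.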

Module 8: Tableau

Tableau is a sophisticated business intelligence tool used for data visualization. In this lesson,
you will learn -
. Working with Tableau
. Deep diving with data and connection
. Creating charts
. Mapping data in Tableau
. Dashboards and stories

Module 9: Machine learning on cloud

In this lesson, you will learn -

. ML on cloud platform
. ML on AWS
. ML on Microsoft Azure

Each of these lessons is taught by instructors who have years of experience and knowledge of
data science and analytics. We guarantee you one-to-one mentoring, and also support you with
assessments and interviews towards the end of the session.

Data science skills that you will master from this course

Besides data science skills, this course enables freshers and professionals to develop analytical
and leadership skills. Additional skills that you will gain from this online course are -
. Learn new programming languages
. Learn to use frameworks based on tools like Hadoop and Apache
. Learn about NLP and neural networks
. Gain hands-on experience on AI and machine learning tools
. Know all about Python, and various forms of tools used in programming languages such
as Python NumPy and Pandas
. Learn how to use statistical models
. Know about exploratory data analysis and learn to measure and analyze data sets using
visual patterns
. Obtain one-to-one experience from instructors on supervised machine learning
. Develop leadership skills and understand how to make business decisions
. Understand data analytics and metrics important for any business
. Critical thinking and decision-making skills


Department of Computer Science
Academic Year: 2021-2022
Course Schedule - CSDS21
30 Hours Training Schedule

Speaker: Mrs. S. SUMATHI, ASST. PROFESSOR, AAA COLLEGE, CHEYYAR

Day & Date    Session & Time    Session Topics

Module 1: Python -
Environment set-up
Jupyter overview
Python Numpy
Module 2:
07.09.2021 3.45 pm to 4.45 pm R - An introduction to R
Data structures in R
Module 3:
08.09.2021 3.45 pm to 4.45 pm Statistics - Important statistical concepts used in data science
Difference between population and sample
Types of variables
3.45 pm to 4.45 pm
Measures of central tendency

10.09.2021 3.45 pm to 4.45 pm Measures of variability

Coefficient of variation
Module 4:
Inferential statistics
11.09.2021 Normal distribution
Test hypotheses
Central limit theorem

13.09.2021 3.45 pm to 4.45 pm Type I and II errors
Student's T distribution
Module 5:
Regression and Anova
14.09.2021 3.45 pm to 4.45 pm Regression

Module 6:
Exploratory data analysis
Data visualization
Missing value analysis

16.09.2021 3.45 Module 7: Supervised machine learning

17.09.2021 3.45 pm to 4.45 pm Python Scikit tool

Neural networks
Support Vector machine
18.09.2021 10.00 am to 3.45 pm
Logistic and linear regression
Decision tree classifier
Decision tree classifier

Module 8:
20.09.2021 3.45 pm to 4.45 pm Tableau
Working with Tableau
21.09.2021 3.45 pm to 4.45 pm Deep diving with data and connection

22.09.2021 3.45 pm to 4.45 pm Creating charts

23.09.2021 3.45 pm to 4.45 pm Mapping data in Tableau

24.09.2021 3.45 pm to 4.45 pm Dashboards and stories

Module 9: Machine learning on cloud
25.09.2021 10.00 am to 3.45 pm ML on cloud platform


ACADEMIC YEAR: 2021-2022

1. Which of the following is the most important language for Data Science?
A Java B Ruby C R D None of the mentioned
2. Which of the following approach should be used to ask Data Analysis question?
A Find only one solution for particular problem
B Find out the question which is to be answered
C Find out answer from dataset without asking question
D None of the mentioned
3. Which of the following is one of the key data science skills?
A Statistics B Machine Learning C Data Visualization D All of the mentioned
4. Which of the following is characteristic of Processed Data?
A Data is not ready for analysis B All steps should be noted
C Hard to use for data analysis D None of the mentioned
5. The plot method on Series and DataFrame is just a simple wrapper around
A gplt.plot() B plt.plot() C plt.plotgraph() D none of the mentioned
6. Which of the following value is provided by kind keyword for barplot?
A bar B kde C hexbin D none of the mentioned
7. Which of the following is the probability calculus of beliefs, given that beliefs follow certain
A Bayesian probability B Frequency probability
C Frequency inference D Bayesian inference
8. Which of the following random variable that take on only a countable number of
A Discrete B Non Discrete C Continuous D All of the mentioned
9. Which of the following is also referred to as random variable?
A stochast B aleatory C eliette D all of the mentioned
10. Which of the following function is associated with a continuous random variable?
A pdf B pmv C pmf D all of the mentioned
11. Which of the following value is the most common measure of "statistical significance"?
A P B A C L D All of the mentioned
12. What is the purpose of multiple testing in statistical inference?
A Minimize errors B Minimize false positives
C Minimize false negatives D All of the mentioned
13. Which of the following tool is used for constructing confidence intervals and calculating
standard errors for difficult statistics?
A baggyer B bootstrap C jackknife D none of the mentioned


14. Which of the following characteristic of big data is relatively more concerned to data
A Velocity B Variety C Volume D None of the mentioned
15. Which of the following focuses on the discovery of (previously) unknown properties on the
A Data mining B Big Data C Data wrangling D Machine Learning
16. Which of the following uses relatively small amount of data to estimate about bigger
A Inferential B Exploratory C Causal D None of the mentioned
17. Which of the following analysis is usually modeled by deterministic set of equations?
A Predictive B Causal C Mechanistic D All of the mentioned
18. Which of the following is the top most important thing in data science?
A answer B question C data D none of the mentioned
19. Which of the following approach should be used if you can't fix the variable?
A randomize it B non stratify it C generalize it D none of the mentioned
20. Which of the following is a good way of performing experiments in data science?
A Measure variability B Generalize to the problem C Have Replication D All of the mentioned
21. Which of the following is commonly referred to as 'data fishing'?
A Data bagging B Data booting C Data merging D None of the mentioned
22. Which of the following data mining technique is used to uncover patterns in data?
A Data bagging B Data booting C Data merging D Data Dredging
23. Which of the following operations are supported on Time Frames?
A idxmax B ixmax C ixmin D none of the mentioned
24. Numeric reduction operation for timedelta64[ns] will return ______ objects.
A Timeseries B Timeplus C Timedelta D None of the mentioned
25. Which of the following is used to generate an index with time delta?
A TimeIndex B TimedeltaIndex C LeadIndex D None of the mentioned

ACADEMIC YEAR: 2021-2022
REG. NO: [illegible]
DATE: [illegible]
1. Which of the following is the most important language for Data Science?
A Java B Ruby (C) R D None of the mentioned
2. Which of the following approach should be used to ask Data Analysis question?
A Find only one solution for particular problem
(B) Find out the question which is to be answered
C Find out answer from dataset without asking question
D None of the mentioned
3. Which of the following is one of the key data science skills?
A Statistics B Machine Learning C Data Visualization (D) All of the mentioned
4. Which of the following is characteristic of Processed Data?
A Data is not ready for analysis (B) All steps should be noted
C Hard to use for data analysis D None of the mentioned
5. The plot method on Series and DataFrame is just a simple wrapper around
A gplt.plot() (B) plt.plot() C plt.plotgraph() D none of the mentioned
6. Which of the following value is provided by kind keyword for barplot?
(A) bar B kde C hexbin D none of the mentioned
7. Which of the following random variable that take on only a countable number of
(A) Discrete B Non Discrete C Continuous D All of the mentioned
8. Which of the following is also referred to as random variable?
A stochast (B) aleatory C eliette D all of the mentioned
9. Which of the following function is associated with a continuous random variable?
(A) pdf B pmv C pmf D all of the mentioned
10. Which of the following value is the most common measure of "statistical significance"?
(A) P B A C L D All of the mentioned
11. What is the purpose of multiple testing in statistical inference?
A Minimize errors B Minimize false positives
C Minimize false negatives (D) All of the mentioned
12. Which of the following focuses on the discovery of (previously) unknown properties on the
(A) Data mining B Big Data C Data wrangling D Machine Learning


13. Which of the following uses relatively small amount of data to estimate about bigger
(A) Inferential B Exploratory C Causal D None of the mentioned
14. Which of the following analysis is usually modeled by deterministic set of equations?
A Predictive B Causal C Mechanistic D All of the mentioned
15. Which of the following is the top most important thing in data science?
A answer (B) question C data D none of the mentioned
16. Which of the following approach should be used if you can't fix the variable?
(A) randomize it B non stratify it C generalize it D none of the mentioned
17. Which of the following is a good way of performing experiments in data science?
A Measure variability B Generalize to the problem C Have Replication (D) All of the mentioned
18. Which of the following data mining technique is used to uncover patterns in data?
A Data bagging B Data booting C Data merging (D) Data Dredging
19. Which of the following operations are supported on Time Frames?
(A) idxmax B ixmax C ixmin D none of the mentioned
20. Numeric reduction operation for timedelta64[ns] will return ______ objects.
A Timeseries B Timeplus (C) Timedelta D None of the mentioned

Indo-American College,
Cheyyar - 604 407

ACADEMIC YEAR: 2021-2022
REG. NO: [illegible] NAME: [illegible]
YEAR/SEM: [illegible] DATE: [illegible]

1. Which of the following is the most important language for Data Science?
(A) Java B Ruby C R D None of the mentioned
2. Which of the following approach should be used to ask Data Analysis question?
A Find only one solution for particular problem
(B) Find out the question which is to be answered
C Find out answer from dataset without asking question
D None of the mentioned
3. Which of the following is one of the key data science skills?
A Statistics B Machine Learning C Data Visualization (D) All of the mentioned
4. Which of the following is characteristic of Processed Data?
A Data is not ready for analysis (B) All steps should be noted
C Hard to use for data analysis D None of the mentioned
5. The plot method on Series and DataFrame is just a simple wrapper around
A gplt.plot() (B) plt.plot() C plt.plotgraph() D none of the mentioned
6. Which of the following value is provided by kind keyword for barplot?
(A) bar B kde C hexbin D none of the mentioned
7. Which of the following random variable that take on only a countable number of
(A) Discrete B Non Discrete C Continuous D All of the mentioned
8. Which of the following is also referred to as random variable?
A stochast (B) aleatory C eliette D all of the mentioned
9. Which of the following function is associated with a continuous random variable?
(A) pdf B pmv C pmf D all of the mentioned
10. Which of the following value is the most common measure of "statistical significance"?
A P B A C L D All of the mentioned
11. What is the purpose of multiple testing in statistical inference?
A Minimize errors B Minimize false positives
C Minimize false negatives (D) All of the mentioned
12. Which of the following focuses on the discovery of (previously) unknown properties on the
(A) Data mining B Big Data C Data wrangling D Machine Learning

13. Which of the following uses relatively small amount of data to estimate about bigger
(A) Inferential B Exploratory C Causal D None of the mentioned
14. Which of the following analysis is usually modeled by deterministic set of equations?
A Predictive B Causal C Mechanistic D All of the mentioned
15. Which of the following is the top most important thing in data science?
A answer (B) question C data D none of the mentioned
16. Which of the following approach should be used if you can't fix the variable?
(A) randomize it B non stratify it C generalize it D none of the mentioned
17. Which of the following is a good way of performing experiments in data science?
A Measure variability B Generalize to the problem C Have Replication (D) All of the mentioned
18. Which of the following data mining technique is used to uncover patterns in data?
A Data bagging B Data booting C Data merging (D) Data Dredging
19. Which of the following operations are supported on Time Frames?
(A) idxmax B ixmax C ixmin D none of the mentioned
20. Numeric reduction operation for timedelta64[ns] will return ______ objects.
A Timeseries B Timeplus (C) Timedelta D None of the mentioned


ACADEMIC YEAR: 2021-2022
REG. NO: [illegible] NAME: [illegible]
YEAR/SEM: [illegible] DATE: [illegible]
1. Which of the following is the most important language for Data Science?
A Java B Ruby (C) R D None of the mentioned
2. Which of the following approach should be used to ask Data Analysis question?
A Find only one solution for particular problem
(B) Find out the question which is to be answered
C Find out answer from dataset without asking question
D None of the mentioned
3. Which of the following is one of the key data science skills?
A Statistics B Machine Learning C Data Visualization (D) All of the mentioned
4. Which of the following is characteristic of Processed Data?
A Data is not ready for analysis (B) All steps should be noted
C Hard to use for data analysis D None of the mentioned
5. The plot method on Series and DataFrame is just a simple wrapper around
A gplt.plot() (B) plt.plot() C plt.plotgraph() D none of the mentioned
6. Which of the following value is provided by kind keyword for barplot?
(A) bar B kde C hexbin D none of the mentioned
7. Which of the following random variable that take on only a countable number of
(A) Discrete B Non Discrete C Continuous D All of the mentioned
8. Which of the following is also referred to as random variable?
A stochast (B) aleatory C eliette D all of the mentioned
9. Which of the following function is associated with a continuous random variable?
A pdf (B) pmv C pmf D all of the mentioned
10. Which of the following value is the most common measure of "statistical significance"?
(A) P B A C L D All of the mentioned
11. What is the purpose of multiple testing in statistical inference?
A Minimize errors B Minimize false positives
C Minimize false negatives (D) All of the mentioned
12. Which of the following focuses on the discovery of (previously) unknown properties on the
(A) Data mining B Big Data C Data wrangling D Machine Learning


13. Which of the following uses relatively small amount of data to estimate about bigger
A Inferential (B) Exploratory C Causal D None of the mentioned
14. Which of the following analysis is usually modeled by deterministic set of equations?
A Predictive B Causal (C) Mechanistic D All of the mentioned
15. Which of the following is the top most important thing in data science?
A answer (B) question C data D none of the mentioned
16. Which of the following approach should be used if you can't fix the variable?
(A) randomize it B non stratify it C generalize it D none of the mentioned
17. Which of the following is a good way of performing experiments in data science?
A Measure variability B Generalize to the problem C Have Replication (D) All of the mentioned
18. Which of the following data mining technique is used to uncover patterns in data?
A Data bagging B Data booting C Data merging (D) Data Dredging
19. Which of the following operations are supported on Time Frames?
(A) idxmax B ixmax C ixmin D none of the mentioned
20. Numeric reduction operation for timedelta64[ns] will return ______ objects.
A Timeseries B Timeplus (C) Timedelta D None of the mentioned


DEPARTMENT: Computer Science
DATE OF COMPLETION: [illegible]
STUDENT NAME & REG NO: [illegible]
1. How interesting was the course?
a. Strongly Agree (b) Agree c. Moderate d. Disagree e. Strongly Disagree

2. How useful was this course for you?
a. Strongly Agree (b) Agree c. Moderate d. Disagree e. Strongly Disagree

3. How did the technical support help in this course?
(a) Strongly Agree b. Agree c. Moderate d. Disagree e. Strongly Disagree

4. How did the instructor teach the course?
(a) Strongly Agree b. Agree c. Moderate d. Disagree e. Strongly Disagree

5. How did the content of the course help in your knowledge upgradation?
a. Strongly Agree (b) Agree c. Moderate d. Disagree e. Strongly Disagree

6. How did the instructor provide information about the course?
(a) Strongly Agree b. Agree c. Moderate d. Disagree e. Strongly Disagree

7. How did this course help in your career?
a. Strongly Agree (b) Agree c. Moderate d. Disagree e. Strongly Disagree

8. How do you rate this overall programme?
a. Strongly Agree b. Agree c. Moderate d. Disagree e. Strongly Disagree

STUDENT NAME & REG NO: [illegible]
1. How interesting was the course?
a. Strongly Agree (b) Agree c. Moderate d. Disagree e. Strongly Disagree

2. How useful was this course for you?
a. Strongly Agree (b) Agree c. Moderate d. Disagree e. Strongly Disagree

3. How did the technical support help in this course?
a. Strongly Agree (b) Agree c. Moderate d. Disagree e. Strongly Disagree

4. How did the instructor teach the course?
(a) Strongly Agree b. Agree c. Moderate d. Disagree e. Strongly Disagree

5. How did the content of the course help in your knowledge upgradation?
(a) Strongly Agree b. Agree c. Moderate d. Disagree e. Strongly Disagree

6. How did the instructor provide information about the course?
(a) Strongly Agree b. Agree c. Moderate d. Disagree e. Strongly Disagree

7. How did this course help in your career?
(a) Strongly Agree b. Agree c. Moderate d. Disagree e. Strongly Disagree

8. How do you rate this overall programme?
a. Strongly Agree b. Agree c. Moderate d. Disagree e. Strongly Disagree



COURSE NAME WITH CODE: [illegible]
DEPARTMENT: Computer Science
STUDENT NAME & REG NO: [illegible]
1. How interesting was the course?
(a) Strongly Agree b. Agree c. Moderate d. Disagree e. Strongly Disagree

2. How useful was this course for you?
(a) Strongly Agree b. Agree c. Moderate d. Disagree e. Strongly Disagree

3. How did the technical support help in this course?
a. Strongly Agree (b) Agree c. Moderate d. Disagree e. Strongly Disagree

4. How did the instructor teach the course?
a. Strongly Agree (b) Agree c. Moderate d. Disagree e. Strongly Disagree

5. How did the content of the course help in your knowledge upgradation?
a. Strongly Agree (b) Agree c. Moderate d. Disagree e. Strongly Disagree

6. How did the instructor provide information about the course?
(a) Strongly Agree b. Agree c. Moderate d. Disagree e. Strongly Disagree

7. How did this course help in your career?
a. Strongly Agree (b) Agree c. Moderate d. Disagree e. Strongly Disagree

8. How do you rate this overall programme?
a. Strongly Agree (b) Agree c. Moderate d. Disagree e. Strongly Disagree

