Data and Knowledge Management Competency


Mojdeh Amini
HCIN-548-02-SP21 - HCI Seminar- e-Portfolio
Healthcare Informatics- University of San Diego
Professor Dorothy O'Hagan
May 13, 2021
Introduction
The data and knowledge management competencies were essential and relevant to our healthcare informatics program, particularly in the analytic track, because they offered many opportunities to gain the knowledge, skills, and statistical tools and techniques needed to evaluate data, answer questions, and solve problems with more accurate outcomes. Examples include medical and nonmedical terminologies (see Appendix A); structured query language (SQL) for collecting data and performing statistical analysis; and Python, a programming language known for its code readability and used for data analytics, machine learning, web development, and data visualization. Based on this work, I have selected the following competencies:
• Demonstrate proper techniques for gathering, formatting, and storing data to investigate a
given question or problem.
• Demonstrate skills in using data management software such as SQL and Microsoft Office
to analyze a given problem.
• Apply selected statistical methodologies to evaluate a problem.

Artifact 1
Demonstrate proper techniques for gathering, formatting, and storing data to investigate a
given question or problem.
Before selecting and collecting any data, my primary consideration is to ensure that any information collected is consistent with freedom of information and privacy protection legislation and complies with the Health Insurance Portability and Accountability Act of 1996 (HIPAA). Also, to protect the credibility and reliability of the data, information should be gathered using accepted data collection techniques. The following six steps are commonly used:
Step 1: Collecting data, the most critical step of the knowledge management process.
Step 2: Organizing the collected data.
Step 3: Summarizing.
Step 4: Analyzing and interpreting.
Step 5: Synthesizing.
Step 6: Decision-making, that is, acting on the data.
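To make the gathering, formatting, and storing steps more concrete, here is a minimal Python sketch that uses only the standard library. The CSV text, column names, and the 130 mmHg systolic cut-off are hypothetical illustrations, not data or criteria from the class activities, and the records are assumed to be already de-identified in keeping with the HIPAA consideration above.

```python
import csv
import io
import sqlite3
import statistics

# Step 1 (collect): hypothetical, already de-identified readings standing in
# for an export from a source system. Note the stray spaces to be cleaned.
RAW_CSV = """patient_id, systolic ,diastolic
P001, 118 ,76
P002,142, 91
P003, 135,88
"""


def collect(text):
    """Read raw CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def organize(rows):
    """Step 2 (organize/format): strip whitespace and convert to proper types."""
    records = []
    for row in rows:
        clean = {key.strip(): value.strip() for key, value in row.items()}
        records.append(
            (clean["patient_id"], int(clean["systolic"]), int(clean["diastolic"]))
        )
    return records


def store(records):
    """Store the formatted records in an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE bp (patient_id TEXT PRIMARY KEY,"
        " systolic INTEGER, diastolic INTEGER)"
    )
    conn.executemany("INSERT INTO bp VALUES (?, ?, ?)", records)
    return conn


def summarize(conn):
    """Step 3 (summarize): mean systolic pressure across all rows."""
    return statistics.mean(row[0] for row in conn.execute("SELECT systolic FROM bp"))


def flag_for_follow_up(conn, threshold=130):
    """Steps 4-6 in miniature: compare against an illustrative cut-off to support a decision."""
    query = "SELECT patient_id FROM bp WHERE systolic >= ? ORDER BY patient_id"
    return [row[0] for row in conn.execute(query, (threshold,))]


conn = store(organize(collect(RAW_CSV)))
print(round(summarize(conn), 1))  # 131.7
print(flag_for_follow_up(conn))   # ['P002', 'P003']
```

Keeping each step in its own function mirrors the ordered process above: the raw text is never analyzed directly, only after it has been cleaned, typed, and stored.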

Artifact 2
Demonstrate skills in using data management software such as SQL and Microsoft Office
to analyze a given problem.
SQL is a standard language for storing, manipulating, and retrieving data in relational database management systems (RDBMSs), which organize data in one or more objects called tables. Common RDBMSs that use SQL include Sybase, Microsoft SQL Server, Microsoft Access, and Ingres. SQL statements perform tasks such as updating data in a database or retrieving data from it. Although many database systems add their own proprietary extensions, the standard SQL commands and clauses, such as SELECT, FROM, WHERE, INSERT, UPDATE, DELETE, CREATE, and DROP, can accomplish almost everything one needs to do with a database. The SELECT statement queries the database and retrieves the rows that match specific criteria, which are expressed by carefully constructing a WHERE clause; the INSERT statement adds a new row of data to a table.
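The statements described above can be tried without installing a database server. The sketch below uses Python's built-in sqlite3 module, whose SQL dialect is SQLite's rather than that of SQL Server, Access, or the other systems listed, so treat it as an illustration of the standard commands rather than of any particular product. The patients table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()

# CREATE: define a table (hypothetical schema)
cur.execute(
    """
    CREATE TABLE patients (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        age   INTEGER,
        hba1c REAL
    )
    """
)

# INSERT: add rows; "?" placeholders let the driver quote values safely
cur.executemany(
    "INSERT INTO patients (name, age, hba1c) VALUES (?, ?, ?)",
    [("Patient A", 54, 7.2), ("Patient B", 61, 5.4), ("Patient C", 47, 6.1)],
)

# SELECT ... FROM ... WHERE: retrieve only the rows that match a criterion
high = cur.execute(
    "SELECT name, hba1c FROM patients WHERE hba1c >= ? ORDER BY name", (6.5,)
).fetchall()

# UPDATE: change a value in an existing row
cur.execute("UPDATE patients SET hba1c = ? WHERE name = ?", (5.6, "Patient B"))
updated = cur.execute(
    "SELECT hba1c FROM patients WHERE name = ?", ("Patient B",)
).fetchone()[0]

# DELETE: remove the rows that match a condition
cur.execute("DELETE FROM patients WHERE age < ?", (50,))
remaining = cur.execute("SELECT COUNT(*) FROM patients").fetchone()[0]

# DROP: remove the table entirely
cur.execute("DROP TABLE patients")
conn.commit()

print(high)       # [('Patient A', 7.2)]
print(updated)    # 5.6
print(remaining)  # 2
```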

Artifact 3
Apply selected statistical methodologies to evaluate a problem.
The blood pressure (BP) and diabetes (HbA1c) class activities are two examples of SQL code used to load tables.
SQL Code to load tables: BP_Class_Activity_20191024.sql

SQL Code to load tables: HbA1c_Class_Activity_20191024.sql
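The contents of BP_Class_Activity_20191024.sql and HbA1c_Class_Activity_20191024.sql are not reproduced here, so the following is only a sketch of the kind of analysis such tables support. A hypothetical hba1c table is loaded into SQLite, each value is classified with the American Diabetes Association cut-offs (below 5.7% normal, 5.7% to 6.4% prediabetes, 6.5% or higher diabetes), and descriptive statistics are computed with Python's statistics module. The table name, columns, and values are assumptions.

```python
import sqlite3
import statistics

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hba1c (patient_id TEXT, value REAL)")  # hypothetical schema
conn.executemany(
    "INSERT INTO hba1c VALUES (?, ?)",
    [("P1", 5.2), ("P2", 5.9), ("P3", 6.7), ("P4", 8.1), ("P5", 6.0)],
)

# Classify each value with a CASE expression, then count patients per category.
counts = dict(
    conn.execute(
        """
        SELECT CASE
                 WHEN value >= 6.5 THEN 'diabetes'
                 WHEN value >= 5.7 THEN 'prediabetes'
                 ELSE 'normal'
               END AS category,
               COUNT(*)
        FROM hba1c
        GROUP BY category
        """
    ).fetchall()
)

# Descriptive statistics on the raw values.
values = [v for (v,) in conn.execute("SELECT value FROM hba1c")]
summary = {
    "mean": statistics.mean(values),
    "median": statistics.median(values),
    "stdev": statistics.stdev(values),  # sample standard deviation
}

print(counts)
print({name: round(stat, 2) for name, stat in summary.items()})
```

Doing the categorization in SQL and the statistics in Python is one reasonable division of labor; either step could be moved to the other side.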
Appendix A
Machine Learning Terminology and References
Artificial Intelligence & Machine Learning
Artificial Intelligence: the ability of a machine to perform cognitive functions we
associate with human minds, such as perceiving, reasoning, learning, interacting with the
environment, problem-solving, and even exercising creativity.
Hyperlink for more info: https://builtin.com/artificial-intelligence
Machine Learning: detect patterns and learn how to make predictions and
recommendations by processing data and experiences rather than receiving explicit
programming instruction.
Hyperlink for more info: https://www.expert.ai/blog/machine-learning-definition/
Deep Learning: a type of machine learning that can process a broader range of data resources, requires less data preprocessing by humans, and can often produce more accurate results than traditional machine-learning approaches.
Hyperlink for more info: https://machinelearningmastery.com/what-is-deep-learning/
Descriptive Analysis: use data aggregation and data mining to provide insight into the
past.
Predictive Modeling: use statistical models and forecasting techniques to understand the
future.
Hyperlink for more info: https://www.microstrategy.cn/us/resources/introductory-guides/predictive-modeling-the-only-guide-you-need
Prescriptive Modeling: use optimization and simulation algorithms to advise on possible
outcomes.
Hyperlink for more info: https://www.valamis.com/hub/prescriptive-analytics
Text Analytics: automated process of translating large volumes of unstructured text into
quantitative data to uncover insights, trends, and patterns.
Natural Language Processing (NLP): a field of Artificial Intelligence that gives the
machines the ability to read, understand and derive meaning from human languages.
Python: a general-purpose programming language that takes a more general approach to data science; it is used to develop software for the web and for applications.
Hyperlink for more info: https://www.pythonforbeginners.com/learn-python/what-is-python
R: a programming language mainly used for statistical analysis, including data manipulation, calculation, and graphical display.
Python Run-Time Environment: the software stack responsible for installing your web service's code and its dependencies and for running your service.
Alternative definition: To get your machine to run Python code, you need some way to translate it into instructions the computer can ultimately execute as machine code (a low-level language made up of binary digits, ones and zeros). The programs, libraries, and configurations that allow you to do this are collectively known as the "Python runtime environment."
Source:
https://www.reddit.com/r/learnpython/comments/2pmqcj/can_someone_explain_what_what_is_the_python/
Python Library: a reusable chunk of code that can be included in your programs or projects; a collection of core modules.
Why are libraries used in Python?
Python libraries are sets of useful functions that eliminate the need to write code from scratch.
Source/Read more - 34 Open-Source Python Libraries You Should Know About:
https://www.mygreatlearning.com/blog/open-source-python-libraries/
Python Notebook: an interface for combining, compiling, and printing the output of software code.
Alternative definition: an open-source web application that allows data scientists to create and share documents that integrate live code, equations, computational output, visualizations, and other multimedia resources, along with explanatory text, in a single document.
You can use Jupyter Notebooks for all sorts of data science tasks, including data cleaning and transformation, numerical simulation, exploratory data analysis, data visualization, statistical modeling, machine learning, deep learning, and much more.
Source/Read more - Why You Should be Using Jupyter Notebooks:
https://medium.com/@ODSC/why-you-should-be-using-jupyter-notebooks-ea2e568c59f2
Google Colaboratory: a tool that combines executable code and rich text in a single document, along with images, HTML, LaTeX, and more.
Alternative definition: Colab is a Python development environment that runs in the
browser using Google Cloud.

Colab notebooks are Jupyter notebooks hosted by Google Colab. Colab enables users to collaborate, run code that uses Google's cloud resources such as GPUs and TPUs, and save documents to Google Drive.
Source/Read more - Introduction to Colab and Python:
https://colab.research.google.com/github/tensorflow/examples/blob/master/courses/udacity_intro_to_tensorflow_for_deep_learning/l01c01_introduction_to_colab_and_python.ipynb
References:
Brownlee, J. (2020, August 14). What is deep learning? Retrieved February 03, 2021,
from https://machinelearningmastery.com/what-is-deep-learning/.
Built-In. (n.d.). What is Artificial Intelligence? How does AI work? Built In. Retrieved February 03, 2021, from https://builtin.com/artificial-intelligence.
Expert.ai Team. (2020, May 6). What is machine learning? A definition - Expert System. Retrieved February 03, 2021, from https://www.expert.ai/blog/machine-learning-definition/.
MicroStrategy. (n.d.). Predictive Modeling: The only guide you need. Retrieved February 03, 2021, from https://www.microstrategy.cn/us/resources/introductory-guides/predictive-modeling-the-only-guide-you-need.
Panesar, A. (2019). Machine Learning and AI for Healthcare. Apress.
VALAMIS. (n.d.). What are Prescriptive Analytics? How does it work? Examples & benefits. Retrieved February 03, 2021, from https://www.valamis.com/hub/prescriptive-analytics
Healthcare Data for Machine Learning

Clinical data sets: a group of information for a specific disease, intervention, or monitoring activity, used to maintain statistics and to support disease management and clinical governance (NIH, 2021).
Clinical value: Improving care, efficiency, and patient satisfaction (Becker's Hospital
Review, n.d.)
International Classification of Diseases (ICD): provides a method of classifying injuries,
diseases, and causes of death (NIH, 2018).
Systematized Nomenclature of Medicine (SNOMED): provides a standardized way to
represent clinical phrases recorded by the clinician and allows for the automatic
interpretation of these clinical phrases (SNOMED, n.d.)
Logical Observation Identifiers Names and Codes (LOINC): allows for the
aggregation and exchange of clinical results for care delivery, research, and outcomes
management by providing a set of standardized codes and structured names to
unambiguously identify things you can observe or measure (LOINC, n.d.).
RxNorm: a standardized naming system for both branded and generic drugs and a tool for supporting semantic interoperation between pharmacy knowledge base systems and drug terminologies (NIH, 2021).
National Drug Code (NDC): a universal product identifier for human drugs in the United States; it is a unique 10-digit or 11-digit, three-segment number (Anderson, 2020).
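Because the 10-digit NDC comes in three segment layouts (4-4-2, 5-3-2, or 5-4-1), it is commonly padded to the 11-digit 5-4-2 layout used in billing by adding a leading zero to the short segment. A minimal sketch of that conversion follows; the function name is hypothetical and the example codes are made up, not real products.

```python
VALID_10_DIGIT_LAYOUTS = {(4, 4, 2), (5, 3, 2), (5, 4, 1)}


def ndc10_to_ndc11(ndc10: str) -> str:
    """Convert a hyphenated 10-digit NDC to the 11-digit 5-4-2 layout."""
    segments = ndc10.split("-")
    if len(segments) != 3 or not all(s.isdigit() for s in segments):
        raise ValueError(f"not a hyphenated three-segment NDC: {ndc10!r}")
    layout = tuple(len(s) for s in segments)
    if layout not in VALID_10_DIGIT_LAYOUTS:
        raise ValueError(f"unexpected segment layout {layout} for {ndc10!r}")
    labeler, product, package = segments
    # zfill left-pads with zeros, so exactly one segment gains a leading zero.
    return f"{labeler.zfill(5)}-{product.zfill(4)}-{package.zfill(2)}"


for code in ("1234-5678-90", "12345-678-90", "12345-6789-0"):
    print(code, "->", ndc10_to_ndc11(code))
```

Validating the layout first matters: without the hyphens, a 10-digit string is ambiguous, because the same digits could belong to any of the three layouts.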
Current Procedural Terminology (CPT): a medical code set used to report surgical, diagnostic, and medical services and procedures to entities such as physicians, health insurance companies, and accreditation organizations (Lee, 2015). CPT codes are used together with ICD-9-CM or ICD-10-CM diagnostic codes during the electronic medical record billing process (Lee, 2015).
Web and social media data: Clicks, history, health forums (Panesar, 2021).
Machine-to-machine data: Sensors, wearables (Panesar, 2021).
Big transaction data: Health claims data, billing data (Panesar, 2021).
Biometric data: Fingerprints, genetics, biomarkers derived from wearables (Panesar, 2021).
Human-generated data: Email, paper documents, electronic medical records (Panesar, 2021).
Big Data 4 V's: Volume, Variety, Velocity, Veracity (Panesar, 2021).
Volume: Size of generated and stored data (Panesar, 2021).
Variety: Different types of data (Panesar, 2021).
Velocity: Speed at which data is generated (Panesar, 2021).
Veracity: Data accuracy (Panesar, 2021).
Clinical data processes: the process of collection, cleaning, and management of subject data in compliance with regulatory standards (Krishnankutty et al., 2012).
Diagnosis: investigation or analysis of the cause or nature of a condition, situation, or problem (Merriam-Webster, n.d.).
Lab results: the results of testing a sample such as blood, urine, other body fluid, or tissue, often shown alongside a set of numbers known as the reference range or normal values.
Medications: medicinal substance (Merriam-Webster, n.d.)
Procedures: a particular way of accomplishing something or of acting (Merriam-
Webster, n.d.)
Outliers: a statistical observation that is markedly different in value from the others of the
sample (Merriam-Webster, n.d.)
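The definition above does not prescribe how "markedly different" is measured. One common convention, assumed here, is Tukey's interquartile range (IQR) rule: values below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR are flagged. The readings are hypothetical, and statistics.quantiles requires Python 3.8 or later.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


readings = [118, 122, 125, 130, 127, 121, 210]  # hypothetical systolic BP values
print(iqr_outliers(readings))  # [210]
```

A flagged value is a candidate for review, not automatically an error: in clinical data an extreme reading may be a genuine finding rather than a data-entry mistake.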
Missing Values: data value that is not stored for a variable in the observation of interest.
Code Libraries: a collection of code that is available for public use.
Public data sources: freely available datasets.
Pandas: An open-source, BSD-licensed library providing high-performance, easy-to-use
data structures and data analysis tools for the Python programming language.
Matplotlib: a plotting library for the Python programming language and its numerical
mathematics extension NumPy
Data frame: a table or a two-dimensional array-like structure in which each column
contains values of one variable and each row contains one set of values from each
column.
Data Cleaning: the process of detecting and correcting corrupt or inaccurate records from
a record set, table, or database and refers to identifying incomplete, incorrect, faulty, or
irrelevant parts of the data and then replacing, modifying, or deleting the dirty or coarse
data.
Null: a marker indicating that a data value does not exist in the database.
NaN: Not a Number
Kaggle: an online community of data scientists and machine learning practitioners
Imputation: the process of replacing missing data with substituted values
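Missing values, Null, NaN, and imputation can be tied together in a short standard-library sketch. Here None stands in for a database Null and float("nan") for NaN, and mean imputation is used as one simple strategy among several (median or model-based imputation are alternatives). The lab values are hypothetical; in pandas, the isna() and fillna() methods play the roles of is_missing and mean_impute.

```python
import math
import statistics

# Hypothetical lab values: None stands in for a database NULL and
# float("nan") for NaN; both are treated as missing below.
raw = [5.4, None, 6.1, float("nan"), 7.0]


def is_missing(value):
    """True for None (NULL) or NaN. NaN never equals itself, so use math.isnan."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def mean_impute(values):
    """Replace missing entries with the mean of the observed entries."""
    observed = [v for v in values if not is_missing(v)]
    fill = statistics.mean(observed)
    return [fill if is_missing(v) else v for v in values]


print(sum(is_missing(v) for v in raw))  # 2
print(mean_impute(raw))
```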
References:
Anderson, L. (2020). National drug codes explained: What you need to know. Drugs.com. Retrieved February 2021, from https://www.drugs.com/ndc.html.
Becker's Hospital Review. (n.d.). Creating clinical value: 4 steps to drive change and improve care. Retrieved February 03, 2021, from https://go.beckershospitalreview.com/creating-clinical-value-4-steps-to-drive-change-and-improve-care
Lee, K. (2015, June 22). What is Current Procedural Terminology (CPT) code? Definition from WhatIs.com. Retrieved February 03, 2021, from https://searchhealthit.techtarget.com/definition/Current-Procedural-Terminology-CPT
Logical Observation Identifiers Names and Codes (LOINC). (n.d.). About LOINC. Retrieved February 03, 2021, from https://loinc.org/about/
National Cancer Institute (NIH). (2018, December 3). What is the ICD? Retrieved
February 03, 2021, from https://training.seer.cancer.gov/icd10cm/intro.html.
National Library of Medicine (NIH). (2021). RxNorm overview. Retrieved February 03,
2021, from https://www.nlm.nih.gov/research/umls/rxnorm/overview.html.
Panesar, A. (2019). Machine Learning and AI for Healthcare. Apress.
Panesar, A. (2021). Machine Learning and AI for Healthcare: Big Data for Improved Health Outcomes (2nd ed.). Apress. https://doi.org/10.1007/978-1-4842-6537-6
Systematized Nomenclature of Medicine (SNOMED). (n.d.). 5-Step briefing. Retrieved
February 03, 2021, from https://www.snomed.org/snomed-ct/five-step-briefing.

Krishnankutty, B., Bellary, S., Kumar, N. B., & Moodahadu, L. S. (2012). Data management in clinical research: An overview. Indian Journal of Pharmacology, 44(2), 168–172. https://doi.org/10.4103/0253-7613.93842
Merriam-Webster. (n.d.). Diagnosis. In Merriam-Webster.com dictionary. Retrieved
February 14, 2021, from https://www.merriam-webster.com/dictionary/diagnosis
Fundamentals of Machine Learning Algorithms
Scikit-learn is a Python machine learning library that provides a range of supervised and unsupervised learning algorithms through a consistent interface (Brownlee, 2020).
Encode: converting categorical data, such as ordinal and nominal data, into a numeric form that the machine can read.
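A minimal sketch of the two usual encodings, written in plain Python so the mechanics are visible: ordinal data (a hypothetical mild/moderate/severe scale) is mapped to ranked integers, and nominal data (blood types, used here only as an example) is one-hot encoded into 0/1 columns. scikit-learn offers OrdinalEncoder and OneHotEncoder in sklearn.preprocessing for the same purposes.

```python
# Ordinal data has a natural order, so map categories to ranked integers.
SEVERITY_ORDER = {"mild": 0, "moderate": 1, "severe": 2}  # hypothetical scale


def ordinal_encode(values, order=SEVERITY_ORDER):
    """Replace each category with its rank."""
    return [order[v] for v in values]


# Nominal data has no order, so give each category its own 0/1 column.
def one_hot_encode(values):
    """Return (sorted categories, one 0/1 row per input value)."""
    categories = sorted(set(values))
    return categories, [[1 if v == c else 0 for c in categories] for v in values]


print(ordinal_encode(["mild", "severe", "moderate"]))  # [0, 2, 1]
categories, rows = one_hot_encode(["A", "O", "B", "A"])
print(categories)  # ['A', 'B', 'O']
print(rows)        # [[1, 0, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]]
```

One-hot encoding is used for nominal data because mapping blood types to 0, 1, 2 would suggest an order and spacing between categories that does not exist.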
Target: Output Variables (https://machinelearningmastery.com/how-to-transform-target-variables-for-regression-with-scikit-learn/)
Feature: Input Variables to a machine learning model. (Doi: 10.1001/jama.2019.16489)
Scaling the data (also called normalizing): a method used to standardize the range of data features. The data is transformed so that values fall within a specific range, e.g., [0, 1], using x' = (x - min(x)) / (max(x) - min(x)), where x' is the normalized value. (https://kharshit.github.io/blog/2018/03/23/scaling-vs-normalization)
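The min-max formula translates directly into code. This sketch mirrors what scikit-learn's MinMaxScaler does for a single feature, but it is a hand-written illustration rather than that class. The feature_range parameter generalizes [0, 1] to any [lo, hi], and mapping a constant feature to lo is a design choice made here to avoid dividing by zero.

```python
def min_max_scale(values, feature_range=(0.0, 1.0)):
    """Rescale values linearly so min(values) -> lo and max(values) -> hi."""
    lo, hi = feature_range
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        # Constant feature: the formula would divide by zero, so map everything to lo.
        return [float(lo) for _ in values]
    return [lo + (x - x_min) * (hi - lo) / (x_max - x_min) for x in values]


print(min_max_scale([10, 20, 30]))           # [0.0, 0.5, 1.0]
print(min_max_scale([10, 20, 30], (-1, 1)))  # [-1.0, 0.0, 1.0]
```

In a real workflow, x_min and x_max should be learned from the training data only and then reused on the test data, so no information leaks from the test set.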
Train_test_split: a procedure for estimating the performance of machine learning algorithms when they are used to make predictions on data not used to train the model. (https://machinelearningmastery.com/train-test-split-for-evaluating-machine-learning-algorithms/)
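A hand-rolled version of the procedure, assuming a simple shuffled holdout with a fixed random seed so the split is reproducible. The function name simple_train_test_split is hypothetical. In practice, scikit-learn's sklearn.model_selection.train_test_split provides this, with extra options such as stratification.

```python
import random


def simple_train_test_split(rows, test_size=0.25, seed=42):
    """Shuffle a copy of rows with a seeded RNG, then hold out test_size of it."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = max(1, round(len(shuffled) * test_size))
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


records = list(range(20))  # stand-ins for 20 hypothetical patient records
train, test = simple_train_test_split(records)
print(len(train), len(test))  # 15 5
```

Shuffling before splitting matters when records are stored in a meaningful order (for example, by admission date), since a plain head/tail split would then give the model a biased view.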
Linear Regression: a statistical technique for understanding the relationship between an input (independent) variable and a continuous output (dependent) variable.

Correlation coefficient: Statistical measure of the strength of the relationship between the relative movements of two variables (https://www.statisticshowto.com/probability-and-statistics/correlation-coefficient-formula/)
R² (R-Squared): Statistical measure that represents the proportion of the variance for a
dependent variable that is explained by an independent variable or variables in a
regression model. (https://www.investopedia.com/terms/r/r-squared.asp)
Linear Equation: An equation that makes a straight line when it is graphed and often
written in the form y = mx + b (MathisFun, n.d.).
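To tie together linear regression, the correlation coefficient, R², and the linear equation y = mx + b, the sketch below fits a line by ordinary least squares and computes Pearson's r and R² from their textbook formulas. The x and y values are hypothetical. For simple linear regression with an intercept, R² equals the square of Pearson's r, and the last line checks this.

```python
import math


def _means(xs, ys):
    return sum(xs) / len(xs), sum(ys) / len(ys)


def fit_line(xs, ys):
    """Ordinary least squares estimates of slope m and intercept b in y = mx + b."""
    mean_x, mean_y = _means(xs, ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    return m, mean_y - m * mean_x


def pearson_r(xs, ys):
    """Pearson correlation coefficient, between -1 and 1."""
    mean_x, mean_y = _means(xs, ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def r_squared(xs, ys, m, b):
    """Share of the variance in y explained by the fitted line: 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
m, b = fit_line(xs, ys)
r = pearson_r(xs, ys)
print(f"y = {m:.2f}x + {b:.2f}, r = {r:.4f}, R^2 = {r_squared(xs, ys, m, b):.4f}")
print(math.isclose(r ** 2, r_squared(xs, ys, m, b)))  # True
```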
References
Brownlee, J. (2020). A gentle introduction to scikit-learn. https://machinelearningmastery.com/a-gentle-introduction-to-scikit-learn-a-python-machine-learning-library/
Brownlee, J. (2020). How to transform target variables for regression in Python. https://machinelearningmastery.com/how-to-transform-target-variables-for-regression-with-scikit-learn/
Brownlee, J. (2020). Train-Test Split for Evaluating Machine Learning Algorithms. https://machinelearningmastery.com/train-test-split-for-evaluating-machine-learning-algorithms/
Chen, P. C., Krause, J., Liu, Y., & Peng, L. (2019). How to read articles that use machine learning: Users' guides to the medical literature. JAMA. https://doi.org/10.1001/jama.2019.16489
Fernando, J. (2020). R-squared Definition. https://www.investopedia.com/terms/r/r-squared.asp
Glen, S. (2021). Correlation Coefficient: Simple Definition, Formula, Easy Steps. https://www.statisticshowto.com/probability-and-statistics/correlation-coefficient-formula/
Kumar, H. (2018). Scaling vs. Normalization. https://kharshit.github.io/blog/2018/03/23/scaling-vs-normalization
MathisFun. (n.d.). Linear Equation. Math is Fun. https://www.mathsisfun.com/definitions/linear-equation.html.
Supervised Learning Using Classification Algorithms
Logistic Regression: A type of regression analysis to conduct when the dependent variable is dichotomous (Statistics Solutions, n.d.).
Confusion matrix: A table that is used to describe the performance of a classification model on a set of data where the true values are known (Data School, 2014).
Prevalence: The proportion of a population who have a specific characteristic during a given time (NIH, 2017).
Accuracy: a metric for evaluating classification models, calculated as the ratio of correct predictions to all predictions.
Accuracy = Number of correct predictions / Total number of predictions.
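The confusion matrix, prevalence, and accuracy definitions fit together as follows. This sketch assumes a binary problem with labels coded 1 (condition present) and 0 (absent), and the label lists are hypothetical. Prevalence is computed here as the share of the sample whose true label is 1. scikit-learn's sklearn.metrics module offers confusion_matrix and accuracy_score as library equivalents.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    return {
        "TP": sum(1 for t, p in pairs if t == 1 and p == 1),
        "TN": sum(1 for t, p in pairs if t == 0 and p == 0),
        "FP": sum(1 for t, p in pairs if t == 0 and p == 1),
        "FN": sum(1 for t, p in pairs if t == 1 and p == 0),
    }


def accuracy(cm):
    """Correct predictions (TP + TN) divided by all predictions."""
    return (cm["TP"] + cm["TN"]) / sum(cm.values())


def prevalence(cm):
    """Share of the sample that truly has the condition (TP + FN)."""
    return (cm["TP"] + cm["FN"]) / sum(cm.values())


y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1]
cm = confusion_counts(y_true, y_pred)
print(cm)              # {'TP': 2, 'TN': 2, 'FP': 1, 'FN': 1}
print(accuracy(cm))    # 0.6666666666666666
print(prevalence(cm))  # 0.5
```

Prevalence is worth reporting next to accuracy: when a condition is rare, a model that always predicts 0 can reach high accuracy while missing every case.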
K-Nearest Neighbors (KNN): a simple, easy-to-implement supervised ML algorithm that makes predictions directly from the training dataset and can solve both classification and regression problems.
Model Tuning: the process of maximizing a model's performance without overfitting or creating too high a variance, which enables the algorithm to perform the "best," based on what is specified as "best" (Panesar, 2019).
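A small sketch that connects K-Nearest Neighbors with model tuning. KNN is implemented directly as a majority vote among the k closest training points by Euclidean distance, and k is tuned by comparing accuracy on a held-out validation set. The points, labels, and candidate k values are hypothetical. Selecting on a single validation set is a simplification; cross-validation (for example, scikit-learn's GridSearchCV) is a common alternative.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Majority label among the k training points nearest to x (Euclidean distance)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def tune_k(train_X, train_y, val_X, val_y, candidates=(1, 3, 5)):
    """Return the candidate k with the best validation accuracy (first one wins ties)."""
    best_k, best_acc = None, -1.0
    for k in candidates:
        preds = [knn_predict(train_X, train_y, x, k) for x in val_X]
        acc = sum(p == y for p, y in zip(preds, val_y)) / len(val_y)
        if acc > best_acc:
            best_k, best_acc = k, acc
    return best_k, best_acc


# Hypothetical two-feature data with two well-separated classes.
train_X = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
train_y = [0, 0, 0, 1, 1, 1]
val_X = [(2, 2), (9, 9)]
val_y = [0, 1]

print(knn_predict(train_X, train_y, (1.5, 1.5), k=3))  # 0
print(tune_k(train_X, train_y, val_X, val_y))          # (1, 1.0)
```

Odd values of k are used as candidates so that a two-class vote cannot tie.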

Naive Bayes: a probabilistic classification method based on Bayes' theorem, in which a prediction is made from prior knowledge and current evidence (Saritas & Yasar, 2019).
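Bayes' theorem, which Naive Bayes builds on, can be shown with a single worked calculation: how the probability of a condition changes after a positive test. Naive Bayes extends the theorem by assuming features are conditionally independent given the class. The prior, sensitivity, and specificity values are hypothetical, and this is the theorem itself rather than a full Naive Bayes classifier.

```python
def posterior_given_positive(prior, sensitivity, specificity):
    """P(condition | positive test) from Bayes' theorem.

    prior        -- P(condition) before testing, e.g. the prevalence
    sensitivity  -- P(positive | condition)
    specificity  -- P(negative | no condition)
    """
    true_positive = sensitivity * prior               # P(positive and condition)
    false_positive = (1 - specificity) * (1 - prior)  # P(positive and no condition)
    return true_positive / (true_positive + false_positive)


p = posterior_given_positive(prior=0.01, sensitivity=0.90, specificity=0.95)
print(round(p, 3))  # 0.154
```

Even with a fairly accurate test, a low prior keeps the posterior near 15%. This combination of prior knowledge and current evidence is what the definition describes.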
Decision Tree: a tree-like graph consisting of nodes that represent a test on an attribute, branches that represent the outcome of the test, and leaf nodes that represent a class label (Rai, Devi & Guleria, 2016).
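To show how a tree chooses the test at a node, the sketch below scores every candidate threshold on one numeric feature by weighted Gini impurity and keeps the best, producing a one-level tree (a decision stump). Gini impurity is one common split criterion and is the default in scikit-learn's DecisionTreeClassifier. The HbA1c values and labels are hypothetical.

```python
def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions (0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(xs, ys):
    """Threshold t on one feature minimizing the size-weighted Gini of x <= t vs x > t."""
    best_t, best_score = None, float("inf")
    for t in sorted(set(xs))[:-1]:  # the largest value would leave the right side empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score


hba1c = [5.1, 5.6, 6.0, 6.9, 7.4]         # hypothetical feature values
label = ["no", "no", "no", "yes", "yes"]  # hypothetical class labels
print(best_split(hba1c, label))  # (6.0, 0.0)
```

The resulting stump has one node (the test HbA1c <= 6.0), two branches (true and false), and two leaves labeled "no" and "yes", matching the roles in the definition. A full tree repeats the search recursively on each side.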
Support Vector Machines: a computer algorithm that learns by example to assign labels to objects (Noble, 2006).
Random Forest: A machine learning algorithm that fits multiple decision trees to input
data using a random subset of the input variables for each tree constructed (Mascaro et al.,
2014).
Coding Schemes: a set of codes, defined by the words and phrases researchers assign to categorize segments of the data by topic; in building one, researchers consider which questions they are trying to answer and the topics related to those questions (Urban Institute, 2015).
References
Brownlee, J. (2020). Metrics to Evaluate Machine Learning Algorithms in Python. Machine Learning Mastery. https://machinelearningmastery.com/metrics-evaluate-machine-learning-algorithms-python/.
Data School. (2014, March 25). A simple guide to confusion matrix terminology. Data School. https://www.dataschool.io/simple-guide-to-confusion-matrix-terminology/.
Harrison, O. (2019). Machine Learning Basics with the K-Nearest Neighbors Algorithm. https://towardsdatascience.com/machine-learning-basics-with-the-k-nearest-neighbors-algorithm-6a6e71d01761.
Mascaro, J., Asner, G. P., Knapp, D. E., Kennedy-Bowdoin, T., Martin, R. E., Anderson, C., ... & Chadwick, K. D. (2014). A tale of two "forests": Random Forest machine learning aids tropical forest carbon mapping. PLoS ONE, 9(1), e85993.
National Institute of Mental Health (NIH). (2017, November). What is Prevalence? National Institute of Mental Health. https://www.nimh.nih.gov/health/statistics/what-is-prevalence.shtml.
Noble, W. S. (2006). What is a support vector machine? Nature Biotechnology, 24(12),
1565-1567.
Rai, K., Devi, M. S., & Guleria, A. (2016). Decision tree-based algorithm for intrusion
detection. International Journal of Advanced Networking and Applications, 7(4), 2828.
Saritas, M. M., & Yasar, A. (2019). Performance analysis of ANN and Naive Bayes
classification algorithm for data classification. International Journal of Intelligent Systems
and Applications in Engineering, 7(2), 88-91.
Statistics Solutions. (n.d.). What is Logistic Regression? Statistics Solutions. https://www.statisticssolutions.com/what-is-logistic-regression/.
Panesar, A. (2019). Machine learning and AI for healthcare. Coventry, UK: Apress.
eBook ISBN 978-1-4842-3799-1; Softcover ISBN 978-1-4842-3798-4
Urban Institute. (2015). Qualitative Data Analysis. Urban Institute: Data & Methods. https://www.urban.org/research/data-methods/data-analysis/qualitative-data-analysis
Unsupervised Clustering Algorithms

MinMaxScaler: transforms features by scaling each feature to a given range (scikit-learn, n.d.).
Pipeline: a set of tools and processes for performing data integration by capturing datasets from multiple sources (AltexSoft, 2019).
Principal Component Analysis (PCA): a technique for reducing the dimensionality of datasets, increasing interpretability while minimizing information loss (Jolliffe & Cadima, 2016).
K-means: a clustering algorithm in which "means" refers to averaging the data to find each cluster's centroid (Garbade, 2018).
Centroids: Actual or predicted center of a given cluster (Garbade, 2018)
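Silhouette score: a score indicating the separation distance between resulting clusters. A value near +1 indicates that a sample is far from neighboring clusters, a value of 0 indicates that it is on or very close to the boundary between two clusters, and negative values indicate that it may have been assigned to the wrong cluster (Scikit Learn, 2020).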
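The K-means, centroid, and silhouette entries can be illustrated together with a short from-scratch sketch. Lloyd's algorithm alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. The mean silhouette coefficient s = (b - a) / max(a, b) is then computed, where a is a point's mean distance to its own cluster and b is its mean distance to the nearest other cluster. The points are hypothetical, and seeding the centroids with the first k points is a simplification. scikit-learn's KMeans and sklearn.metrics.silhouette_score are the library equivalents.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm. Centroids start at the first k points (a simplification)."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(max_iter):
        # Assignment step: label each point with the index of its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:  # converged: assignments will no longer change
            break
        centroids = new_centroids
    return centroids, labels


def mean_silhouette(points, labels):
    """Mean of s = (b - a) / max(a, b) over all points; singleton clusters score 0."""
    clusters = set(labels)
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == other)
            / labels.count(other)
            for other in clusters
            if other != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (8.5, 9), (9, 8)]  # hypothetical
centroids, labels = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
print(round(mean_silhouette(points, labels), 2))
```

With two tight, well-separated groups, the mean silhouette comes out close to +1, in line with the interpretation in the definition above.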
Sigmoid function: a mathematical function with a characteristic S-shaped curve. Several standard sigmoid functions exist, such as the logistic function, the hyperbolic tangent, and the arctangent (Wood, 2020).
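The logistic function, 1 / (1 + e^(-x)), is the sigmoid most often used in machine learning, for instance as the link function in logistic regression. This sketch computes it in a numerically stable way, avoiding overflow of e^(-x) for large negative x. It also compares the logistic function with the hyperbolic tangent and arctangent from Python's math module, which are also S-shaped but span (-1, 1) and (-π/2, π/2) respectively.

```python
import math


def logistic(x: float) -> float:
    """Standard logistic sigmoid, mapping any real number into (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # for negative x, exp(x) is small, so no overflow
    return z / (1.0 + z)


for x in (-1000, -2, 0, 2, 1000):
    print(f"{x:>6}: logistic={logistic(x):.4f} tanh={math.tanh(x):.4f} atan={math.atan(x):.4f}")
```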
References
AltexSoft. (2019). What is Data Engineering: Explaining the Data Pipeline, Data Warehouse, and Data Engineer Role. AltexSoft. https://www.altexsoft.com/blog/datascience/what-is-data-engineering-explaining-data-pipeline-data-warehouse-and-data-engineer-role/.
Garbade, M. J. (2018, September 12). Understanding K-means Clustering in Machine Learning. Medium. https://towardsdatascience.com/understanding-k-means-clustering-in-machine-learning-6a6e67336aa1.
Jolliffe, I. T., & Cadima, J. (2016). Principal component analysis: a review and recent
developments. Philosophical Transactions of the Royal Society A: Mathematical, Physical
and Engineering Sciences, 374(2065), 20150202.
scikit-learn. (n.d.). sklearn.preprocessing.MinMaxScaler. https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html.
Wood, T. (2020). Sigmoid Function. DeepAI. https://deepai.org/machine-learning-glossary-and-terms/sigmoid-functio
Scikit Learn. (2020). Selecting the number of clusters with silhouette analysis on KMeans clustering. https://scikit-learn.org/stable/auto_examples/cluster/plot_kmeans_silhouette_analysis.html
Ethics of Machine Learning
Definitions: A set of organized, reusable code called upon to perform a required coding
action (Python, 2021).
Structure: the aggregate of elements of an entity in their relationships to each other (Merriam-Webster, 2021).
Formatting: a process in Python in which a user inserts specified values into the desired placeholders of a string, e.g., string.format(value1, value2, value3) (W3Schools, 2021).
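A short illustration of the str.format() method described above. It shows positional placeholders, a format specification ({:.1f} for one decimal place), and named placeholders. The patient identifier and values are hypothetical.

```python
# Positional placeholders are filled left to right by the arguments.
line = "Patient {} has an HbA1c of {:.1f}% ({})".format("P001", 6.7, "diabetes range")
print(line)  # Patient P001 has an HbA1c of 6.7% (diabetes range)

# Named placeholders are filled by keyword arguments, in any order.
summary = "{test}: {count} results".format(count=12, test="HbA1c")
print(summary)  # HbA1c: 12 results
```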
Writing: Process of scripting computer code (Python, 2021).

Attributions (proper APA citations): Actions related to qualities or features as
characteristics of or possessed by entities, people, or things (Pratt & Last, 2014).
Hyperlinks: a tag in a web page that can link one web page to another page or location in
the same web page (Pratt & Last, 2014).
References
Merriam-Webster. (2021). Structure. In Merriam-Webster.com dictionary. https://www.merriam-webster.com/dictionary/structure.
Pratt, P. J., & Last, M. Z. (2014). Concepts of database management. Cengage Learning.
Python. (2021). Classes. https://docs.python.org/3/tutorial/classes.html
Python. (2021). Input and Output. Reading and Writing Files.
https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files
W3Schools. (2021). Python String format() Method.
https://www.w3schools.com/python/ref_string_format.asp
