Data and Knowledge Management Competency
Introduction
The data and knowledge management competencies were essential and relevant to our healthcare
informatics program, particularly in the analytic track, because the program offered many
opportunities to gain the knowledge, skills, and statistical tools and techniques needed to evaluate
data, answer questions, and solve problems with more accurate outcomes. Examples include
medical and nonmedical terminologies (see Appendix A), structured query language (SQL) for
collecting and analyzing data, and Python, a readable programming language used for data
analytics, machine learning, web development, and data visualization. I have selected the
following competencies.
• Demonstrate proper techniques for gathering, formatting, and storing data to investigate a
• Demonstrate skills in using data management software such as SQL and Microsoft Office
Artifact 1
Demonstrate proper techniques for gathering, formatting, and storing data to investigate a
Before selecting and collecting any data, my primary consideration is to ensure that any
legislation and complies with the Health Insurance Portability and Accountability Act of 1996
(HIPAA). Also, to protect the credibility and reliability of data, information should be gathered
using accepted data collection techniques. Commonly, the following six steps are used:
Step 1: Collecting data, which is the most critical step of the knowledge management process,
Step 3: Summarizing,
Artifact 2
Demonstrate skills in using data management software such as SQL and Microsoft Office
SQL is a standard language for storing, manipulating, and retrieving data in relational database
management systems, which contain one or more objects called tables. Some common relational
database management systems that use SQL are Sybase, Microsoft SQL Server, Access, and
Ingres. SQL statements are used to perform tasks such as updating data in a database or
retrieving data from a database. Although most database systems use SQL, many also add their
own proprietary extensions. However, the standard SQL commands and clauses such as SELECT,
FROM, WHERE, INSERT, UPDATE, DELETE, CREATE, and DROP can be used to complete
almost everything that one needs to do with a database. The SELECT statement queries the
database and retrieves the data that match the specified criteria, while the INSERT statement
adds a row of data to a table. That can be accomplished by
Artifact 3
The blood pressure (BP) and diabetes (HbA1c) class activities are two examples of SQL code
written to load tables.
HbA1c_Class_Activity_20191024.sql
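As a minimal sketch of loading a table and then querying it, the example below uses Python's built-in sqlite3 module with a temporary in-memory database. The table name hba1c_results, its columns, the sample rows, and the 6.5 cut-off in the query are hypothetical placeholders for illustration, not the contents of the class activity file.

```python
import sqlite3

# Hypothetical rows for illustration only; not the class activity data.
rows = [
    (1, "2019-10-01", 5.4),
    (2, "2019-10-03", 6.1),
    (3, "2019-10-07", 7.2),
]

conn = sqlite3.connect(":memory:")  # temporary in-memory database
conn.execute(
    "CREATE TABLE hba1c_results ("
    "patient_id INTEGER PRIMARY KEY, test_date TEXT, hba1c_pct REAL)"
)

# INSERT adds rows; the ? placeholders keep the values separate from the SQL text.
conn.executemany("INSERT INTO hba1c_results VALUES (?, ?, ?)", rows)
conn.commit()

# SELECT retrieves only the rows that match the WHERE criteria.
elevated = conn.execute(
    "SELECT patient_id, hba1c_pct FROM hba1c_results "
    "WHERE hba1c_pct >= ? ORDER BY patient_id",
    (6.5,),  # illustrative cut-off
).fetchall()
print(elevated)
conn.close()
```

The same CREATE, INSERT, and SELECT statements could be run directly in a SQL client; wrapping them in Python only makes the sketch self-contained.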
Appendix A
associate with human minds, such as perceiving, reasoning, learning, interacting with the
Machine Learning: detect patterns and learn how to make predictions and
programming instruction.
Deep Learning: a type of machine learning that can process a broader range of data
resources, requires less data preprocessing by humans, and can often produce more
Descriptive Analysis: use data aggregation and data mining to provide insight into the
past.
Predictive Modeling: use statistical models and forecasting techniques to understand the
future.
guides/predictive-modeling-the-only-guide-you-need
outcomes.
Text Analytics: automated process of translating large volumes of unstructured text into
Natural Language Processing (NLP): a field of Artificial Intelligence that gives the
machines the ability to read, understand and derive meaning from human languages.
Python: a general-purpose programming language that offers a more general approach to
data science. It is used to develop software for the web and for applications.
python
Python Run-Time Environment: the software stack responsible for installing your web
Alternative definition: To get your machine to run Python code, you need some way to
convert it into machine code (a low-level language comprised of binary digits - ones and
zeros). The programs, libraries, and configurations that allow you to do this are
Source:
https://www.reddit.com/r/learnpython/comments/2pmqcj/can_someone_explain_what_what_is_the_python/
Python Library: a reusable chunk of code that can be included in your programs/
Python Libraries are a set of useful functions that eliminate the need for writing code
from scratch.
https://www.mygreatlearning.com/blog/open-source-python-libraries/
Python Notebook: interface to combine, compile and print output of software code.
and share documents that integrate live code, equations, computational output,
visualizations, and other multimedia resources, along with explanatory text in a single
document.
You can use Jupyter Notebooks for all sorts of data science tasks, including data cleaning
https://medium.com/@ODSC/why-you-should-be-using-jupyter-notebooks-ea2e568c59f2
Google Colaboratory: tool to combine executable code and rich text in a single
Colab notebooks are Jupyter notebooks that Google Colab hosts. Colab enables users to
collaborate and run code that exploits Google's cloud resources, i.e., GPUs, TPUs, and
https://colab.research.google.com/github/tensorflow/examples/blob/master/courses/udacity_intro_to_tensorflow_for_deep_learning/l01c01_introduction_to_colab_and_python.ipynb
References:
Brownlee, J. (2020, August 14). What is deep learning? Retrieved February 03, 2021,
from https://machinelearningmastery.com/what-is-deep-learning/.
Built-In. (n.d.). What is Artificial Intelligence? How does ai work? Built in. Retrieved
Expert.ai Team. (2020, May 6). What is machine learning? A definition - expert system.
definition/.
MicroStrategy. (n.d.). Predictive Modeling: The only guide you need. Retrieved February
modeling-the-only-guide-you-need.
VALAMIS. (n.d.). What are Prescriptive Analytics? How does it work? Examples &
analytics
Clinical data sets: they are a group of information for a specific disease, intervention,
(NIH, 2021).
Clinical value: Improving care, efficiency, and patient satisfaction (Becker's Hospital
Review, n.d.)
represent clinical phrases recorded by the clinician and allows for the automatic
Logical Observation Identifiers Names and Codes (LOINC): allows for the
aggregation and exchange of clinical results for care delivery, research, and outcomes
RxNorm: a standardized naming system for both branded and generic drugs and a tool
for supporting semantic interoperation between pharmacy knowledge base systems and
National Drug Code (NDC): A universal product identifier for human drugs in the
United States; it is a unique 10-digit or 11-digit, three-segment number (Anderson, 2020).
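A short sketch of how the two lengths relate: a hyphenated 10-digit NDC is commonly converted to the 11-digit 5-4-2 layout by adding a leading zero to whichever segment is short. The function name ndc10_to_ndc11 and the sample digit strings are hypothetical and chosen only for illustration.

```python
# The three hyphenated 10-digit layouts: labeler-product-package segment lengths.
VALID_LAYOUTS = {(4, 4, 2), (5, 3, 2), (5, 4, 1)}

def ndc10_to_ndc11(ndc10: str) -> str:
    """Pad a hyphenated 10-digit NDC to the 11-digit 5-4-2 layout."""
    parts = ndc10.split("-")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected three numeric segments: {ndc10!r}")
    if tuple(len(p) for p in parts) not in VALID_LAYOUTS:
        raise ValueError(f"not a 4-4-2, 5-3-2, or 5-4-1 code: {ndc10!r}")
    labeler, product, package = parts
    # zfill left-pads with zeros, so only the short segment changes.
    return f"{labeler.zfill(5)}-{product.zfill(4)}-{package.zfill(2)}"

# Made-up digit strings, one per layout.
for code in ("1234-5678-90", "12345-678-90", "12345-6789-0"):
    print(code, "->", ndc10_to_ndc11(code))
```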
Current Procedural Terminology (CPT): A medical code set used to report surgical,
diagnostic, and medical services and procedures to entities such as physicians, health
insurance companies, and accreditation organizations (Lee, 2015). Moreover, these CPT
codes are used with ICD-9-CM or ICD-10-CM numerical diagnostic coding during the
Web and social media data: Clicks, history, health forums (Panesar, 2021).
Big transaction data: Health claim data, billing data (Panesar, 2021).
2021).
2021)
Clinical data processes: the process of collection, cleaning, and management of subject
Lab results: often shown as a set of numbers alongside a reference range, or normal
values, for a test of a sample such as blood, urine, body fluid, or tissue.
Webster, n.d.)
Outliers: a statistical observation that is markedly different in value from the others of the
Missing Values: a data value that is not stored for a variable in the observation of interest.
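A minimal sketch of how missing values and outliers might be flagged in a list of readings. The 1.5 × IQR rule used here is one common convention, chosen for illustration rather than taken from this glossary, and the readings are made-up values.

```python
import statistics

# Made-up systolic readings; None marks a value that was not stored.
readings = [118, 122, None, 125, 119, 121, 240, None, 117, 123]

missing_positions = [i for i, v in enumerate(readings) if v is None]
observed = [v for v in readings if v is not None]

# Quartiles of the observed values (statistics.quantiles needs Python 3.8+).
q1, _, q3 = statistics.quantiles(observed, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Anything outside the fences is flagged for review, not deleted.
outliers = [v for v in observed if v < low or v > high]
print("missing at positions:", missing_positions)
print("outliers:", outliers)
```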
Code Libraries: collections of code that are available for public use.
data structures and data analysis tools for the Python programming language.
Matplotlib: a plotting library for the Python programming language and its numerical
contains values of one variable and each row contains one set of values from each
column.
Data Cleaning: the process of detecting and correcting corrupt or inaccurate records in
a record set, table, or database. It involves identifying incomplete, incorrect, faulty, or
irrelevant parts of the data and then replacing, modifying, or deleting the dirty or coarse
data.
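To make the data cleaning definition concrete, here is a small sketch in plain Python: it trims and normalizes text fields, marks unreadable values as missing, sets implausible values aside, and drops exact duplicates. The field names, the sample records, and the plausible range passed as low and high are hypothetical assumptions.

```python
# Hypothetical raw records: inconsistent text, a duplicate, and a faulty value.
raw_records = [
    {"patient_id": "001", "sex": " F ", "hba1c_pct": "5.6"},
    {"patient_id": "002", "sex": "m", "hba1c_pct": "7.1"},
    {"patient_id": "002", "sex": "m", "hba1c_pct": "7.1"},   # duplicate
    {"patient_id": "003", "sex": "F", "hba1c_pct": "-4"},    # not plausible
    {"patient_id": "004", "sex": "M", "hba1c_pct": ""},      # missing
]

def clean(records, low=3.0, high=20.0):
    """Return (cleaned, rejected); low/high are an assumed plausible range."""
    cleaned, rejected, seen = [], [], set()
    for rec in records:
        row = {
            "patient_id": rec["patient_id"].strip(),
            "sex": rec["sex"].strip().upper(),
        }
        try:
            value = float(rec["hba1c_pct"].strip())
        except ValueError:
            value = None  # keep the record but mark the value as missing
        if value is not None and not (low <= value <= high):
            rejected.append(rec)  # faulty value: set aside for review
            continue
        row["hba1c_pct"] = value
        key = tuple(row.items())
        if key in seen:
            continue  # exact duplicate of a record already kept
        seen.add(key)
        cleaned.append(row)
    return cleaned, rejected

cleaned, rejected = clean(raw_records)
for row in cleaned:
    print(row)
```

Rejected records are returned rather than silently discarded so that they can be reviewed before any decision to delete them.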
References:
Anderson, L. (2020). National Drug Codes explained: What you need to know. Retrieved Feb 2021,
from https://www.drugs.com/ndc.html.
Becker's Healthcare Review. (n.d.). Creating clinical Value: 4 steps to drive change and
https://go.beckershospitalreview.com/creating-clinical-value-4-steps-to-drive-change-and-improve-care
Lee, K. (2015, June 22). What is Current Procedural Terminology (CPT) code?
https://searchhealthit.techtarget.com/definition/Current-Procedural-Terminology-CPT
Logical Observation Identifiers Names and Codes (LOINC). (n.d.). About LOINC
https://loinc.org/about/
National Cancer Institute (NIH). (2018, December 3). What is the ICD? Retrieved
National Library of Medicine (NIH). (2021). RxNorm overview. Retrieved February 03,
Panesar, A. (2021). Machine Learning and AI for Healthcare: Big Data for Improved
Krishnankutty, B., Bellary, S., Kumar, N. B., & Moodahadu, L. S. (2012). Data
168–172. https://doi.org/10.4103/0253-7613.93842
Scikit-learn: a Python machine learning library that provides a range of supervised and
Encode: means converting categorical data such as ordinal and nominal data into a
variables-for-regression-with-scikit-learn/)
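A brief sketch of encoding in plain Python, assuming the target is a numeric representation: an ordinal variable is mapped to ranked integers, and a nominal variable is one-hot encoded. The category names and the helper one_hot are invented for illustration; libraries such as scikit-learn provide their own encoders.

```python
# Ordinal data has a natural order, so ranked integers preserve it.
severity_order = ["mild", "moderate", "severe"]   # hypothetical categories
severity_code = {label: rank for rank, label in enumerate(severity_order)}

# Nominal data has no order, so each category gets its own 0/1 column.
blood_types = ["A", "B", "AB", "O"]               # hypothetical categories

def one_hot(value, categories):
    """Return a list with 1 at the position of value and 0 elsewhere."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories]

patients = [("severe", "O"), ("mild", "A"), ("moderate", "AB")]
encoded = [[severity_code[s]] + one_hot(b, blood_types) for s, b in patients]
for row in encoded:
    print(row)
```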
Scaling the Data (aka normalizing): A method used to standardize the range
of features of data. Data is transformed so that values fall within a specific range, e.g., (0, 1),
normalization)
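As a worked illustration of scaling a feature into the range (0, 1), the sketch below applies min-max scaling, x' = (x - min) / (max - min). The function name min_max_scale and the sample ages are hypothetical.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature has no spread; map every value to new_min.
        return [new_min for _ in values]
    span = new_max - new_min
    return [new_min + (v - lo) / (hi - lo) * span for v in values]

ages = [20, 35, 50, 80]  # made-up feature values
scaled = min_max_scale(ages)
print(scaled)
```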
algorithms when they are used to make predictions on data not used to train the model.
(https://machinelearningmastery.com/train-test-split-for-evaluating-machine-learning-
algorithms/)
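A minimal sketch of a train-test split in plain Python: the data is shuffled once with a fixed seed, and a fraction is held back so the model can be evaluated on data it was not trained on. The function name simple_train_test_split, the 25% test fraction, and the seed are assumptions for illustration.

```python
import random

def simple_train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

records = list(range(20))  # stand-in for 20 labeled examples
train, test = simple_train_test_split(records)
print(len(train), len(test))
```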
and-statistics/correlation-coefficient-formula/)
R2 (R-Squared): Statistical measure that represents the proportion of the variance for a
Linear Equation: An equation that makes a straight line when it is graphed and often
References
https://machinelearningmastery.com/a-gentle-introduction-to-scikit-learn-a-python-machine-learning-library/
https://machinelearningmastery.com/how-to-transform-target-variables-for-regression-with-scikit-learn/
https://machinelearningmastery.com/train-test-split-for-evaluating-machine-learning-algorithms/
Chen, P. C., Krause, J., Liu, Y., & Peng, L. (2019). How to read articles that use machine
squared.asp
https://www.statisticshowto.com/probability-and-statistics/correlation-coefficient-formula/
https://kharshit.github.io/blog/2018/03/23/scaling-vs-normalization
equation.html.
model on a set of data where the true values are known (data school, 2014).
Prevalence: The proportion of a population who have a specific character during a given
Accuracy: is one metric for evaluating classification models by calculating the correct
predictions using the training dataset directly to solve problems in both classification and
regression.
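The confusion matrix, prevalence, and accuracy entries above can be tied together with a small worked sketch. It assumes binary labels where 1 marks the condition; the label lists are invented, and prevalence is computed here simply as the share of actual positives in the sample.

```python
from collections import Counter

actual    = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]  # made-up true labels
predicted = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]  # made-up model output

# Count each (actual, predicted) pair to form the 2x2 confusion matrix.
counts = Counter(zip(actual, predicted))
tp, tn = counts[(1, 1)], counts[(0, 0)]
fp, fn = counts[(0, 1)], counts[(1, 0)]
total = tp + tn + fp + fn

accuracy = (tp + tn) / total      # share of predictions that were correct
prevalence = (tp + fn) / total    # share of cases that are actually positive
print("TP, TN, FP, FN:", tp, tn, fp, fn)
print("accuracy:", accuracy, "prevalence:", prevalence)
```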
or creating too high of a variance. that enables the algorithm to perform the "best," based
prediction can be made based on prior knowledge and current evidence (Saritas & Yasar,
2019).
Decision Tree: A tree-like graph consisting of nodes that represent a test on an attribute,
branches that signify the outcome of the test, and leaf nodes that hold a label (Rai, Devi &
Guleria, 2016).
Support Vector Machines: A computer algorithm that uses examples to assign labels
Random Forest: A machine learning algorithm that fits multiple decision trees to input
data using a random subset of the input variables for each tree constructed (Mascaro et al.,
2014).
Coding Schemes: A set of codes, defined by the words and phrases researchers assign to
categorize a segment of the data by topic; researchers consider what questions are trying
References
machine-learning-algorithms-python/.
data school. (2014, March 25). A simple guide to confusion matrix terminology. Data
School. https://www.dataschool.io/simple-guide-to-confusion-matrix-terminology/.
Harrison, O. (2019). Machine Learning Basics with the K-Nearest Neighbors Algorithm.
https://towardsdatascience.com/machine-learning-basics-with-the-k-nearest-neighbors-algorithm-6a6e71d01761.
Mascaro, J., Asner, G. P., Knapp, D. E., Kennedy-Bowdoin, T., Martin, R. E., Anderson,
C., ... & Chadwick, K. D. (2014). A tale of two "forests": Random Forest machine
learning aids tropical forest carbon mapping. PloS one, 9(1), e85993.
prevalence.shtml.
1565-1567.
Rai, K., Devi, M. S., & Guleria, A. (2016). Decision tree-based algorithm for intrusion
Saritas, M. M., & Yasar, A. (2019). Performance analysis of ANN and Naive Bayes
https://www.statisticssolutions.com/what-is-logistic-regression/.
Panesar, A. (2019). Machine learning and AI for healthcare. Coventry, UK: Apress.
Urban Institute. (2015). Qualitative Data Analysis. Urban Institute: Data & Methods.
https://www.urban.org/research/data-methods/data-analysis/qualitative-data-analysis
learn, n.d.).
Pipeline: a set of tools and processes for performing data integration by capturing
datasets, increasing interpretability, and minimizing information loss (Jolliffe & Cadima,
2016).
K-means refers to averaging the data to find the centroid (Garbade, 2018).
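To show what averaging the data to find the centroid looks like in practice, here is a minimal one-dimensional sketch of the usual assign-then-average loop (Lloyd's algorithm). The function name kmeans_1d, the fixed starting centroids, and the data are assumptions for illustration; library implementations typically choose the starting centroids automatically.

```python
def kmeans_1d(points, centroids, max_iter=100):
    """Assign each point to its nearest centroid, then move centroids to cluster means."""
    centroids = list(centroids)
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # The new centroid is the mean of its cluster; empty clusters keep theirs.
        updated = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:
            break  # centroids stopped moving, so the assignment is stable
        centroids = updated
    return centroids, clusters

values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]  # made-up data
centers, groups = kmeans_1d(values, centroids=[1.0, 12.0])
print(centers)
print(groups)
```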
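score of 0 indicates that the sample is on or very close to the boundary between two
neighboring clusters, a score near +1 indicates that it is far from the neighboring clusters, and
negative values indicate that it may have been assigned to the wrong cluster (Scikit Learn, 2020).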
several standard sigmoid functions, such as the logistic function, the hyperbolic tangent,
References
AltexSoft. (2019). What is Data Engineering: Explaining the Data Pipeline, Data
https://www.altexsoft.com/blog/datascience/what-is-data-engineering-explaining-data-pipeline-data-warehouse-and-data-engineer-role/.
machine-learning-6a6e67336aa1.
Jolliffe, I. T., & Cadima, J. (2016). Principal component analysis: a review and recent
learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html.
glossary-and-terms/sigmoid-functio
https://towardsdatascience.com/understanding-k-means-clustering-in-machine-learning-6a6e67336aa1
Scikit Learn. (2020). Selecting the number of clusters with silhouette analysis on KMeans
clustering. https://scikit-learn.org/stable/auto_examples/cluster/plot_kmeans_silhouette_analysis.html
Definitions: A set of organized, reusable code called upon to perform a required coding
(Merriam-Webster, 2021).
Formatting: A process in Python where a user inserts a specified value inside the desired
(W3Schools, 2021).
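A short sketch of string formatting in Python, showing a value inserted into placeholders with str.format() and, as an alternative, with an f-string. The template text and the values are hypothetical.

```python
# str.format() replaces each {name} placeholder with the matching argument.
template = "Patient {id}: HbA1c {value:.1f}% on {date}"  # hypothetical fields
message = template.format(id=17, value=6.1, date="2021-02-03")
print(message)

# An f-string performs the same insertion inline.
systolic, diastolic = 128, 82
bp_text = f"BP {systolic}/{diastolic} mmHg"
print(bp_text)
```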
Hyperlinks: a tag in a web page that can link one web page to another page or location in
References
Merriam-Webster. https://www.merriam-webster.com/dictionary/structure.
Pratt, P. J., & Last, M. Z. (2014). Concepts of database management. Cengage Learning.
https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files
https://www.w3schools.com/python/ref_string_format.asp