An Introduction to ML Lifecycle Ontology and its Applications

Milos Drobnjakovic1, Perawit Charoenwut1[0009-0006-5078-5469], Ana Nikolov1, Hakju Oh1, and Boonserm Kulvatunyou1

1 NIST, 100 Bureau Dr., Gaithersburg, MD, 20899, USA
perawit.charoenwut@nist.gov

Abstract. Machine Learning (ML) adoption is on the rapid rise, with a nearly 40%
compound annual growth rate projected over the next decade. As a result, companies
will be flooded with ML models developed with different datasets and software.
The ability to have information at one's fingertips about which datasets were used,
how these ML models were developed, what they were used for, what their
performances and uncertainties are, and what their internal structures look like can
have several benefits. These pieces of ML metadata are what we collectively call
ML lifecycle data. In this paper, we explain our current research into developing
an ML Lifecycle Ontology (MLLO) to capture such data in a knowledge graph.
The motivation is not only to make such data available in a standard queryable
representation across different ML software, but also to be able to connect it with
other domain knowledge. We introduce MLLO at a high level and outline basic and
advanced use case scenarios in which the data, the MLLO, and domain knowledge
may be used to improve the development and usage of ML models and associated
datasets. We then describe ongoing and future work to demonstrate this hypothesis.

Keywords: Machine Learning, Ontology, Knowledge Graph, Machine Learning Lifecycle Management, Biomanufacturing

1 Introduction

Machine Learning (ML) is emerging as a pivotal asset for companies because it
enhances decision-making, enables new insights from a wide array of datasets, and
automates business processes [1]. Currently, machine learning adoption is rapidly
rising, with a nearly 40% compound annual growth rate over the next decade. This rapid
rise entails companies being flooded with ML models developed for diverse objectives,
utilizing various software and numerous datasets [2]. However, this proliferation of ML
models will also pose unique challenges, including the need for robust governance
frameworks, enhancing model interpretability, and addressing potential biases inherent
in the data used for training [3]. In addition to the general challenges, applying machine
learning in particular industries may lead to further impediments. For example, in
highly regulated manufacturing industries (e.g., biopharmaceutical manufacturing), the
challenges of data provenance, operation transparency, and result reproducibility and
verifiability need to be primarily addressed [4]. In this context, machine learning
lifecycle management, encompassing stages from data collection and preprocessing to
model deployment and maintenance, becomes crucial.
Companies must establish comprehensive processes and tools to manage the
lifecycle efficiently, ensuring proper version control, monitoring model performance
over time, and facilitating seamless collaboration among data scientists, engineers, and
domain experts [5]. In other words, while the rise of ML offers immense opportunities,
navigating the complexities of ML lifecycle management becomes imperative for
maximizing its benefits while mitigating associated risks.
Nevertheless, solutions for how ML lifecycle management is conducted and how the
associated metadata is treated remain limited. Companies predominantly rely
on guidelines and utilize tools that cover the metadata of a particular set of frameworks
and algorithms or are developed for a limited number of phases of the ML lifecycle.
Several solutions also provide a certain degree of entire ML lifecycle metadata
management capability but are lacking in granularity, extensibility, and domain-
specific contextualizations [6,7]. Finally, the implemented solutions differ in the
underlying metadata structuring, which may pose problems for cross-software
communication and migration.
In this paper, we present our current research on developing an ontology-based
solution to capture and represent the metadata of the entire ML lifecycle. The
foundation for our solution is our newly developed ontology called the Machine
Learning Lifecycle Ontology (MLLO). The motivation for developing and utilizing an
ontology as the backbone is the possibility of creating a standard queryable representation
that is 1) ML-software agnostic, 2) able to support different levels of granularity, and 3)
relatively easily extensible for a particular application need. Moreover, an ontology can
permit connection to domain-specific contextualization, facilitating collaboration
between domain and ML experts and potentially providing new insights into ML model
development and deployment [8,9].
The rest of the paper is organized as follows: first, ML metadata capture
solutions, as well as ML ontologies and standards, are reviewed. Next, basic and
advanced use case scenarios are outlined in which the data, the MLLO, and domain
knowledge may be used to improve the development and usage of ML models and
associated datasets. Afterwards, an overview of the MLLO is given. Finally,
ongoing and future work for demonstrating the hypothesis is described.

2 Previous Work

The machine learning lifecycle typically involves four stages: identifying requirements,
data processing, model development, and deployment. During each stage, various
artifacts are created, utilized, and revised, such as datasets, feature sets,
hyperparameters, model parameters, and evaluation metrics [1]. As ML projects scale
in size and complexity, managing ML artifacts and streamlining the ML workflow
becomes critical for project success. Because of the complexity involved, MLOps
platforms have been developed. MLOps is a practice that applies DevOps principles to
machine learning development, aiming to eliminate bottlenecks and roadblocks
between the experimental/development phases and the deployment phase [2]. The
resulting MLOps platforms increase developer collaboration, accelerate model
deployment, and strive to enhance monitoring of models, code, and data. Numerous
proprietary and open-source MLOps platforms exist, varying in application scope,
visualization and artifact-logging capability, and model coverage (e.g., Neptune.ai,
Azure MLOps, Google Vertex, TFX, and MLflow) [1].
One of the pillars of each MLOps platform is the capture and storage of metadata.
However, due to the lack of a unified standard for representing and structuring ML
metadata, there is limited compatibility between different MLOps platforms and ML
frameworks [3, 4]. When deploying machine learning models, it is also crucial to
consider the various environments in which the models will be used. In this context,
cross-platform compatibility and meeting specific domain requirements are essential.
For example, in manufacturing, data collection from machines (e.g., temperature,
vibration, and flow rate) significantly impacts the development and deployment of ML-
based control models. Likewise, in the logistics domain, models used for tasks like
route optimization and demand forecasting require complex integration with real-time
tracking systems and supply chain management software. In other words, to satisfy
diverse development and deployment requirements, the ML metadata model should
1) provide cross-platform compatibility, 2) offer flexibility in representation granularity,
3) cover the entire ML lifecycle, and 4) permit connectivity to domain-specific
contexts. Even though a unified standard does not exist, several strides have been made
towards standardizing ML terminology and providing data and metadata models for
particular aspects of ML. The rest of this section will provide an overview of these
advances, elaborate on how ontologies could address the requirements described
above, and review the current ML ontology landscape.

2.1 Machine Learning Standards

Standardizing data models for exchanging machine learning models in a framework-neutral
manner has been a focal point in the machine learning community. The
motivation behind this focus stems from the difficulties and risks associated with
efficient and scalable deployment of models in production environments, where the
models are integrated with existing systems for real-time use [5]. Previously, to address
this issue, one approach was to rewrite models in other programming languages.
However, model rewriting is time-consuming and error-prone, especially if the
personnel responsible for model development and deployment are different [6]. The
alternative to code rewriting is a software-neutral model exchange exemplified in two
open standards: Predictive Model Markup Language (PMML) and Open Neural
Network Exchange (ONNX). While both aim to provide a standardized way of sharing
models across different platforms and tools, they differ significantly in their focus and
complexity. ONNX specializes in representing deep learning models using a detailed,
graph-based approach, which makes it ideal for complex architectures. On the other
hand, PMML primarily targets conventional predictive models like linear regression,
decision trees, and feedforward neural networks and uses a simpler, high-level, XML-
based format [7]. However, PMML's limitations in supporting cutting-edge and
customized models have become apparent since its publication in 1998, as have the
inherent difficulties of manually editing and debugging XML [8]. Regardless, the
objectives of the two standards also imply that they have limited scope for other
aspects, such as logging model artifacts post-deployment and efficiently capturing
metadata during the model development and data preprocessing stages.
ISO has also attempted to standardize terminology surrounding ML. The first
standard, ISO/IEC 22989:2022, aims to describe the key concepts surrounding ML
models. The second standard, ISO/IEC 23053:2022, provides the terminology
necessary to describe ML systems in general (e.g., training modalities, model
parametrization, evaluation). While the ISO standards add value by providing
common terminology, they are not supported by a formal data model or logical
rigor, which may lead to implementation ambiguity. The next section will
describe how ontologies help establish terminological rigor and provide
connectivity across different areas.

2.2 Machine Learning Ontologies

An ontology can be defined as a controlled vocabulary consisting of a consensus-based
common set of terms that enable a standardized description of entities in a domain of
interest and their mutual interconnections. In addition to providing a common
vocabulary, ontologies are enriched with the logical representation of terms, enabling
machine understanding, consistency checking, and inference. This allows ontologies to
serve as a backbone to connect various data sources while providing logical rigor to
ensure the consistency of structure and metadata representation. The value of ontologies
has already been demonstrated in domains such as manufacturing and biology.
Nevertheless, the number of ontologies that deal with ML is limited.
An example of ontology creation for knowledge representation in the ML domain
by using Protégé 5 is shown by Braga et al. [ref]. The authors showed that the ontology
could serve as the explicit knowledge repository in the ML domain, with specifications
that enable the repository extension by involving the reasoning engine to obtain implicit
knowledge. The created knowledge base is called Machine Learning Ontology
(MLOnto) and is understandable for human- and software-based agents. The ontology
covers various application objectives, ML framework models, and training types.
However, it lacks the connections needed to link training instances with specific models
and application objectives. Additionally, it does not include the necessary components
to represent various machine learning datasets, their characteristics and processing,
detailed model architecture, and model evaluation.
Publio et al. developed a top-level ontology named the ML-Schema for representing
and interchanging information on machine learning experiments, algorithms, and
datasets. ML-Schema was developed to circumvent problems that emerge due to using
different ML platforms, such as specific conceptualization or schema for representing
data and metadata. The ML-Schema can be mapped to several other ontologies
and ML vocabularies that cover particular portions of the ML pipeline or of ML
application to data mining (e.g., the OntoDM-core ontology, the DMOP ontology,
the Exposé ontology, and the MEX vocabulary) [ref]. However, the ML-Schema
addresses only ML training, evaluation, and execution. As a stand-alone top-level
ontology, the ML-Schema lacks specific model types, hyperparameters, and other
variables. Finally, it does not have sufficient semantics to track the model over its
entire lifecycle, such as variations in performance during deployment.
An ontology was also developed as a theoretical foundation for visual analytics (VA)
assisted ML. In a study by Sacha et al., the VIS4ML ontology is proposed for
describing concepts and relations in VA and for helping detect gaps in ML
processes [7]. One of the ways in which the VIS4ML ontology could aid and enhance
ML workflows is by indicating where visualization is needed in a specific ML
workflow. Some commonly used terms for building the VIS4ML ontology are entities,
data, formal models, knowledge, and processes transforming entities between various
stages (e.g., data mining, ML, visualization, human knowledge). The
classes and their ontological relations are designed to explicitly define the predecessor-
successor and action-actor relationships within different workflows [7]. Nevertheless,
VIS4ML is oriented only towards a specific visualization use case, thereby lacking the
needed generalizability and full lifecycle coverage.
Finally, Svetashova et al. demonstrated the value of ontologies for applying ML in
manufacturing. Namely, to address the primary challenges of manufacturing ML
applications (communication, data integration, and generalizability), the authors
developed a "fit for purpose" ML ontology and surrounding software to support
ontology utility. This system (called the SemML system) was evaluated through the Bosch
use case of electric resistance welding. The SemML system introduces four semantic
components (ontology extender, domain knowledge annotator, ML annotator, and
ontology interpreter) into the conventional ML pipeline, relying on ontologies, ontology
templates, and reasoning. The system consists of three layers: the Industry Application
Layer, the System Layer, and the Data and Knowledge Layer. The Industry Application
Layer is for welding monitoring, diagnostics, and analysis. The System Layer has ML
modules with semantic models, while the Data and Knowledge Layer contains
ontologies, ontology templates, and data ML models [6]. The system addresses
communication, data integration, and generalizability challenges through these semantic
components. Although the implemented solution seems promising, it does not provide
enough details on the underlying ontology structure and seems tailored toward a
particular industry use case.

3 Use Cases

Use cases are one of the fundamental steps in ontology validation. Namely, use cases
test the ontology for real-world applicability, help identify inconsistencies, and assess
the completeness of the ontology concerning a particular application area.
In this section, we will present two use cases instrumental to demonstrate the
applicability and value of MLLO. The first use case revolves around capturing the key
aspects of ML models and data preprocessing. Moreover, the first use case aims to
validate MLLO's capability to assist with particular tasks during development and
deployment. Later in this section, we will introduce the second use case, which
focuses on the biopharmaceutical industry. We will explain how MLLO could help
with regulatory compliance and improve model development in combination with
domain-specific ontologies.

3.1 Basic use case

The aim of the basic use case is to validate that the MLLO ontology has sufficient
coverage to capture the architecture and input requirements of ML models, as well as
the data processing steps utilized prior to ML model training or execution. We also aim
to assess MLLO's capability to capture various ML training runs and to support
analysis of the impact of training configurations on different models' performance.
Finally, with the basic use case, we aim to demonstrate how MLLO can be utilized to
track model performance on datasets of differing quality (e.g., noise level), which
emulates scenarios where, during deployment, the captured data might vary due to
instrument deterioration, physical phenomena, or changes in the measurement
capabilities of the instruments utilized.
To achieve our objectives, we are using several models developed for the MNIST
(Modified National Institute of Standards and Technology) dataset. The dataset contains
70,000 pre-labeled grayscale images of handwritten digits with 28x28 pixels [9]. We
have decided to use this dataset because it is well-understood and easily accessible.
Additionally, as the MNIST dataset is often used as a benchmark for machine learning,
numerous well-documented MNIST-trained models are publicly available (e.g.,
Kaggle). In addition to the standard MNIST dataset, we have created several datasets
that contain varying degrees of Gaussian and Poisson noise (Fig. 1).
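
The exact noise-injection scripts are not part of the paper; the following is a minimal sketch, assuming NumPy and the Keras MNIST loader, of how the noisy dataset variants described above could be produced. The standard deviations follow the values shown in Fig. 1, while clipping to the 0-255 pixel range is our assumption.

    import numpy as np
    from tensorflow.keras.datasets import mnist

    (x_train, y_train), (x_test, y_test) = mnist.load_data()  # 28x28 uint8 images

    def add_gaussian_noise(images, std, rng=np.random.default_rng(0)):
        # Add zero-mean Gaussian noise with the given standard deviation.
        noisy = images.astype(np.float64) + rng.normal(0.0, std, images.shape)
        return np.clip(noisy, 0, 255).astype(np.uint8)

    def add_poisson_noise(images, rng=np.random.default_rng(0)):
        # Resample each pixel from a Poisson distribution whose mean is the pixel value.
        noisy = rng.poisson(images.astype(np.float64))
        return np.clip(noisy, 0, 255).astype(np.uint8)

    # Noisy variants of the test set, mirroring the datasets shown in Fig. 1.
    noisy_sets = {
        "gaussian_sd16": add_gaussian_noise(x_test, 16),
        "gaussian_sd48": add_gaussian_noise(x_test, 48),
        "gaussian_sd96": add_gaussian_noise(x_test, 96),
        "poisson": add_poisson_noise(x_test),
        "poisson_gaussian_sd48": add_gaussian_noise(add_poisson_noise(x_test), 48),
    }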
We used two neural network models. The first model, a convolutional neural
network, comprises eight different layers, namely Convolutional Layer 1, Average
Pooling Layer 1, Convolutional Layer 2, Average Pooling Layer 2, Convolutional
Layer 3, Flattening Layer, Fully Connected Layer 1, and Fully Connected Layer 2.
Rectified linear units (ReLU) have been employed as the activation function for every
layer except for the final layer, where softmax has been used. The other model, a
multi-layer perceptron, is composed of three dense layers. ReLU has also been used
for the first two layers, while softmax has been utilized for the last layer.
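
For illustration, the two architectures described above could be defined in Keras roughly as follows. The layer ordering and activation functions follow the text; the filter counts, kernel sizes, and hidden-layer widths are not stated in the paper and are therefore illustrative assumptions (a LeNet-style configuration is used for the CNN).

    from tensorflow.keras import layers, models

    def build_cnn():
        # Eight layers: Conv, AvgPool, Conv, AvgPool, Conv, Flatten, Dense, Dense.
        return models.Sequential([
            layers.Conv2D(6, 5, activation="relu", padding="same", input_shape=(28, 28, 1)),
            layers.AveragePooling2D(),
            layers.Conv2D(16, 5, activation="relu"),
            layers.AveragePooling2D(),
            layers.Conv2D(120, 5, activation="relu"),
            layers.Flatten(),
            layers.Dense(84, activation="relu"),
            layers.Dense(10, activation="softmax"),  # softmax only in the final layer
        ], name="ConvNet1")

    def build_mlp():
        # Three dense layers operating on flattened 28x28 = 784 inputs (cf. Table 3).
        return models.Sequential([
            layers.Dense(256, activation="relu", input_shape=(784,)),
            layers.Dense(128, activation="relu"),
            layers.Dense(10, activation="softmax"),
        ], name="MultiLayerPerceptron1")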
All the models have been trained and validated with the original MNIST dataset and
subsequently tested with the noisy datasets. Hyperparameter tuning was also conducted
for each model.
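
A hedged sketch of such a training and tuning loop is given below; it reuses build_cnn and the MNIST arrays from the previous sketches. The baseline configuration (batch size 128, 10 epochs) follows Section 5.2, while the remaining grid values, the optimizer, and the loss are illustrative assumptions.

    from tensorflow.keras.utils import to_categorical

    x_tr = x_train.reshape(-1, 28, 28, 1).astype("float32") / 255.0  # feature scaling + reshaping
    x_te = x_test.reshape(-1, 28, 28, 1).astype("float32") / 255.0
    y_tr, y_te = to_categorical(y_train, 10), to_categorical(y_test, 10)

    results = []
    for batch_size in (32, 64, 128):   # batch size 128 with 10 epochs is the baseline run
        for epochs in (5, 10):
            model = build_cnn()
            model.compile(optimizer="adam", loss="categorical_crossentropy",
                          metrics=["accuracy"])
            model.fit(x_tr, y_tr, batch_size=batch_size, epochs=epochs,
                      validation_split=0.1, verbose=0)
            _, acc = model.evaluate(x_te, y_te, verbose=0)
            results.append({"batch_size": batch_size, "epochs": epochs, "accuracy": acc})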
The implementation of all models has been done in Python. We have chosen
Python for initial validation because it is a dominant language in machine
learning. It has gained popularity because of its simplicity, extensive library ecosystem,
and strong community support. Popular libraries like TensorFlow, Keras,
PyTorch, and scikit-learn make data manipulation, analysis, and model deployment
much easier.
Fig. 1. Example MNIST data with varying degrees of noise: original data; original data with Gaussian noise (S.D. of 16, 48, and 96); original data with Poisson noise; and original data with Poisson noise combined with Gaussian noise (S.D. of 48).

3.2 Advanced use case

The biopharmaceutical industry is looking to increase the utilization of ML-based
(entirely data-driven or hybrid) models due to its ongoing shift towards Industry 4.0
and continuous manufacturing. However, the adoption of ML in the biopharmaceutical
sector comes with certain impediments due to the industry’s highly regulated nature,
the complexity of bioprocesses, and, in some cases, disparate datasets.
MLLO could potentially provide a standardized framework for delineating ML
lifecycle stages and their interrelationships, which can assist with regulatory
compliance. Additionally, MLLO could represent the characteristics and uncertainties
of data collected during manufacturing and establish connections with model
performance. Manufacturing data could then be further contextualized with a domain-
specific ontology, which can encode domain expertise and provide insight into the
origin of the data, expected data trends, correlations, interdependencies, and
uncertainties. We expect that these additional connections and insights from both the
domain and ML perspective can accelerate model development, ease the discovery of
where particular ML pipeline components can be reused, and enable new ways to
embed biomanufacturing domain knowledge into ML models or feature selection and
engineering.
To test the claims stated above, we will utilize an ML-based control model built for
the yeast fermentation process. The model currently embeds domain knowledge in the
form of mechanistic equations and through non-domain-informed Bayesian priors. We
will encode domain expert knowledge not captured with mechanistic equations through
ontological axioms and cross-instance relations. The ontology-based domain
knowledge will be combined with ontology-encoded data characteristics (e.g., data
collinearity) and uncertainties to 1) drive feature selection and 2) construct Bayesian
priors. The entire pipeline associated with training, selecting and testing the optimal
model will also be captured by MLLO. A successful outcome of this case study is to
demonstrate that 1) ontologically encoded knowledge that combines both the “domain
and ML expert” perspective can increase the accuracy of a model utilized in biopharma
and 2) that model development and selection of the optimal model can be accelerated
by using MLLO.

4 Overview of the Machine Learning Lifecycle Ontology (MLLO)

The MLLO development process is guided by the hub-and-spokes principles, with its
foundation provided by BFO and IOF-Core ontologies. Both top-down and bottom-up
methodologies are utilized to construct the ontology. Existing ML standards (e.g.,
PMML, ONNX and ISO/IEC 22989:2022) and ontologies (e.g., STATO, ML-Schema)
are leveraged, with constructs adapted and reused where applicable. Throughout the
development, competency questions (Table 1) derived from real-world ML challenges
and use case scenarios serve as pivotal guides, ensuring that the ontology addresses
crucial industry needs.

Table 1. Example competency questions and focus areas

Focus Area: Data traceability
  CQ1: What is the lineage (e.g., where it was generated) and value type (e.g., is it measured or simulated) of a piece of data?
  CQ2: For which model and for what purpose (e.g., training, retraining, production execution) was a particular piece of data used?
  CQ3: How was a particular dataset processed prior to being used for model training, execution, or evaluation?

Focus Area: Model performance evaluation and optimization
  CQ4: How do different hyperparameter configurations impact model performance?
  CQ5: What is the evaluation metric used to assess model performance?

Focus Area: Model robustness
  CQ6: What is the robustness of the model to different noise types?
  CQ7: How does dataset variability impact the model during various phases (e.g., training and execution)?

Focus Area: Model applicability
  CQ8: For which objectives was a model applied?
  CQ9: What are the model input requirements (e.g., expected data shape and data type, heteroskedasticity of data)?
  CQ10: What are the data characteristics of the raw and preprocessed data (e.g., after feature engineering) the model was applied on?

The resulting MLLO ontology contains x classes and y object properties and is
composed of three integral areas (Fig. 2).

Fig. 2. Representation of the connections between three integral areas of the MLLO

The first area of the ontology is focused on ML algorithm contextualization. It enables
the representation of algorithms central to machine learning, including ML models,
training algorithms, and data processing algorithms. It also facilitates the representation
of their attributes, such as model explainability and linearity and algorithm utilization
objectives. The area also permits capturing versioning, implementation-independent
model structure as well as the components that are implementation specific.
The second area of the ontology focuses on contextualizing data produced or utilized in
various ML lifecycle stages. It enables the depiction of domain-collected data as
features, encompassing various types such as categorical, time series, and numerical
attributes. It also enables understanding the composition of various ML datasets, such
as raw, training, validation, and testing datasets. Additionally, it permits the
representation of dataset and feature characteristics, such as skewness and
multicollinearity.
The final area depicts the execution and deployment of algorithms, detailing their
operational precedence, such as ordering steps within a particular ML pipeline.
Likewise, it facilitates linking specific algorithm configurations, including parameters
and hyperparameters, to a given execution instance.
Thus, utilizing the three areas of MLLO provides the user a basis for capturing ML
metadata throughout the various stages of the ML lifecycle, including different model
experimentation, model maturity, data preprocessing, and data characteristics
requirements for a given utilization objective. Simultaneously, by being based on the
BFO/IOF ontologies, data produced by or utilized in the ML pipeline can be
contextualized within a domain of interest. This enables additional insight into the data
from the “domain of interest lens” as well as the representation of input data origination
and the subsequent model output data utilization.

5 Validation of the Basic Use Case

In this section, we delve into the validation process of our basic use case, aiming to
achieve the objectives specified in Section 3.1 by using MLLO. First, we will describe
the methodology utilized to extract and ingest the metadata into MLLO, as well as the
competency-question-driven use case validation. Next, the results of the SPARQL
queries that provide answers to the competency questions will be presented and analyzed.

5.1 Methodology
The metadata associated with trained models has been extracted and saved as JSON
that conforms to the JSON schema derived from MLLO. The metadata extraction was
done using our in-house Python script, which extracts the metadata through a
combination of the Python frameworks' built-in save methods and user-given
prompts. For example, in the case of TensorFlow/Keras, the get_config() method
was used to retrieve the model’s layer metadata, which consists of layer name, layer type,
layer configurations, activation function, and initializer. Additionally, optimization
configurations are obtained using the get_compile_config() method. Metadata
elements that cannot be extracted from the default model save methods are hardcoded.
These include model_name, create_for_project, hyperparameters, and
evaluation_score.
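
The in-house extraction script is not published; the following simplified sketch shows how such metadata could be assembled from the Keras methods named above. The field names mirror those mentioned in the text, but the exact JSON schema derived from MLLO is an assumption.

    import json

    def extract_metadata(model, model_name, project, hyperparameters, evaluation_score):
        config = model.get_config()                  # layer names, types, and configurations
        compile_config = model.get_compile_config()  # optimizer, loss, and metric settings
        metadata = {
            "model_name": model_name,                # hardcoded, not derivable from the saved model
            "create_for_project": project,
            "layers": [
                {"layer_type": layer["class_name"], "layer_config": layer["config"]}
                for layer in config.get("layers", [])
            ],
            "optimization_configuration": compile_config,
            "hyperparameters": hyperparameters,      # e.g., {"batch_size": 128, "epochs": 10}
            "evaluation_score": evaluation_score,    # e.g., test-set classification accuracy
        }
        return json.dumps(metadata, indent=2, default=str)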
The JSON documents were converted into JSON-LD and mapped to MLLO by using a
SPARQL CONSTRUCT query. Elements pertaining to data preprocessing were entered
manually into the knowledge graph. The knowledge graph was validated for consistency
and coherency using HermiT 1.4.3. Additionally, the knowledge graph was assessed
manually to ensure that all the metadata was properly transferred and that all
interrelations are represented. Fig. 3 shows the visualization of the CNN model
architecture in MLLO. In the figure, it can be seen that the model has the correct type
and that proper layer ordering (determined by indices) is preserved. Also, the
connections between activation functions and particular layers are present. It should
be noted that, while not explicitly shown in the figure, the ontology managed to capture
the correct layer dimensions and all the configuration variables (e.g., padding) pertaining
to a particular layer.
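
For orientation, a comparable ingestion and mapping step could be sketched with rdflib as follows. The MLLO namespace and the class and property IRIs shown here are placeholders rather than the actual ontology terms, and the vocabulary of the JSON-LD export (the ex: prefix) is likewise assumed.

    from rdflib import Graph

    g = Graph()
    g.parse("convnet1_metadata.jsonld", format="json-ld")  # JSON-LD export of the extracted metadata

    mapping_query = """
    PREFIX ex:   <http://example.org/ml-metadata#>
    PREFIX mllo: <http://example.org/mllo#>
    CONSTRUCT {
      ?model a mllo:MachineLearningModel ;
             mllo:hasLayer ?layer .
      ?layer a mllo:ModelLayer ;
             mllo:hasLayerType ?layerType .
    }
    WHERE {
      ?model ex:hasLayer ?layer .
      ?layer ex:layerType ?layerType .
    }
    """
    mllo_graph = g.query(mapping_query).graph   # CONSTRUCT results as a new graph
    mllo_graph.serialize("convnet1_mllo.ttl", format="turtle")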
Finally, the resulting knowledge graph was used to answer the competency questions
(CQ4, CQ6, CQ3, and CQ9 from Table 1). The competency questions were chosen to
reflect objectives identified at the beginning of section 3.1.
5.2 Validation Results

We have assessed the effectiveness of MLLO by asking specific competency questions,
which demonstrate how MLLO can be practically applied to organize and analyze
metadata related to machine learning models. In this section, we will present the
answers obtained from the knowledge graph. These results showcase MLLO's
capabilities in tracking data preprocessing and input requirements (CQ3 and CQ9).
Additionally, MLLO can compare model performance across changes in the training
environments and testing datasets (CQ4 and CQ6), allowing users to identify the impact
of these factors on overall performance.
CQ3 was answered via a query that retrieves data processing pipelines that 1) have
the selected dataset as inputs and 2) produced artifacts that were used in model
execution. Next, the query retrieves the types of individual operations that belong to a
particular data processing pipeline and orders them with respect to their operational
precedence. The results of the query are given in Table 2. The results show that
mnist_nonoise was used for model evaluation. Also, in the case of both models,
prior to execution, the same data processing operations were used, although in different
orders. Thus, the MLLO can capture all the different data processing steps and the order
in which they were executed in each case. Such information can help pinpoint
discrepancies in data processing procedures leading to model executions. This can lead
to further investigation to determine if the difference in data processing has led to any
model performance shift. Finally, by having the complete ordering of different data
processing pipelines at their fingertips, users can find potential overlaps, combine their
elements for further performance optimization, and identify a pipeline they might reuse
in a different scenario.

Table 2. Result of the SPARQL query based on CQ3

Row 1
  model: ex:ConvNet1
  dataproc_pipeline: dp:DataPrepPipeline1
  dataset: ex:mnist_nonoise
  data_utilization_purpose: “model evaluation”
  pipeline_steps: “data reshaping operation ‘precedes’ feature scaling operation ‘precedes’ type casting operation”

Row 2
  model: ex-mlp:MultiLayerPerceptron1
  dataproc_pipeline: dp:DataPrepPipeline2
  dataset: ex:mnist_nonoise
  data_utilization_purpose: “model evaluation”
  pipeline_steps: “feature scaling operation ‘precedes’ type casting operation ‘precedes’ data reshaping operation”
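
To make the query pattern concrete, a SELECT query of roughly the following shape could be issued against the knowledge graph from the previous sketch to produce rows like those above. All class and property IRIs are placeholders rather than the published MLLO terms, and the index property used to order operations is an assumption.

    cq3_query = """
    PREFIX ex:   <http://example.org/ml-metadata#>
    PREFIX mllo: <http://example.org/mllo#>
    SELECT ?model ?pipeline ?operationType ?purpose ?idx
    WHERE {
      ?pipeline mllo:hasInput  ex:mnist_nonoise ;
                mllo:hasPart   ?operation ;
                mllo:hasOutput ?processedData .
      ?operation a ?operationType ;
                 mllo:hasIndex ?idx .
      ?execution mllo:executes   ?model ;
                 mllo:hasInput   ?processedData ;
                 mllo:hasPurpose ?purpose .
    }
    ORDER BY ?model ?idx
    """
    for row in mllo_graph.query(cq3_query):
        print(row.model, row.operationType, row.purpose)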

CQ4 was addressed through the utilization of a query that finds hyperparameters which
differ in value across different training instances and the trained model's performance
on the test dataset (original MNIST test dataset). The query results were then used to
construct a scatter plot (Fig. 4). The scatter plot suggests that the CNN model actually
performs slightly better with a smaller epoch number and batch size. In the case of the
MLP model, the best performance can be achieved by reducing the batch size, while
keeping the number of epochs the same. It also demonstrates that MLLO is capable of
establishing the connection between a performance metric (in this case, classification
accuracy) and the variation in hyperparameters relative to a baseline run (e.g., batch
size 128 and epoch number 10). This capability of MLLO can make performance
comparisons during hyperparameter tuning more straightforward and streamlined.
Fig. 3. Visualization of the nodes representing the architecture of the Convolutional Neural
Network described in the basic use case section. Indices in the IRI of the parameter layer
correspond to index instances connected to the nodes, which were omitted to preserve image
clarity.

Fig. 4. Classification accuracy of the CNN model and MLP model depending on the training
hyperparameters. The cross indicates the baseline run.

CQ6 was tackled by creating a query that retrieves all the datasets used as inputs to
the execution of a particular trained model. Within the query, the classification accuracy
on the test dataset was explicitly marked as a baseline. The classification accuracy of
the model execution on other datasets (in this case, classified as production datasets)
was compared to the baseline to estimate model robustness - change in model accuracy
with respect to different noise types and noise degrees. The results of the query were
then plotted as bar plots (Fig. 5). The results show that the CNN model performance is
robust to Poisson noise and a combination of Poisson and medium Gaussian noise. The
only significant performance change was observed when applying the model to highly
noisy Gaussian datasets. In the case of the MLP model, performance drops significantly
for all except the lowest amount of Gaussian noise, indicating that the model is not
robust to any noise fluctuation in data. The findings indicate that MLLO is capable of
comparing the performance of various models on a specific dataset and linking it to the
dataset’s features, such as the type and degree of noise. This could offer valuable
insights to machine learning experts regarding which model to use in a specific setting
and help pinpoint potential reasons if model performance changes during production.

Fig. 5. Robustness of the CNN Model and MLP Model to varying degrees of noise and
different noise types. The green bar represents the dataset with no noise.

CQ9 was answered via a query that retrieves the input requirements associated with a
model and any data elements associated with them. The results are displayed in Table
3. The results show that for each model the expected data type and input dimension is
specified. While both models expect images to be encoded as “Float32”, the MLP also
requires the 28x28 images to be flattened. This kind of information can help to
determine what kind of preprocessing a data source might need before being utilized
with a particular model. It is worth mentioning that the requirements specified here are
relatively simple. However, the MLLO can also capture model assumptions as
requirements, which can potentially be used to infer if the characteristics of datasets
satisfy the assumptions of a particular model. The availability of such information
could accelerate model and feature selection decisions for a particular task or ease
identifying the required data preprocessing (e.g., feature engineering) steps. The full
extent of these capabilities will be explored in the future.
Table 3. Result of the SPARQL query based on CQ9

Row 1
  model: ex:ConvNet1
  input_requirement: ex:InputRequirements0
  associated_data: “specifies data type: Float32; specifies dimension shape: [28, 28, 1]”

Row 2
  model: ex-mlp:MultiLayerPerceptron1
  input_requirement: ex-mlp:InputRequirements0
  associated_data: “specifies data type: Float32; specifies dimension shape: [784]”

6 Conclusions and Future Work

As machine learning (ML) continues to assert its significance in various industries,
effective ML lifecycle management emerges as a vital necessity for companies aiming
to harness its benefits while mitigating associated risks. One of the foundations of any
lifecycle management framework is effective management and representation of
metadata. However, current solutions typically lack full lifecycle coverage and are
oriented towards a limited number of models or frameworks. Moreover, metadata
management solutions might not be mutually interoperable and have limitations in
granularity, extensibility, and domain-specific contextualization.
This paper introduces the MLLO as a means for standardized, extensible, and
framework-neutral metadata representation across the ML lifecycle. With the basic use
case, we demonstrated its practical utility in organizing and analyzing metadata for
machine learning models. Through our assessment, we have observed MLLO's
effectiveness in tracking data preprocessing steps and capturing model architectures,
comparing model performance across different environments and datasets, and
identifying input requirements. These capabilities enhance the model development
process, enabling users to pinpoint discrepancies in data processing, understand the
underlying model structure, and make informed decisions during hyperparameter
tuning. MLLO also enables the establishment of vital links between dataset
characteristics and model performance, which can provide insights into model
selection, performance drift during production, and the need for retraining. While these
findings highlight the potential of MLLO in enhancing machine learning workflows,
further exploration of its capabilities is essential for future research.
Namely, as outlined in the advanced use case, the second key objective of MLLO is
providing fundamental links between domain context and knowledge and the
underpinnings of ML. To test our hypothesis, we are focusing on an ML-based control
model for yeast fermentation processes. By incorporating domain knowledge through
ontological axioms and cross-instance relations, along with ontology-encoded data
characteristics and uncertainties, we aim to demonstrate the efficacy of ontologically
encoded knowledge in improving model accuracy and expediting model development.
MLLO aims to promote interoperability among various platforms, which is crucial
for fostering collaboration across diverse ML ecosystems. While the ontology itself is
already programming-language agnostic, our code for JSON-based metadata extraction
currently only supports Python. MATLAB is a popular numerical computing tool that
is commonly used in academia and industry, while R is used in statistical analysis and
data visualization, serving as a go-to platform for statisticians and data scientists. By
adding support for MATLAB and R, the metadata extraction can be expanded to include a wide range of ML
models developed across various programming languages used by different domain
experts. This expansion can be achieved by developing language-specific adapters or
wrappers that extract metadata in a standardized format compatible with the existing
JSON-based solution. The benefits would be enhanced reproducibility and streamlined
knowledge transfer across communities.
To fully leverage the capabilities of the MLLO, it will also be necessary to have a
comprehensive platform as a tool for evaluating ML models. Therefore, we are
developing a practical MLLO-based application termed the MLLO Editor. The MLLO
Editor's design is centered around seamlessly integrating features aimed at facilitating
the input, organization, and analysis of information pertaining to ML models and
dataset characteristics. It will allow users to capture relevant information according to
the MLLO ontology. At its core, the editor offers a user-friendly interface that
simplifies the process of capturing details about machine learning models, including
their architectures, hyperparameters, and training configurations. It facilitates
annotation and incorporation of dataset characteristics, enabling users to establish
explicit relationships between their models and the data on which they were trained.
Furthermore, the editor offers robust tools and tailored visualizations for conducting
comparative analyses, enabling users to compare various model information based on
performance metrics, hyperparameters, or other relevant factors. Finally, the MLLO
Editor includes features such as versioning and history tracking, enabling users to
maintain a comprehensive record of changes made to their models and associated
information over time.

7 References

1. Schlegel, M. and K.-U. Sattler, Management of machine learning lifecycle artifacts: A
survey. ACM SIGMOD Record, 2023. 51(4): p. 18-35.
2. Subramanya, R., S. Sierla, and V. Vyatkin, From DevOps to MLOps: Overview and
Application to Electricity Market Forecasting. Applied Sciences, 2022. 12(19): p.
9851.
3. Urias, I. and R. Rossi, Evaluation of Frameworks for MLOps and Microservices. EAI
Endorsed Transactions on Smart Cities, 2023. 7(3).
4. Lima, A., L. Monteiro, and A.P. Furtado, MLOps: Practices, Maturity Models, Roles,
Tools, and Challenges-A Systematic Literature Review. ICEIS (1), 2022: p. 308-320.
5. Kervizic, J. Overview of the different approaches to putting Machine Learning (ML)
models in production. 2019 [cited 2024 March 8]; Available from:
https://medium.com/analytics-and-data/overview-of-the-different-approaches-to-
putting-machinelearning-ml-models-in-production-c699b34abf86.
6. Tierney, B. ONNX for exchanging Machine Learning Models. [cited 2024 March
8]; Available from: https://oralytics.com/2020/10/05/onnx-for-exchanging-machine-
learning-models/.
7. Singh, S., N. Singh, and V. Singh. Comparative Analysis of Open Standards for
Machine Learning Model Deployments. in ICT Systems and Sustainability:
Proceedings of ICT4SD 2021, Volume 1. 2022. Springer.
8. AirbnbEng. Architecting a Machine Learning System for Risk. 2014; Available from:
https://medium.com/airbnb-engineering/architecting-a-machine-learning-system-for-
risk-941abbba5a60.
9. Lecun, Y., et al., Gradient-based learning applied to document recognition.
Proceedings of the IEEE, 1998. 86(11): p. 2278-2324.
