
A Foundation for Data Analytics in Manufacturing Using ibaPDA and Open-Source Machine Learning Tools
Marcelo Murta Cardoso

Gerdau Special Steel


5225 Planters Road, Fort Smith, Arkansas, USA, 72916
(479) 649-4041
marcelo.cardoso@gerdau.com

Keywords: Digitalization, Industry 4.0, Machine Learning, Manufacturing, ibaPDA, R Language

INTRODUCTION

A great number of digital signals of process variables are already available in most steel plants as part of the PLC or
computer-based automation of their operations. While a data logging or historian system is also already implemented in most
plants, it is a common perception that the benefit or value of the process data may not be fully appreciated. The availability
of open-source software with state-of-the-art capabilities for statistical analysis and machine learning applications adds to
the notion that value creation may not be maximized.
To further advance digitalization, Gerdau Fort Smith has expanded the use of its existing ibaPDA data historian system by
upgrading it to a data server version and integrating it with open-source statistical and machine learning tools. Upgraded
with a historical data server, the ibaPDA system provides the data lake foundation as well as the first level of analytics for
data cleaning, filtering, and process variable aggregation. Process events triggered upon data ingestion automatically execute
statistical analyses or machine learning models, implemented with open-source libraries, for automatic notifications, anomaly
detection and process diagnostics.

Figure 1 – Analytics Platform and Machine Learning Project Steps


DATA HISTORIAN

Dozens of vendors offer systems for storing signals as time series of process variable values. Known as data historians,
process historians or enterprise historians, these software packages were first sold in the mid-1980s for the storage of data
for regulatory, reporting, asset availability and diagnostic purposes (1). Gerdau Fort Smith chose ibaPDA as the plant's
historian given several references at other Gerdau plants. Originally known for fast data acquisition, the ibaPDA system has
been the preferred solution of several OEMs of process lines that require fast data acquisition, such as Hot Strip Mills. The
first ibaPDA workstations were installed at Fort Smith in 2012.

The ibaPDA System at Fort Smith

Troubleshooting
Troubleshooting of equipment faults is the most traditional application and it is the main use of the ibaPDA system at
Gerdau. Video camera storage has added troubleshooting capability with visual inspection of synchronized process signals
and video. Figure 2 shows the main components of the ibaPDA system with the ibaAnalyzer as the main application for
visual analysis and preliminary statistical analysis of equipment or process issues.

Figure 2 - ibaPDA Troubleshooting Configuration

Data Ingestion, Visualization and Data Lake


The ibaPDA software supports direct connection to several PLCs through Ethernet communication or an OPC interface (2).
Shorter sampling times, however, may require dedicated iba I/O hardware, which is not used at Fort Smith. A data lake for
long-term storage of raw data is provided by the recent upgrade at Fort Smith to a dedicated server computer with the ibaHD
and ibaCapture server applications.

ANALYTICS PLATFORM

Once raw data is collected, a typical data science project encompasses the steps shown in the horizontal bar of Figure 1. After
data ingestion, Exploratory Data Analysis can be carried out to detect the need for further processing, Data Cleaning and
filtering. Once a set of valid data pertinent to an analysis question is retrieved from the historical database of the ibaHD
server, a statistical or machine learning Model can be trained for insights or prediction in response to that question. Once
an acceptable model performance is achieved, the model can be Deployed into a production environment (3).

Exploratory Data Analysis, Data Cleaning, Filtering and Aggregation


The ibaAnalyzer is part of the ibaPDA system and is license-free software for the analysis of stored data. Unstructured
measurement signals can be processed, and new signals created, using a set of mathematical, logical, and statistical functions
for data cleaning and filtering. The ibaAnalyzer can generate reports for a time frame related to a signal condition, either
during offline analyses or when executed automatically. Raw data can be aggregated by statistical summaries on a given time
base or by summaries of specific events defined in the ibaPDA.
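
As an illustration of aggregation on a given time base, the following Python sketch groups raw samples into fixed windows
and computes a statistical summary for each window. It is not ibaAnalyzer code; the function name, the 60-second default and
the summary fields are assumptions chosen for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean

# Fixed reference point used to align the windows (an arbitrary choice).
ORIGIN = datetime(1970, 1, 1)


def aggregate(samples, base_seconds=60):
    """Summarize (timestamp, value) samples over fixed time windows.

    Returns one dict per non-empty window, ordered by window start.
    """
    buckets = defaultdict(list)
    for timestamp, value in samples:
        # Index of the window that contains this sample.
        index = int((timestamp - ORIGIN).total_seconds() // base_seconds)
        buckets[index].append(value)

    summaries = []
    for index in sorted(buckets):
        values = buckets[index]
        summaries.append({
            "start": ORIGIN + timedelta(seconds=index * base_seconds),
            "count": len(values),
            "mean": mean(values),
            "min": min(values),
            "max": max(values),
        })
    return summaries
```

Event-based aggregation would follow the same pattern, with the window index replaced by an identifier of the event during
which each sample was recorded.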

Data Export
A paid add-on feature enables the export of raw or aggregated data from ibaAnalyzer reports for use in Exploratory Data
Analysis or model training. Automatic report generation or data export can also be triggered by a real-time event using the
ibaDatCoordinator service. The ibaDatCoordinator is a powerful component that enables the ibaPDA system to be the central
element of the analytics platform, because it can also run “*.bat” batch files (4) containing commands of the command-line
interface. The batch files are executed by the ibaDatCoordinator upon a time-based or event-based trigger. The
ibaDatCoordinator and the execution of a “*.bat” script file are shown in the diagram of Figure 1. A direct Application
Programming Interface (API) for queries of the historical database can also be developed for exporting data from the ibaHD
server.

Open-Source Machine Learning Tools


R and Python are the most popular open-source programming languages for the development of Machine Learning applications.
RStudio offers an interactive, user-friendly development environment that eases the transition from Excel-based analytics.
H2O AutoML is an open-source Automatic Machine Learning environment that provides automatic model building with a direct
interface to both R and Python development environments. H2O AutoML automatically builds a large number of models with
state-of-the-art machine learning algorithms and then determines the best model without requiring prior knowledge or effort
from the data analyst (5). RStudio with H2O AutoML has been the preferred development environment for the initial
applications in Fort Smith.
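
The following is a minimal sketch of an H2O AutoML training run. The Fort Smith applications were developed in RStudio; this
example uses the Python interface of H2O instead, purely for illustration. The helper names, the CSV file and the column
names are hypothetical, and running train_automl requires the h2o package and a Java runtime.

```python
def choose_predictors(columns, target, excluded=()):
    """Return every column except the target and explicitly excluded ones."""
    skip = {target, *excluded}
    return [name for name in columns if name not in skip]


def train_automl(csv_path, target, excluded=(), max_models=10, seed=1):
    """Train several models on an exported data set and return the best one."""
    # Imported lazily so choose_predictors can be used without H2O installed.
    import h2o
    from h2o.automl import H2OAutoML

    h2o.init()  # starts or connects to a local H2O cluster
    frame = h2o.import_file(csv_path)
    predictors = choose_predictors(frame.columns, target, excluded)

    # AutoML trains up to max_models models and ranks them on a leaderboard.
    automl = H2OAutoML(max_models=max_models, seed=seed)
    automl.train(x=predictors, y=target, training_frame=frame)
    print(automl.leaderboard.head())
    return automl.leader
```

A call such as train_automl("export.csv", target="nox_ppm", excluded=("time",)) would use every remaining column of the
hypothetical export file as a predictor.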

Model Execution
Real-time events can be set up in the ibaDatCoordinator service to trigger the execution of jobs that carry out different
tasks. A job task can generate a report, export data, or run a batch file, among other tasks. A previously trained model or
statistical analysis can therefore be executed automatically, as a script or “*.exe” executable file called from the batch
file of a job task triggered by an ibaDatCoordinator event. ibaDatCoordinator jobs can also be scheduled for execution on a
recurring time basis.
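
A script called from a batch file typically reads its input from the command line and reports its outcome through an exit
code. The Python sketch below shows one possible shape for such an entry point. How the ibaDatCoordinator passes file names to
the batch file is not covered here; the argument layout, the column name nox_ppm and the simple limit check standing in for a
trained model are all assumptions.

```python
import argparse
import csv
import sys


def load_column(path, column):
    """Read one numeric column from an exported CSV file with a header row."""
    with open(path, newline="") as handle:
        return [float(row[column]) for row in csv.DictReader(handle)]


def main(argv=None):
    parser = argparse.ArgumentParser(description="Check an exported data file.")
    parser.add_argument("export_file", help="CSV file written by the export task")
    parser.add_argument("--column", default="nox_ppm")  # hypothetical signal name
    parser.add_argument("--limit", type=float, default=100.0)
    args = parser.parse_args(argv)

    values = load_column(args.export_file, args.column)
    # Placeholder for a trained model or statistical analysis.
    exceeded = sum(value > args.limit for value in values)
    print(f"{exceeded} of {len(values)} samples above {args.limit}")

    # A non-zero exit code lets the calling batch file react to the result.
    return 1 if exceeded else 0


# Run only when started as a script with at least one argument.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main())
```

The batch file would then contain a single command line that starts the interpreter with this script and the path of the
exported file.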

Figure 3 - Application Example

APPLICATION EXAMPLE

The baghouses of the melt shop in Fort Smith have a Continuous Emission Monitoring System (CEMS) to monitor NOx generated by
several possible sources, including four ladle preheaters, burners, and oxygen lances of two EAFs. The CEMS equipment alarms
when the trend indicates that the NOx emission limit may be exceeded, but it is often difficult to find the source of the
increased NOx generation. Signals of process variables that could potentially impact NOx generation are available in the iba
system but are difficult to interpret when investigating correlations.

Analysis Question: Which equipment should be checked for increased NOx generation?
Visual inspection of the trends of process variables and CEMS readings does not provide quantitative results. This is a rather
common situation: data is available, but visual inspection alone is not enough to answer the analysis question. In many
cases, we have the data, but the underlying information or the value in the data is not directly apparent.

Statistical Analysis Deployment


Upon alarm conditions, the analytics platform automatically executes the tasks listed below and depicted in Figure 3:
1. Sends out a notification email
2. Generates a report with process variables prior to the event
3. Extracts pertinent ibaPDA data on oxygen usage at various melt shop equipment
4. Executes an R language script for the statistical analysis of the possible sources of NOx emissions based on the extracted
data
5. Distributes a report with the result of the analysis to a SharePoint folder
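
The statistical method used in the R script of step 4 is not detailed here. As one possible approach, the Python sketch below
ranks candidate sources by the Pearson correlation between their oxygen usage and the NOx readings over the same samples. The
choice of Pearson correlation, the function names and the equipment names in the usage note are assumptions made for
illustration, not the analysis deployed at Fort Smith.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation information
    return covariance / sqrt(var_x * var_y)


def rank_sources(oxygen_by_source, nox):
    """Sort sources from most to least positively correlated with NOx."""
    scores = {name: pearson(series, nox)
              for name, series in oxygen_by_source.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Calling rank_sources({"eaf1_lance": [...], "preheater_1": [...]}, nox_readings) returns (name, coefficient) pairs, with the
equipment to check first at the top of the list. Correlation alone does not prove that a source caused the increase, so the
ranking is a starting point for inspection.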

SUMMARY

Pros
1. Simplifies development of Machine Learning projects, as the ibaPDA system covers the first phases of the analytics
process while also providing automatic data extraction for model development as well as model deployment
2. Streamlines development of Proof-of-Concept projects as it minimizes modification of existing process control
applications
3. Low cost, as the ibaPDA is usually already justified by its troubleshooting functionality, while license-free open-source
software provides state-of-the-art machine learning capability
4. Several possible applications for process diagnostics and optimization
5. Narrows the gap between process engineers and automation engineers

Cons
1. Processing time does not allow for applications that require fast or ‘real-time’ response
2. Contingent on IT infrastructure reliability for alarming using emails, smartphone notifications, SharePoint, or webpage
reports
3. Still requires development of interfaces for alarming on existing SCADA or level 1 Human Machine Interfaces, HMIs
4. Data acquisition needs to be stopped for modifications in the I/O configuration

CONCLUSION

The ability to trigger the execution of “*.bat” batch files is a powerful feature of the ibaPDA system that enables a joint
platform with open-source machine learning tools. The analytics platform provides the foundation for all steps in the
development of machine learning projects. It also streamlines and simplifies the prototyping of Proof-of-Concept projects,
given that statistical analyses and machine learning models can be created and deployed into an operational environment as
originally developed by the process engineer or data scientist. Although not recommended for ‘real-time’ control applications,
the presented platform is well suited to optimization and diagnostic analytics of either industrial processes or equipment
reliability, without requiring significant software development. By contrast, deployment of ‘real-time’ statistical analyses
or machine learning models would require the development of APIs and the programming of new or modified run-time
applications.

REFERENCES

1. Michael Risse, “The Data Historian’s History Told”, Control Engineering, https://www.controleng.com/articles/the-data-historians-history-told/
2. ibaPDA website, https://www.iba-ag.com/en/process-connectivity
3. Microsoft Documentation, “The Team Data Science Process lifecycle”, https://docs.microsoft.com/en-us/azure/architecture/data-science-process/lifecycle
4. Batch file entry at Wikipedia, https://en.wikipedia.org/wiki/Batch_file
5. H2O.ai Documentation, “H2O AutoML Tutorial”, https://docs.h2o.ai/h2o-tutorials/latest-stable/h2o-world-2017/automl/index.html
