Paper Icic Reza 2019

Star Schema Implementation For Monitoring in
Data Quality Management Tool (A Case Study at A

Government Agency)
Mohammad Reza Effendy Tien Fabrianti Kusumasari Muhammad Azani Hasibuan
Information System Department Information System Department Information System Department
School of Industrial and System School of Industrial and System School of Industrial and System
Engineering, Telkom University Engineering, Telkom University Engineering, Telkom University
Bandung, Indonesia Bandung, Indonesia Bandung, Indonesia
mohrezaeffendy@gmail.com tienkusumasari@telkomuniversity.ac.id muhammadazani@telkomuniversity.ac.
id
Abstract— In the past few decades, the quality of data has become something important for managing data in an organization.
The primary objective of an organization to build data quality management includes improving the accuracy of the data, the
completeness of the data, the age of the data, and the reliability of the data. In developing quality management data, several strong
analysis steps are needed so that the tools can run well. Data taken from various applications in an organization is required to
maintain its quality. In general, data quality management has needs between tables. Star Schema as a model for representing
relationships between tables in the development of data monitoring, a star schema is needed to store the results of data that has
been tested from quality tools data. In this paper, we propose a star schema that is used as a basis for developing monitoring data
quality management tools that we have developed.
Keywords- monitoring, data quality, data quality management, dimensional modeling, star schema
I. INTRODUCTION
Organizations realize that the development of a quality information system has significant business benefits. Using data to
determine goals and decision making makes data as an essential component of information systems [ CITATION
Mac11 \l 1033 ]. Data governance as a part of corporate governance and information technology is defined as data
management techniques that produce data as organizational assets [ CITATION The09 \l 1033 ]. Data governance is a
set of processes, policies, standards, organizations, and technologies that ensure availability, accessibility, quality,
consistency, audit capability, and data security in an organization [ CITATION Pan10 \l 1033 ].
When an organization is seriously handling information system quality issues, they must manage the quality of data in their
organization [ CITATION Mac11 \l 1033 ]. Data quality plays a vital role in data governance, data quality as a way to build the
reputation of an organization, as a decision-making process, even operational process and transaction process. [ CITATION
Her07 \l 1033 ]. The primary purpose of the organization to handle data quality problems is to improve the accuracy of data,
data completeness, the age of data and data quality reliability. Managing data quality problems are one of the functions in data
governance, namely Data Quality Management (DQM)[ CITATION The09 \l 1033 ]. The DQM process occurs in several
stages starting from the scene of data production, data storage, data transfer, data sharing, and data usage [ CITATION Rou18 \l
1033 ]. DQM consists of several phases, namely data profiling, data cleansing, data quality assessment, and data quality
monitoring [ CITATION DAp15 \l 1033 ]. Data quality monitoring is also described as monitoring and ensuring data (static
and streaming) can adjust to business rules and can be used to determine the quality of data in an organization [ CITATION
Jud16 \l 1033 ].
In this study, a government agency in Indonesia. Some common mistakes in the data entry process, for example, are the
existence of empty data, the absence of standards in data formats, and the emergence of data redundancies that are still a
problem for the government agency. These errors also still cannot be monitored automatically by the application and if left
unchecked, these errors will become an ongoing problem because the data that is owned will continue to increase even more.
By looking at these conditions, a data quality monitoring application is needed to monitor data quality automatically. To
develop a data quality monitoring application, several steps are needed, including developing a star schema that can be used to
store data quality monitoring results. This study was made to design a star schema design for a data quality monitoring
application in the study case of the government agency.
Thus, this paper will implement a star schema to support data quality management tools to monitor the data quality result.
The systematics of writing this paper has five parts. Part 2 provides background on Data Quality Management, Database
Schema, and Dimensional Modeling. Part 3 introduces the method used for making monitoring on data quality management
tools. Part 4 monitoring process. Section 5 evaluation and discussion, and section 6 conclusions.
XXX-X-XXXX-XXXX-X/XX/$XX.00 ©20XX IEEE

II. THEORY AND RELATED WORK
A. Data Quality Management (DQM)
The basis for research on data quality and DQM was put in the 1980s, when concepts and approaches to managing physical
goods, for the first time, were transferred to intangible management, such as data and information [ CITATION Jud16 \l 1033 ].
Data quality management is a set of methods and approaches to define, measure, analyze, improve, and control data quality to
maximizing the quality of data from an organization [ CITATION Cai15 \l 1033 ]. The results of DQM are providing data
quality assessment and monitoring the quality of data from an organization.
There are five dimensions of data quality, namely completeness, consistency, validity, accuracy, and integrity [ CITATION
Cai15 \l 1033 ]. First, completeness that ensures the availability of all necessary information. Second, a consistency that
provides uniformity of information in a table. Third, validity that guarantees the correctness of information. Fourth, accuracy
that ensures data can be trusted and by objects in the real world. Finally, integrity provides that data is interrelated, so reducing
the level of data duplication.
The Data Management Association says there are four processes in managing data quality are plan, deploy, monitor, and act
[ CITATION The09 \l 1033 ]. The plan is defined as a process for assessing the scope of data quality issues. At the deploy, is
the stage for analyzing data profiles. Stages of Monitoring are used to monitor data quality and measure data quality by
business rules. Act stage is the last stage, namely decision making to overcome and resolve data quality problems. Wang in his
research[ CITATION Wan97 \l 1033 ] said the implementation of the Total Data Quality Management (TDQM) framework
would achieve high data quality conditions with stages namely defining, measuring, analyzing, and improving.
The development and use of data quality operations are carried out by stages namely, developing data quality operating
procedures, correcting data quality errors, measuring and monitoring data quality, making data quality report based on
testing[ CITATION The09 \l 1033 ]. Some of the tools needed to develop data quality operations include profiling machines,
query tools, data quality rule templates, and quality check and audit [ CITATION The09 \l 1033 ].
One of the processes that play an essential role in managing data quality is profiling data. Data profiling is a set of activities
used to determine metadata in a dataset [ CITATION Abe16 \l 1033 ]. Based on [ CITATION Rou18 \l 1033 ] data profiling is
divided into three groups, namely single column profiling, multiple columns profiling, and dependencies. According to
DataFlux Corporation in the technique and processing of data profiling involves three categories of analytical methods, namely:
column discovery, structure discovery, and relationship discovery[ CITATION DAp15 \l 1033 ].
Data Quality Assessment is a stage defined in DQM which serves to verify the source, quantity, and impact of each item
that is not by the data quality rules that have been determined [ CITATION Dat16 \l 1033 ]. Data quality standards consist of
five dimensions, namely availability, usability, reliability, relevance, and presentation quality [ CITATION Exp14 \l 1033 ].
In managing data quality, there is a Data Quality Monitoring process. Data quality monitoring is a framework used to
control the quality of data in information systems continuously, for example by using metrics, reports, or by using profiling
data regularly[ CITATION DAp15 \l 1033 ]. Profiling results need to be monitored so that data remains in line with the
organization's business rules, besides that monitoring is also used for handling data, monitoring performance, and reporting
data quality can be centralized[ CITATION Jen95 \l 1033 ]. L. Tuura [ CITATION LTu10 \l 1033 ] mention that data quality
monitoring can be used as a detector, operating efficiency, and reliable data for analysis.
B. Database Schema
Based on research conducted by Blinn et al. [ CITATION Bli96 \l 1033 ] database schema is a consistent structure for
managing and managing information about a database. In 1975 ANSI has classified the database schema into three, namely
the conceptual schema, logical schema, and physical schema [ CITATION Kin14 \l 1033 ].
Some types of database schema namely network model, relational model, object-relational model, object role
model[ CITATION Ram10 \l 1033 ]. A network model is a database schema that is used to compile and represent objects and
their relationship to represent schemas is a graph, object types are nodes, and relations as arcs[ CITATION CJD98 \l 1033 ].
Edgar [ CITATION Cod69 \l 1033 ] represent relational models as approaches to managing data which represent all data into
tuples and group them into relations. The object-relational model is defined as the same as a relational model, but the
difference is that object, class, and inheritance are supported on database schemas in query languages[ CITATION Fra95 \l
1033 ]. Object role modeling provides a way to model, transform, and process queries from business information obtained to
be facts that make it easier for non-technical users to understand about this modeling [ CITATION Fra95 \l 1033 ]. In addition
to the types above, Dedic in his research [ CITATION Ded16 \l 1033 ] mentioning star schema is also included as a database
schema for data warehouse management.
C. Dimensional Modeling
Dimensional modeling is a part of the cycle of business dimensions that includes a series of methods, techniques, and
concepts in the use of data warehouses [ CITATION Con15 \l 1033 ]. Dimensional Modeling always uses two main ideas
[ CITATION RKi97 \l 1033 ] that are fact (measure) and dimension (context). Facts are the results of quantitative
measurements produced by business processes, while dimensions are groups that become descriptors of determining
facts[ CITATION Ded16 \l 1033 ].
One of dimensional modeling is star schema which is also considered the most effective as a dimensional model for
companies in applying the data warehouse concept [ CITATION Inm02 \l 1033 ]. A star schema consists of fact tables and
dimension tables [ CITATION Inm02 \l 1033 ]. Fact tables usually consist of two columns, namely a foreign key that has a
relation in the dimension table and column that measures things that contain numerical facts [23]. While the dimension table
is a table linked to the fact table, the dimension table stores and shows the data details as fact validation using detailed queries
[23]. Kimbal et al. [ CITATION RKi98 \l 1033 ] has provided detailed literature on dimensional modeling concepts where
one of the advantages of using dimensional modeling is the existence of a reliable decision support system process.
III. PROPOSED METHOD
The method used to design monitoring on data quality management tools can be seen in Fig.3. The research method is
divided into three stages, namely determining the design chart monitoring data quality, data schema design, and data schema
implementation.
The first step is to determine the design graph of monitoring. Determine the monitoring chart design to get an initial
picture of data monitoring. This stage produces a data monitoring system mockup design. In addition to charting this stage
also determines existing business rules to be used as guidelines for making designs. The second phase is to create a data
schema design that will be implemented in data quality monitoring. In the second stage, the selection of the data schema used
is Star Schema. The stage of making a star schema design begins with gathering business needs that must be met. These
business needs will then be translated into dimensional tables that are useful as primary data storage. These dimension tables
will form a fact table that is used as the basis for transaction data. The last stage is implementing star schema to the system
which is made a visualization of monitoring on data quality management tools.
The case study used in this study is analyzing data quality monitoring for a single column. The dimension is to calculate
the number of empty columns in the data that have been tested with data profiling.
Fig. 1. The flow of Method Implementation
IV. MONITORING PROCESS

The process of developing a data quality monitoring architecture according to the method described in Fig. 1. The steps
are designing graphics monitoring, designing star schema, and star schema implementation.
A. Designing Graphic Monitoring
At this stage, designing graphics monitoring. The monitoring chart has four web pages namely, show null home page,
show the null sum of Pass Record by DQ_RULE_ID, show the null sum of Pass Record by DQ_TIME_ID, show the null sum
of Pass Record by DQ_FIELD_ID - DQ_TIME_ID.
Figure on Fig. 2 is the design for show null monitoring page is opened. As we see in Fig. 2, it only shows the total of
shows null result record, and it uses just COUNT function to count the total record.
Fig. 2. Mockup for Homepage
At Fig. 3 when the user wants to count the number of PASS_RECORD based on DQ_RULE_ID. The user must drag
DQ_RULE_ID to the left side of the page as shown in Fig. 3 and the user must change to SUM choice in the dropdown at the
left, and change to PASS_RECORD at the dropdown list at the bottom of SUM dropdown list.
Fig. 3. Mockup Show Null Sum of Pass Record by DQ_RULE_ID
Besides counting the total of PASS_RECORD by DQ_RULE_ID, users can also count the total PASS_RECORD by
DQ_TIME_ID as seen in Fig. 4. The user must drag DQ_TIME_ID to the left side of the page as shown in Fig. 4, and the user
must change to SUM choice in the dropdown at the left, and change to PASS_RECORD at the dropdown list at the bottom of
SUM dropdown list.
Fig. 4. Mockup Show Null Sum of Pass Record by DQ_TIME_ID
At Fig. 5 represents the results of calculating the number of PASS_RECORD based on DQ_FIELD_ID and
DQ_TIME_ID. The user must drag DQ_FIELD_ID and DQ_TIME_ID to the left side of the page as shown in Fig. 4, and the
user must change to SUM choice in the dropdown at the left, and change to PASS_RECORD at the dropdown list at the
bottom of SUM dropdown list.

Fig. 5. Mockup Show Null Sum of Pass Record by DQ_FIELD_ID and DQ_TIME_ID
B. Designing Star Schema

Kimball et al. [ CITATION RKi97 \l 1033 ] said the simple application of making a star schema through several stages,
and there are business needs analysis, translation of business needs into dimension tables and real tables, application of
relations between dimension tables with real tables. Di bagian ini kami mengusulkan dimensional model berupa sebuah
skema bintang yang telah kami buat untuk digunakan pada pengembangan Data Quality Management Tools.
Based on the problems and expected solutions we propose a star scheme design as shown in Fig. 3. The proposed star
scheme stores information based on the business needs of the organization and technical requirements in developing data
quality management tools. The star scheme shown in Figure 3 has a real table and a dimensional table consisting of:
 DQ RESULTS FACT TABLE : is the fact table, the core of a star scheme that references all dimensional tables.
 DIM_DQ_TIME : is a dimension table intended to save the time of data quality testing.
 DIM_DQ_DIMENSIONS : is a table that stores information about the quality parameter of data used as
measurements.
 DIM_DQ_RULES : is a table that contains rules that are used to describe what data quality testing aims like, and also
as the location of test data located.
 DIM_DQ_FIELDS : is a table that contains information about the column being tested.
 DIM_DQ_ENTITIES : is a table that stores the values of the column under test.
Fig. 6. Proposed Star Schema
In Table I are column names from the DQ_RESULT fact table which has seven columns, there are DQ_RESULT_ID,
DQ_TIME_ID, DQ_DIMENSION_ID, DQ_RULE_ID, DQ_FIELD_ID, PASS_RECORD, FAIL_RECORD. In the
DQ_TIME_ID column it has dimensional references in the DIM_DQ_TIME table, DQ_DIMENSION_ID column has a
reference to the DIM_DQ_DIMENSIONS dimensional table, DQ_RULE_ID towards the dimensional table reference in the
DIM_DQ_RULES table, and column DQ_FIELD_ID towards dimensional references in the DIM_DQ_FIELDS table, while
the pass_record column contains the number of columns that passed the test, otherwise the FAIL_RECORD column shows
the number of columns that did not pass the test.
TABLE I. DQ_RESULTS FACT TABLE COLUMNS

Reference
Column
Column Name Dimensional
Description
Tables
DQ_RESULT_ID DQ RESULT ID -
Refers
DQ_TIME_ID DQ_TIME_ID DIM_DQ_TIME
dimensional table
Refers
DQ_DIMENSION_ DIM_DQ_DIMEN
DQ_DIMENSIONS
ID SIONS
dimensional table
Refers DQ_RULES
DQ_RULE_ID DIM_DQ_RULES
dimensional table
Refers
DQ_FIELD_ID DQ_FIELDS DIM_DQ_FIELDS
dimensional table
The number of
PASS_RECORD records that passed -
the test
The number of
FAIL_RECORD records that did not -
pass the test
Table II is a column representation of the DIM_DQ_TIME table, and there are seven columns namely DQ_TIME_ID as
this table's primary key, LOAD_TIMESTAMP, CALENDAR_DATE, DAY_NAME, MONTH_NAME, and YEAR_NUM.
TABLE II. DIM_DQ_TIME TABLE COLUMNS
Column Name Column Description

DQ_TIME_ID DQ TIME ID
LOAD_TIMESTAM Timestamp condition when data testing is
P taking place
CALENDAR_DATE The date when the data test took place
DAY_NAME The day during the data test took place
MONTH_NAME The month during the data test took place
YEAR_NUM The year during the data test took place
In Table III is a representation of the columns owned by the DIM_DQ_DIMENSIONS table. These columns are
DQ_DIMENSION_ID as table primary key, DQ_DIMENSION_NAME, and SHORT_NAME.
TABLE III. DIM_DQ_DIMENSIONS TABLE COLUMNS

DQ_DIMENSION_ID DQ DIMENSION ID
DQ_DIMENSION_NAM
The name of the data quality parameter
E
The short name of the data quality
SHORT_NAME
parameter
In Table IV is a set of column information in the table DIM_DQ_RULES. The columns consist of DQ_RULE_ID as the
primary key from the table, DQ_RULE_NAME, DQ_RULE_LOCATION, DQ_RULE_OWNER, and
DQ_RULE_DESCRIPTION.
TABLE IV. DIM_DQ_RULES TABLE COLUMNS

DQ_RULE_ID DQ RULE ID
DQ_RULE_NAME Name of data quality testing rules
DQ_RULE_LOCATION Location of data in data quality testing
DQ_RULE_OWNER Data owner in data quality testing
DQ_RULE_DESCRIPTION Description of data quality testing
Table V is a DIM_DQ_FIELDS table that has columns namely DQ_FIELD_ID as the primary key, DQ_ENTITY_ID
which has dimensional references with DQ_ENTITIES, and DQ FIELD_NAME.
TABLE V. DIM_DQ_FIELDS TABLE COLUMNS

Reference
Dimensional Tables
DQ_FIELD_ID DQ FIELD ID -
Refers
DQ_ENTITY_ID DQ_ENTITIES_ID DQ_ENTITIES
dimensional table
The name of the field
DQ_FIELD_NAME -
tested
Table VI represents the columns owned by the DIM_DQ_ENTITIES table consisting of DQ_ENTITY_ID as the primary
key in this table, STATUS, LOAD_TIMESTAMP, MODIFIED_DATE, and DQ_ENTITY_VALUE.
TABLE VI. DIM_DQ_ENTITIES TABLE COLUMNS

DQ_ENTITY_ID DQ ENTITY ID
STATUS Column status in data quality testing
LOAD_TIMESTAMP Time when data is stored in a table
MODIFIED_DATE Date of last modification
DQ_ENTITY_VALUE The value in DQ_ENTITY

C. Star Schema Implementation
The implementation of the star schema into the monitoring system starts with creating the database described as shown in
Fig. 7. This database will save all of the transaction processes in this system automatically.
Fig. 7. Database Implementation
After the database creation phase is complete, then create the Show Null Monitoring page as illustrated in Fig. 8.
Fig. 8. Show Null Monitoring Homepage
At Fig. 9 represents the Show Null Sum of Pass Record by DQ_RULE_ID page.
Fig. 9. Show Null Sum of Pass Record by DQ_RULE_ID
The process of calculating the number of PASS_RECORD based on time can be seen in the system as shown in Fig. 10.
Fig. 10. Show Null Sum of Pass Record by DQ_TIME_ID
At Fig. 11 processes for calculating PASS_RECORD data are based on two selection conditions DQ_FIELD_ID and
DQ_TIME_ID.
Fig. 11. Show Null Sum of Pass Record by DQ_FIELD_ID-DQ_TIME_ID

V. DISCUSSION
The design of a star schema is made starting with the stage of ensuring business needs can be translated into dimension
tables and fact tables. Next, implement the star schema into the MySQL database. The last stage in this study is visualization
using PivotTable. PivotTable is chosen because the visualization process is not massive and makes it easy for users to search
for data quality information needs based on the metrics they need.
When compared with other data quality management tools that discuss the monitoring process, the monitoring process in
this study is web-based so that it can be used in many places. In addition to the monitoring process that is real-time by
conducting a profiling test first, all data from the quality test data are stored on web monitoring. One of the objectives of using
monitoring is to ensure that the data is in accordance with business rules.
When compared with Talend Open Studio, a comparison will be seen as in Table VII. The results of comparison
monitoring between this study and Talend Open Studio found some similarities and advantages with the system developed.
Both have the same results in maintaining data quality in business rules with the assessment of Full Capabilities. Differences
occur when data quality visualization in this research produces more types of visualization compared to Talend Open Studio.
Both also have the same value for the criteria for user-friendliness. For the reporting phase, both can produce in the form of
tables and graphics formats. The platform used in this research is PHP and Java while the platform used by Talend Open
Studio is Perl and Java.
TABLE VII. THE COMPARISON OF THIS RESEARCH AND TALEND OPEN STUDIO
Kriteria This Research Talend Open Studio

Conform to business
rules that define data
Full Capabilities Full Capabilities
quality for the
organization
Design the
Visualization of Data Full Capabilities Limited Capabilities
Quality
User Friendliness More More
Table/Graphic Table/Graphic
Reporting
Format Format
Platform PHP/Java Perl/Java
VI. CONCLUSION AND ONGOING RESEARCH

In this paper, we propose a star schema that will use as a reference to develop monitoring on open source data quality
management tools and can be used in various organizations.
Star schema plays a vital role in developing data quality management monitoring. Using a star schema can translate
business needs that have previously been defined. Visualization of monitoring also plays a role so that the quality of the
monitored data looks good.
The research we are currently working on is the process of completing the development of data quality management at the
stage of data cleansing, data integration, and data quality monitoring in another data quality dimension parameters like data
cardinalities, data pattern, value distribution, value similarity, data completeness, data deduplication, and clustering.
REFERENCES
[1] K. Leonard, "The Role of Data in Business," Chron, 5 October 2018. [Online]. Available:
http://smallbusiness.chron.com/role-data-business-20405.html. [Accessed 13 February 2019].
[2] The Data Management Asociation, The DAMA Guide to The Data Management Body of Knowledge First Edition,
Bradley Beach: Technics Publications, LLC, 2009.
[3] Z. Panian, "Some Practical Experiences in Data Governance," World Academy of Science, Engineering and
Technology 62 2010, vol. 62, pp. 939-940, 2010.
[4] Herzog, Data Quality and Record Linkage Techniques, Springer, 2007.
[5] M. Rouse, "What is data quality?," Techtarget, May 2018. [Online]. Available:
http://searchdatamanagement.techtarget.com/definition/data-quality. [Accessed 20 February 2019].
[6] D. Apel, Datenqualität erfolgreich steuern: Praxislösungen für Business-Intelligence-Projekte, Heidelberg:
dpunkt.verlag, 2015.
[7] S. Judah, M. Y. Selvage and A. Jain, "Magic Quadrant for Data Quality Tools," GartnerResearch, 2016.
[8] L. Cai, "The Challenges of Data Quality and Data Quality Assessment in the Big Data Era," Data Science Journal,
pp. 1294-1297, 2015.
[9] R. Y. Wang, M. S. D., P. L. and L. L. Y., Manage Your Information as Product: The Keystone to Quality
Information, MIT Sloan School of Management, 1997.
[10] Z. Abedjan, L. Golab and F. Nauman, "Data Profiling," in 2016 IEEE 32nd International Conference on Data
Engineering (ICDE), 2016.
[11] Dataflux Corporation, "Data Profilling : The diagnosis for better enterprise information," pp. 2-10, 2016.
[12] Experian, "The 2014 Digital Marketer," Experian, California, 2014.
[13] J. G. Jennifer, W. O. Walter, E. K. Timothy, P. M. Jeffrey and P. A. William, "Data Quality Assurance, Monitoring,
and Reporting," Controlled Clinical Trials, vol. 16, pp. 105-135, 1995.
[14] T. L., M. A., S. I and D. R. G, "CMS data quality monitoring: systems and," Journal of Physics: Conference Series
219 (2010) 072020, vol. IX, pp. 1-9, 2010.
[15] A. Blinn, M. A. Cohen, M. Lorton and G. J. Stein, "Database Schema Independence". United States Patent
US5974418A, 1996.
[16] D. Kinshansing and M. Sutan Saghar, "Study Of The ANSI/SPARC Architecture," International Journal of Modern
Trends in Engineering and, pp. 22-30, 2014.
[17] E. Ramez and B. N. Shamkant, Fundamentals of Database Systems:Sixth Edition, Boston: Addison-Wesley, 2010.
[18] J. D. C., Relational Database: Selected Writings First Edition Edition, Boston: Addison-Wesley, 1998.
[19] E. F. Codd, "Derivability, Redundancy, and Consistency of Relations Stored in Large Data Banks," IBM, 1969.
[20] S. Frank, "A Gentle Introduction to Relational and Object Oriented Databases," ORL Technical Report, Cambridge,
1995.
[21] N. Dedić and C. Stanier, "An Evaluation of the Challenges of Multilingualism in Data Warehouse Development," in
18th International Conference on Enterprise Information Systems - ICEIS 2016, Rome, 2016.
[22] T. Connolly and C. Begg, Database Systems - A Practical Approach to Design, Implementation and Management,
Pearson , 2015.
[23] D. L. Moody and M. A. Kortink, "From Enterprise Models to Dimensional Models: A Methodology for Data
Warehouse and Data Mart Design," in International Workshop on Design and Management of Data Warehouses
(DMDW'2000), Stockholm, 2000.

Paper Icic Reza 2019

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Paper Icic Reza 2019

Uploaded by

Copyright:

Available Formats

Star Schema Implementation For Monitoring in

Data Quality Management Tool (A Case Study at A

XXX-X-XXXX-XXXX-X/XX/$XX.00 ©20XX IEEE

Fig. 1. The flow of Method Implementation

IV. MONITORING PROCESS

Fig. 2. Mockup for Homepage

Fig. 4. Mockup Show Null Sum of Pass Record by DQ_TIME_ID

bottom of SUM dropdown list.

B. Designing Star Schema

TABLE I. DQ_RESULTS FACT TABLE COLUMNS

TABLE II. DIM_DQ_TIME TABLE COLUMNS

Column Name Column Description

DAY_NAME The day during the data test took place

MONTH_NAME The month during the data test took place

YEAR_NUM The year during the data test took place

TABLE III. DIM_DQ_DIMENSIONS TABLE COLUMNS

TABLE IV. DIM_DQ_RULES TABLE COLUMNS

Column Name Column Description

DQ_RULE_NAME Name of data quality testing rules

DQ_RULE_LOCATION Location of data in data quality testing

DQ_RULE_OWNER Data owner in data quality testing

DQ_RULE_DESCRIPTION Description of data quality testing

TABLE V. DIM_DQ_FIELDS TABLE COLUMNS

TABLE VI. DIM_DQ_ENTITIES TABLE COLUMNS

Column Name Column Description

STATUS Column status in data quality testing

LOAD_TIMESTAMP Time when data is stored in a table

MODIFIED_DATE Date of last modification

DQ_ENTITY_VALUE The value in DQ_ENTITY

Fig. 7. Database Implementation

Fig. 8. Show Null Monitoring Homepage

Fig. 9. Show Null Sum of Pass Record by DQ_RULE_ID

Fig. 10. Show Null Sum of Pass Record by DQ_TIME_ID

Fig. 11. Show Null Sum of Pass Record by DQ_FIELD_ID-DQ_TIME_ID

Kriteria This Research Talend Open Studio

VI. CONCLUSION AND ONGOING RESEARCH

You might also like