
Data Management

Data management is an administrative process that includes acquiring, validating, storing, protecting, and processing required data to ensure the accessibility, reliability, and timeliness of the data for its users. Organizations and enterprises are making use of Big Data more than ever before to inform business decisions and gain deep insights into customer behavior, trends, and opportunities for creating extraordinary customer experiences.

To make sense of the vast quantities of data that enterprises are gathering, analyzing, and storing
today, companies turn to data management solutions and platforms. Data management solutions
make processing, validation, and other essential functions simpler and less time-intensive.

Leading data management platforms allow enterprises to leverage Big Data from all data sources, in real time, to allow for more effective engagement with customers and for increased customer lifetime value (CLV). Data management software is essential, as we are creating and
consuming data at unprecedented rates. Top data management platforms give enterprises and
organizations a 360-degree view of their customers and the complete visibility needed to gain
deep, critical insights into consumer behavior that give brands a competitive edge.

Benefits of Data Management and Data Management Platforms

Managing your data is the first step toward handling the large volume of data, both structured
and unstructured, that floods businesses daily. It is only through data management best practices
that organizations are able to harness the power of their data and gain the insights they need to
make the data useful.

In fact, data management via leading data management platforms enables organizations and
enterprises to use data analytics in beneficial ways, such as:

 Personalizing the customer experience
 Adding value to customer interactions
 Identifying the root causes of marketing failures and business issues in real time
 Reaping the revenues associated with data-driven marketing
 Improving customer engagement
 Increasing customer loyalty
Need for Data Management

In the digital age, data is king. This is why it is seen as one of the most important assets of an
organisation; it is the foundation of information and the basis on which people make decisions.
Hence it follows that if the data is accurate, complete, organized and consistent, it will contribute to the growth of the organisation. If it is not, it becomes a very big liability: poor organisation of data can lead to harmful decisions based on incomplete information. In addition, the amount of data connected to an organisation today is on an unprecedented scale and impossible to process manually, which is why it is important to invest in an effective data management system. The following are some of its additional, and undeniable, benefits.

1. Increases productivity: If data can be accessed easily, especially in large organisations, your company will be more organized and productive. It reduces the time that people spend looking for information and instead ensures that they can do their jobs efficiently. Employees will also be able to understand and communicate information to others. Furthermore, it makes it easy to access past correspondence and prevents miscommunication due to messages lost in transit.
2. Smooth operations: A seamless operating system is every business’ dream, and data management can make that a reality. It is one of the determining factors in ensuring the success of an organisation; if one takes too long to respond to customers or to the changing trends around them, they run the risk of falling behind, a risk that one cannot afford. A good data management system will ensure that you respond to the world accordingly and stay ahead of the competition.
3. Reduce security risk: It is the first time in history that so much personal information is
available to those that can access it. When you store people’s credit card information,
personal address, phone numbers, photos, etc. it is of paramount importance that this data
is protected by the best possible security. If your data is not managed properly, it can fall
into the wrong hands. Data theft will also have severe implications for the growth of your company; nobody wants to leave their details in the hands of people who do not know how to protect them.
4. Cost effective: If you have a good system in place, you will spend less money trying to
fix problems that shouldn’t have occurred in the first place. It also prevents spending time and money duplicating information that already exists.
5. Minimal chance of data loss: A good data management system will reduce the chances
of losing important company information. It also ensures that your data is backed up and
in case of a sudden glitch or system failure, any data that is lost can be retrieved easily,
limiting the repercussions of the same.
6. Better decision making: When everything is in its place, and everyone knows where to look for it, the quality of your decisions improves drastically. By nature, people have
different ways of processing information, but a centralised system ensures a framework
to plan, organise and delegate. Additionally, a good system will ensure good feedback,
which in turn will lead to necessary updates to the process that will only benefit your
company in the long run.
Data Management Challenges

While some companies are good at collecting data, they are not managing it well enough to make
sense of it. Simply collecting data is not enough; enterprises and organizations need to
understand from the start that data management and data analytics will only be successful when
they first put some thought into how they will gain value from their raw data. They can then
move beyond raw data collection with efficient systems for processing, storing, and validating
data, as well as effective analysis strategies.

Another challenge of data management occurs when companies categorize data and organize it
without first considering the answers they hope to glean from the data. Each step of data
collection and management must lead toward acquiring the right data and analyzing it in order to
get the actionable intelligence necessary for making truly data-driven business decisions.

Data Management Best Practices

The best way to manage data, and eventually get the insights needed to make data-driven
decisions, is to begin with a business question and acquire the data that is needed to answer that
question. Companies must collect vast amounts of information from various sources and then
utilize best practices while going through the process of storing and managing the data, cleaning
and mining the data, and then analyzing and visualizing the data in order to inform their business
decisions.

It’s important to keep in mind that data management best practices result in better analytics. By
correctly managing and preparing the data for analytics, companies optimize their Big Data. A
few data management best practices organizations and enterprises should strive to achieve
include:

 Simplify access to traditional and emerging data
 Scrub data to infuse quality into existing business processes
 Shape data using flexible manipulation techniques

It is with the help of data management platforms that organizations have the ability to gather,
sort, and house their information and then repackage it in visualized ways that are useful to
marketers. Top-performing data management platforms are capable of managing all of the data
from all data sources in a central location, giving marketers and executives the most accurate
business and customer information available.
DATABASE MANAGEMENT SYSTEM

Introduction

For management information systems to work efficiently and properly, data is an essential and critical part of transaction processing. In the beginning of the era of computer applications, data used to be maintained separately for each particular application.

In such a scenario, every user system had its own master files and transaction files, but an important point to keep in mind here is that these transaction files and master files were processed separately.

One of the major observations was the presence of data redundancy. This redundancy generally arises because the data needed by the various systems is common; as a result, repetition creeps into the data stored in the various user systems. This data redundancy further leads to complexities in data management.

Data redundancy also leads to a lack of integrity and to inconsistency of the data available in the various user files.
The name "Database Management System" comes from the database approach that emerged out of the need, indeed urgency, to eliminate these data management problems. The database is actually pivotal to the Database Management System.

The database also uses a very specific structuring of the data, and the word database itself can be defined as "a mechanical or automated, formally defined, centrally controlled collection of data in an organization."

Data Independence

A database system normally contains a lot of data in addition to users’ data. For example, it
stores data about data, known as metadata, to locate and retrieve data easily. It is rather difficult
to modify or update a set of metadata once it is stored in the database. But as a DBMS expands,
it needs to change over time to satisfy the requirements of the users. If the entire data is
dependent, it would become a tedious and highly complex job.
If a database system is not multi-layered, then it becomes difficult to make any changes in the database system. Database systems are designed in multiple layers, as we learnt earlier.
Metadata itself follows a layered architecture, so that when we change data at one layer, it does not affect the data at another level. The data at each layer is independent of, but mapped to, the data at the other layers.

Logical Data Independence

Logical data is data about the database; that is, it stores information about how data is managed inside. For example, a table (relation) stored in the database and all the constraints applied on that relation.
Logical data independence is a mechanism that insulates users and applications from the actual data stored on the disk. If we make some changes to the table format, it should not change the data residing on the disk.

Physical Data Independence

All the schemas are logical, and the actual data is stored in bit format on the disk. Physical data
independence is the power to change the physical data without impacting the schema or logical
data.
For example, in case we want to change or upgrade the storage system itself − suppose we want to replace hard disks with SSDs − it should not have any impact on the logical data or schemas.

DATA REDUNDANCY
In computer main memory, auxiliary storage and computer buses, data redundancy is the
existence of data that is additional to the actual data and permits correction of errors in stored or
transmitted data. The additional data can simply be a complete copy of the actual data, or only
select pieces of data that allow detection of errors and reconstruction of lost or damaged data up
to a certain level.
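For example, a checksum is a small piece of redundant data that allows errors to be detected. A minimal sketch in Python (the record contents are hypothetical; CRC32 is just one common choice):

import zlib

def add_checksum(payload: bytes) -> bytes:
    # Append a 4-byte CRC32 checksum: redundant data that enables error detection.
    crc = zlib.crc32(payload).to_bytes(4, "big")
    return payload + crc

def verify(record: bytes) -> bool:
    # Recompute the checksum over the payload and compare with the stored copy.
    payload, stored = record[:-4], record[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == stored

record = add_checksum(b"customer: John Doe, 12 Main St")
assert verify(record)            # intact data passes
corrupted = b"X" + record[1:]
assert not verify(corrupted)     # a flipped byte is detected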
Data redundancy occurs when the same piece of data is stored in two or more separate places.
Suppose you create a database to store sales records, and in the records for each sale, you enter
the customer address. Yet, you have multiple sales to the same customer so the same address is
entered multiple times. The address that is repeatedly entered is redundant data.

How does data redundancy occur?

Data redundancy can be deliberate; for example, suppose you want to back up your company’s data nightly. This creates a redundancy. Data redundancy can also occur by mistake. For
example, the database designer who created a system with a new record for each sale may not
have realized that his design caused the same address to be entered repeatedly. You may also end
up with redundant data when you store the same information in multiple systems. For instance,
suppose you store the same basic employee information in Human Resources records and in
records maintained for your local site office.

Why data redundancy can be a problem

When data redundancy is unplanned, it can be a problem. For example, if you have a Customers
table which includes the address as one of the data fields, and the John Doe family conducts
business with you and all live at the same address, you will have multiple entries of the same
address in your database. If the John Doe family moves, you'll need to update the address for
each family member, which can be time-consuming, and introduce the possibility of entering a
mistake or a typo for one of the addresses. In addition, each entry of the address that is
unnecessary takes up additional space that becomes costly over time. Lastly, the more
redundancy, the greater difficulty in maintaining the data. These problems — inconsistent data,
wasted space, and effort to maintain data — can become a major headache for companies with
lots of data.

How data redundancy is resolved for a database

It’s not possible or practical to have zero data redundancy, and many database administrators
consider it acceptable to have a certain amount of data redundancy if there is a central master
field. Master data is a single source of common business data used across multiple systems or
applications. It is usually non-transactional data, such as a list of customers and their contact
information. The master data ensures that if a piece of data changes, you update the data only
one time, which allows you to prevent data inconsistencies.

In addition, the process of normalization is commonly used to remove redundancies. When you
normalize the data, you organize the columns (attributes) and tables (relations) of a database to
ensure that their dependencies are correctly enforced by database integrity constraints. The set of
rules for normalizing data is called a normal form, and a database is considered "normalized" if it
meets the third normal form, meaning that it is free of insert, delete, and update anomalies.
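As a rough sketch of what this looks like in practice (the tables and values here are hypothetical), the repeated customer address from the earlier sales example can be stored once and referenced by key:

# Denormalized: the customer address is repeated in every sale (redundant data).
sales_flat = [
    {"sale_id": 1, "customer": "John Doe", "address": "12 Main St", "amount": 250},
    {"sale_id": 2, "customer": "John Doe", "address": "12 Main St", "amount": 75},
]

# Normalized: each fact is stored once; sales reference the customer by key.
customers = {101: {"name": "John Doe", "address": "12 Main St"}}
sales = [
    {"sale_id": 1, "customer_id": 101, "amount": 250},
    {"sale_id": 2, "customer_id": 101, "amount": 75},
]

# An address change is now a single update, so the data cannot become inconsistent.
customers[101]["address"] = "34 Oak Ave"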
DATA ADMINISTRATION

Data administration is the process by which data is monitored, maintained and managed by a
data administrator and/or an organization. Data administration allows an organization to control
its data assets, as well as their processing and interactions with different applications and
business processes. Data administration ensures that the entire life cycle of data use and
processing is on par with the enterprise’s objective.

Data administration may also be called data resource management.

Data Administrator (DA):


• "Person in the organization who controls the data of the database refers data administrator."
• DA determines what data to be stored in database based on requirement of the organization.
• DA works on such as requirements gathering, analysis, and design phases.
• DA does not to be a technical person, any kind of knowledge about database technology can be
more beneficiary
• DA is some senior level person in the organization. in short, DA is a business focused person
but should understand about the database technology.

Database Administrator (DBA):


• "Person in the organization who controls the design and the use of the database refers database
administrator."
• DBA provides necessary technical support for implementing a database.
• DBA works on such as design, development, testing, and operational phases.
• DBA is a technical person having knowledge of database technology.
• DBA does not need to be a business person. in short, DBA is a technically focused person but
should understand about the business to administrator the database effectively.

Data consistency
Data consistency refers to whether the same data kept at different places do or do not match.
Data consistency is the process of keeping information uniform as it moves across a network and
between various applications on a computer. There are typically three types of data consistency:
point in time consistency, transaction consistency, and application consistency. Ensuring that a
computer network has all three elements of data consistency covered is the best way to ensure
that data is not lost or corrupted as it travels throughout the system. In the absence of data
consistency, there are no guarantees that any piece of information on the system is uniform
across the breadth of the computer network.

Data consistency helps ensure that information on a crashing computer can be restored to its pre-crash state.
Point in time consistency deals with ensuring that all elements of a system are uniform at a
specific moment in time. This prevents loss of data during system crashes, improper shutdowns,
and other problems on the network. It functions by referencing pieces of data on the system via
timestamps and other markers of consistency, allowing the system to be restored to a specific
moment in time with each piece of data in its original place. Without point in time consistency,
there would be no guarantee that all information on a crashing computer could be restored to its
pre-crash state.
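A minimal sketch of the idea in Python, assuming a toy key-value store that keeps a timestamped history of writes so its state can be rebuilt as of any moment:

import time

class VersionedStore:
    """Toy key-value store keeping a timestamped history of every write."""
    def __init__(self):
        self.history = []                      # list of (timestamp, key, value)

    def put(self, key, value):
        self.history.append((time.time(), key, value))

    def as_of(self, t):
        # Rebuild the state as it was at time t: point-in-time consistency.
        state = {}
        for ts, key, value in self.history:
            if ts <= t:
                state[key] = value
        return state

store = VersionedStore()
store.put("balance", 50_000)
checkpoint = time.time()
store.put("balance", 75_000)                  # later, possibly erroneous, update
print(store.as_of(checkpoint))                # {'balance': 50000} - pre-change state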

Transaction consistency is consistency of a piece of data across a working transaction within the computer. For example, a banking program might originally request an end user's starting account balance. From that point on, the entire program relies on the original balance figure remaining consistent in the program's memory. If the original balance is $50,000 (USD) and a problem on the system alters it to $75,000, the computer is without transaction consistency. Without transaction consistency, nothing entered into a program remains reliable.
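Databases protect transaction consistency with atomic transactions: either every step of a unit of work is applied, or none is. A minimal sketch using Python's built-in sqlite3 module (the accounts table and the simulated failure are hypothetical):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('checking', 50000), ('savings', 0)")
conn.commit()

try:
    with conn:  # opens a transaction: commit on success, rollback on any error
        conn.execute("UPDATE accounts SET balance = balance - 20000 "
                     "WHERE name = 'checking'")
        raise RuntimeError("simulated crash before the credit is applied")
except RuntimeError:
    pass

# The half-finished transfer was rolled back, so the data stays consistent.
print(conn.execute("SELECT balance FROM accounts WHERE name = 'checking'").fetchone())
# -> (50000,)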

Application consistency is nothing more than transaction consistency between programs. For
example, if the banking program communicates with a tax program on the computer, application
consistency means that the information moving between the programs will remain in its original
state. Without application consistency, the same problems arise here as do under flawed
transaction consistency: there will be no way to tell whether a value entered into the system
remains correct over time.

The primary advantage to ensuring data consistency is maintaining the integrity of the
information stored on the computer or across the network. Without all three types of consistency
working together, one cannot tell whether the data stored on the computer today will be the same
following a crash, installation, or other major system event. That is why maintaining consistency
is one of the primary goals for all data-based computer programs.

Database management system

• A database management system is a collection of inter-related data and a set of programs to manipulate those data.

• A database management system stores data in such a way that it becomes easier to retrieve, manipulate, and produce information.

• Data manipulation consists of various operations such as storing, modifying, removing, and retrieving data (see the sketch below).

• So, DBMS = Database + Set of programs

• It also provides security and safety against system crashes.
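A minimal sketch of these manipulation operations, using Python's built-in sqlite3 module with a hypothetical students table:

import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, marks INTEGER)")

# Store data
db.execute("INSERT INTO students (name, marks) VALUES (?, ?)", ("Asha", 82))
db.execute("INSERT INTO students (name, marks) VALUES (?, ?)", ("Ravi", 74))

# Modify data
db.execute("UPDATE students SET marks = 90 WHERE name = 'Asha'")

# Remove data
db.execute("DELETE FROM students WHERE name = 'Ravi'")

# Retrieve data
for row in db.execute("SELECT id, name, marks FROM students"):
    print(row)  # (1, 'Asha', 90)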


What is database environment?

• A database system provides basic functionalities like storage, manipulation and use of data.

• It has four major components, which form the database system environment:
1. Data
2. Hardware
3. Software
4. Users

1. Data
• It is the most important component of a database system.
• Data means known facts which can be recorded and have implicit meaning.

2. Hardware
• All physical modules of a computer are referred to as hardware.
• Such as memory, mouse, hard disk, printer, etc.

3. Software
• Software provides the interconnection between users and the database stored on physical devices.
• The operating system manages all the hardware of the computer.
• The DBMS software and the operating system together form the software component here.

4. Users
• Any person who interacts with a database in any form is considered a database user.
• Categories of database users:
1. Database administrators
2. Database designers
3. Application programmers
4. End users

Various Definitions related to database

What is Data?
• Data means known facts that can be recorded and that have implicit meaning. Data is also a collection of facts and figures.

What is Information?
• Information means processed or organized data, which can be derived from data and facts.

What is a Data-Item (field)?
• It is a character or group of characters that has a specific meaning.
• For example, cid and cname from a customer table.

What is a record?
• It is a collection of logically related fields.
• We can also say that a record consists of values for each field.

What is a file?
• It is a collection of related records, which are arranged in a specific sequence.

What is metadata?
• A set of data that describes and gives information about other data.
• In other words, data about data is called metadata.
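As a quick illustration, a relational database keeps a catalog that describes its own tables, which is metadata. A minimal sketch using Python's sqlite3 and its built-in sqlite_master catalog (the customer table is hypothetical):

import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customer (cid INTEGER PRIMARY KEY, cname TEXT)")

# sqlite_master holds data about the data: table names and their definitions.
for name, sql in db.execute("SELECT name, sql FROM sqlite_master WHERE type='table'"):
    print(name, "->", sql)

# PRAGMA table_info exposes per-column metadata (name, declared type, ...).
for column in db.execute("PRAGMA table_info(customer)"):
    print(column)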

Data Warehouse
The term "Data Warehouse" was first coined by Bill Inmon in 1990. According to Inmon, a data
warehouse is a subject oriented, integrated, time-variant, and non-volatile collection of data.
This data helps analysts to take informed decisions in an organization.
An operational database undergoes frequent changes on a daily basis on account of the transactions that take place. If a business executive wants to analyze previous feedback on any data, such as a product, a supplier, or any consumer data, the executive will have no data available to analyze, because the previous data has been updated due to transactions.
A data warehouse provides generalized and consolidated data in a multidimensional view. Along with this generalized and consolidated view of data, a data warehouse also provides Online Analytical Processing (OLAP) tools. These tools help us in the interactive and effective analysis of data in a multidimensional space. This analysis results in data generalization and data mining.
Data mining functions such as association, clustering, classification, and prediction can be integrated with OLAP operations to enhance the interactive mining of knowledge at multiple levels of abstraction. That is why the data warehouse has now become an important platform for data analysis and online analytical processing.
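A small sketch of what a multidimensional, consolidated view means, using pandas as a stand-in for an OLAP tool (the sales data is hypothetical):

import pandas as pd

sales = pd.DataFrame({
    "product": ["laptop", "laptop", "phone", "phone", "laptop"],
    "quarter": ["Q1", "Q2", "Q1", "Q2", "Q1"],
    "revenue": [1200, 1500, 800, 950, 1100],
})

# Roll up the fact table along two dimensions - a basic OLAP-style aggregation.
cube = sales.pivot_table(index="product", columns="quarter",
                         values="revenue", aggfunc="sum")
print(cube)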

Understanding a Data Warehouse

 A data warehouse is a database, which is kept separate from the organization's operational database.
 There is no frequent updating done in a data warehouse.
 It possesses consolidated historical data, which helps the organization to analyze its business.
 A data warehouse helps executives to organize, understand, and use their data to take strategic decisions.
 Data warehouse systems help in the integration of a diversity of application systems.
 A data warehouse system helps in consolidated historical data analysis.
Data Mining

The process of digging through data to discover hidden connections and predict future trends has
a long history. Sometimes referred to as "knowledge discovery in databases," the term "data
mining" wasn’t coined until the 1990s. But its foundation comprises three intertwined scientific
disciplines: statistics (the numeric study of data relationships), artificial intelligence (human-like
intelligence displayed by software and/or machines) and machine learning (algorithms that can
learn from data to make predictions). What was old is new again, as data mining technology
keeps evolving to keep pace with the limitless potential of big data and affordable computing
power.

Over the last decade, advances in processing power and speed have enabled us to move beyond
manual, tedious and time-consuming practices to quick, easy and automated data analysis. The
more complex the data sets collected, the more potential there is to uncover relevant insights.
Retailers, banks, manufacturers, telecommunications providers and insurers, among others, are
using data mining to discover relationships among everything from price optimization,
promotions and demographics to how the economy, risk, competition and social media are
affecting their business models, revenues, operations and customer relationships.

Data mining process

The data mining process involves a number of steps from data collection to visualization to
extract valuable information from large data sets. As mentioned above, data mining techniques
are used to generate descriptions and predictions about a target data set. Data scientists describe
data through their observations of patterns, associations, and correlations. They also classify and
cluster data through classification and regression methods, and identify outliers for use cases,
like spam detection.

Data mining usually consists of four main steps: setting objectives, data gathering and
preparation, applying data mining algorithms, and evaluating results.

1. Set the business objectives: This can be the hardest part of the data mining process, and
many organizations spend too little time on this important step. Data scientists and business
stakeholders need to work together to define the business problem, which helps inform the data
questions and parameters for a given project. Analysts may also need to do additional research to
understand the business context appropriately.

2. Data preparation: Once the scope of the problem is defined, it is easier for data scientists to identify which set of data will help answer the pertinent business questions. Once they collect the relevant data, the data will be cleaned, removing any noise such as duplicates, missing values, and outliers. Depending on the dataset, an additional step may be taken to reduce
the number of dimensions as too many features can slow down any subsequent computation.
Data scientists will look to retain the most important predictors to ensure optimal accuracy
within any models.
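A minimal sketch of this cleaning step, assuming pandas and a small hypothetical dataset containing duplicates, a missing value, and an outlier:

import pandas as pd

raw = pd.DataFrame({
    "customer_id": [1, 2, 2, 3, 4],
    "age": [34, 29, 29, None, 31],
    "spend": [120.0, 80.0, 80.0, 95.0, 9_999.0],   # 9999 is an obvious outlier
})

clean = raw.drop_duplicates()                      # remove duplicate rows
clean = clean.dropna(subset=["age"])               # drop rows with missing values
clean = clean[clean["spend"] < 1_000]              # filter the outlier
print(clean)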
3. Model building and pattern mining: Depending on the type of analysis, data scientists may
investigate any interesting data relationships, such as sequential patterns, association rules, or
correlations. While high frequency patterns have broader applications, sometimes the deviations
in the data can be more interesting, highlighting areas of potential fraud.

Deep learning algorithms may also be applied to classify or cluster a data set depending on the
available data. If the input data is labelled (i.e. supervised learning), a classification model may
be used to categorize data, or alternatively, a regression may be applied to predict the likelihood
of a particular assignment. If the dataset isn’t labelled (i.e. unsupervised learning), the individual
data points in the training set are compared with one another to discover underlying similarities,
clustering them based on those characteristics.

4. Evaluation of results and implementation of knowledge: Once the data is aggregated, the results need to be evaluated and interpreted. When finalizing results, they should be valid, novel, useful, and understandable. When these criteria are met, organizations can use this knowledge to implement new strategies, achieving their intended objectives.

Data Mining Techniques

Data mining is the process of looking at large banks of information to generate new information.
Intuitively, you might think that data “mining” refers to the extraction of new data, but this isn’t
the case; instead, data mining is about extrapolating patterns and new knowledge from the data
you’ve already collected.
Relying on techniques and technologies from the intersection of database management, statistics,
and machine learning, specialists in data mining have dedicated their careers to better
understanding how to process and draw conclusions from vast amounts of information. But what
are the techniques they use to make this happen?
Data mining is highly effective, so long as it draws upon one or more of these techniques:
1. Tracking patterns. One of the most basic techniques in data mining is learning to recognize
patterns in your data sets. This is usually a recognition of some aberration in your data happening
at regular intervals, or an ebb and flow of a certain variable over time. For example, you might
see that your sales of a certain product seem to spike just before the holidays, or notice that
warmer weather drives more people to your website.
2. Classification. Classification is a more complex data mining technique that forces you to
collect various attributes together into discernable categories, which you can then use to draw
further conclusions, or serve some function. For example, if you’re evaluating data on individual
customers’ financial backgrounds and purchase histories, you might be able to classify them as
“low,” “medium,” or “high” credit risks. You could then use these classifications to learn even
more about those customers.
3. Association. Association is related to tracking patterns, but is more specific to dependently
linked variables. In this case, you’ll look for specific events or attributes that are highly
correlated with another event or attribute; for example, you might notice that when your
customers buy a specific item, they also often buy a second, related item. This is usually what’s
used to populate “people also bought” sections of online stores.
4. Outlier detection. In many cases, simply recognizing the overarching pattern can’t give you a
clear understanding of your data set. You also need to be able to identify anomalies, or outliers in
your data. For example, if your purchasers are almost exclusively male, but during one strange
week in July, there’s a huge spike in female purchasers, you’ll want to investigate the spike and
see what drove it, so you can either replicate it or better understand your audience in the process.
5. Clustering. Clustering is very similar to classification, but involves grouping chunks of data together based on their similarities (see the sketch after this list). For example, you might choose to cluster different
demographics of your audience into different packets based on how much disposable income
they have, or how often they tend to shop at your store.
6. Regression. Regression, used primarily as a form of planning and modeling, is used to
identify the likelihood of a certain variable, given the presence of other variables. For example,
you could use it to project a certain price, based on other factors like availability, consumer
demand, and competition. More specifically, regression’s main focus is to help you uncover the
exact relationship between two (or more) variables in a given data set.
7. Prediction. Prediction is one of the most valuable data mining techniques, since it’s used to
project the types of data you’ll see in the future. In many cases, just recognizing and
understanding historical trends is enough to chart a somewhat accurate prediction of what will
happen in the future. For example, you might review consumers’ credit histories and past
purchases to predict whether they’ll be a credit risk in the future.
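As a concrete illustration of the clustering and regression techniques above, here is a minimal sketch using scikit-learn (an assumption; any statistics library would do) on hypothetical customer data:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

# Clustering: group customers by disposable income and visit frequency.
customers = np.array([[20, 2], [22, 3], [80, 10], [85, 12], [21, 1]])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(customers)
print(labels)  # e.g. [0 0 1 1 0] - two natural groups emerge

# Regression: relate availability and demand to price, then project a new price.
features = np.array([[10, 100], [50, 80], [90, 60]])   # availability, demand
price = np.array([120.0, 90.0, 60.0])
model = LinearRegression().fit(features, price)
print(model.predict([[70, 70]]))                        # projected price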

Business intelligence

Business intelligence (BI) combines business analytics, data mining, data visualization, data
tools and infrastructure, and best practices to help organizations to make more data-driven
decisions. In practice, you know you’ve got modern business intelligence when you have a
comprehensive view of your organization’s data and use that data to drive change, eliminate
inefficiencies, and quickly adapt to market or supply changes.

It’s important to note that this is a very modern definition of BI—and BI has had a strangled
history as a buzzword. Traditional Business Intelligence, capital letters and all, originally
emerged in the 1960s as a system of sharing information across organizations. It further
developed in the 1980s alongside computer models for decision-making and turning data into insights before becoming a specific offering from BI teams with IT-reliant service solutions.
Modern BI solutions prioritize flexible self-service analysis, governed data on trusted platforms,
empowered business users, and speed to insight.

Much more than a specific “thing,” business intelligence is rather an umbrella term that covers
the processes and methods of collecting, storing, and analyzing data from business operations or
activities to optimize performance. All of these things come together to create a comprehensive
view of a business to help people make better, actionable decisions.

Over the past few years, business intelligence has evolved to include more processes and
activities to help improve performance. These processes include:
 Data mining: Using databases, statistics and machine learning to uncover trends in large
datasets.
 Reporting: Sharing data analysis to stakeholders so they can draw conclusions and make
decisions.
 Performance metrics and benchmarking: Comparing current performance data to
historical data to track performance against goals, typically using customized dashboards.
 Descriptive analytics: Using preliminary data analysis to find out what happened.
 Querying: Asking the data specific questions, with BI pulling the answers from the datasets.
 Statistical analysis: Taking the results from descriptive analytics and further exploring
the data using statistics such as how this trend happened and why.
 Data visualization: Turning data analysis into visual representations such as charts, graphs, and histograms to more easily consume data (see the sketch after this list).
 Visual analysis: Exploring data through visual storytelling to communicate insights on
the fly and stay in the flow of analysis.
 Data preparation: Compiling multiple data sources, identifying the dimensions and measurements, and preparing the data for analysis.
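A tiny sketch of the descriptive-analytics and visualization steps, assuming pandas and matplotlib and a hypothetical revenue dataset:

import pandas as pd
import matplotlib.pyplot as plt

revenue = pd.DataFrame({
    "region": ["North", "South", "North", "South"],
    "quarter": ["Q1", "Q1", "Q2", "Q2"],
    "revenue": [240, 180, 310, 205],
})

# Descriptive analytics: what happened, summarized per region.
summary = revenue.groupby("region")["revenue"].sum()

# Data visualization: turn the analysis into an easily consumed chart.
summary.plot(kind="bar", title="Revenue by region")
plt.tight_layout()
plt.savefig("revenue_by_region.png")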
