Unit-3 MIS
To make sense of the vast quantities of data that enterprises are gathering, analyzing, and storing
today, companies turn to data management solutions and platforms. Data management solutions
make processing, validation, and other essential functions simpler and less time-intensive.
Leading data management platforms allow enterprises to leverage Big Data from all data
sources, in real-time, to allow for more effective engagement with customers, and for increased
customer lifetime value (CLV). Data management software is essential, as we are creating and
consuming data at unprecedented rates. Top data management platforms give enterprises and
organizations a 360-degree view of their customers and the complete visibility needed to gain
deep, critical insights into consumer behavior that give brands a competitive edge.
Managing your data is the first step toward handling the large volume of data, both structured
and unstructured, that floods businesses daily. It is only through data management best practices
that organizations are able to harness the power of their data and gain the insights they need to
make the data useful.
In fact, data management via leading data management platforms enables organizations and
enterprises to put data analytics to genuinely beneficial use.
In the digital age, data is king. This is why it is seen as one of the most important assets of an
organisation: it is the foundation of information and the basis on which people make decisions.
It follows that if the data is accurate, complete, organized, and consistent, it will contribute to
the growth of the organisation. If it is not, it becomes a serious liability, because poorly
organised data can lead to harmful decisions based on incomplete information. In addition, the
amount of data connected to an organisation today is on an unprecedented scale and impossible
to process manually, which is why it is important to invest in an effective data management
system.
While some companies are good at collecting data, they are not managing it well enough to make
sense of it. Simply collecting data is not enough; enterprises and organizations need to
understand from the start that data management and data analytics will only succeed when
they first put some thought into how they will gain value from their raw data. They can then
move beyond raw data collection with efficient systems for processing, storing, and validating
data, as well as effective analysis strategies.
Another challenge of data management occurs when companies categorize data and organize it
without first considering the answers they hope to glean from the data. Each step of data
collection and management must lead toward acquiring the right data and analyzing it in order to
get the actionable intelligence necessary for making truly data-driven business decisions.
The best way to manage data, and eventually get the insights needed to make data-driven
decisions, is to begin with a business question and acquire the data that is needed to answer that
question. Companies must collect vast amounts of information from various sources and then
utilize best practices while going through the process of storing and managing the data, cleaning
and mining the data, and then analyzing and visualizing the data in order to inform their business
decisions.
It’s important to keep in mind that data management best practices result in better analytics. By
correctly managing and preparing the data for analytics, companies optimize their Big Data.
Organizations and enterprises should therefore strive to apply best practices at every stage,
from collection through storage, cleaning, and analysis.
It is with the help of data management platforms that organizations have the ability to gather,
sort, and house their information and then repackage it in visualized ways that are useful to
marketers. Top performing data management platforms are capable of managing all of the data
from all data sources in a central location, giving marketers and executives the most accurate
business and customer information available.
DATABASE MANAGEMENT SYSTEM
Introduction
Data is essential and critical to transaction processing and to the efficient working of
management information systems. In the early era of computer applications, data was
maintained separately for each application that used it.
In such a scenario, every user system had its own master files and transaction files, and, very
importantly, these transaction files and master files were processed separately.
One major consequence of this arrangement was data redundancy. Redundancy arises because
much of the data needed by the different systems is common, so the same data ends up repeated
across the various user systems. This redundancy further complicates data management.
Data redundancy also leads to a lack of integrity and to inconsistency in the data held in the
various user files.
The name "database management system" comes from the database approach, which emerged
out of the need, indeed urgency, to eliminate these data management problems. The database
itself is pivotal to a database management system.
A database uses a very specific structuring of the data, and the word database can be defined as
"a mechanical or automated, formally defined, centrally controlled collection of data in an
organization."
Data Independence
A database system normally contains a lot of data in addition to users’ data. For example, it
stores data about data, known as metadata, to locate and retrieve data easily. It is rather difficult
to modify or update a set of metadata once it is stored in the database. But as a DBMS expands,
it needs to change over time to satisfy the requirements of the users. If the entire data is
dependent, it would become a tedious and highly complex job.
If a database system is not multi-layered, it becomes difficult to make any changes to it. This is
why database systems are designed in multiple layers.
Metadata itself follows a layered architecture, so that when we change data at one layer, it does
not affect the data at another level. This data is independent but mapped to each other.
Logical data is data about the database; that is, it describes how data is managed inside, for
example, a table (relation) stored in the database and all the constraints applied on that relation.
Logical data independence is the mechanism that insulates this logical level from the actual data
stored on disk. If we make changes to the table format, those changes should not alter the data
residing on the disk.
All the schemas are logical, and the actual data is stored in bit format on the disk. Physical data
independence is the power to change the physical data without impacting the schema or logical
data.
For example, in case we want to change or upgrade the storage system itself − suppose we want
to replace hard-disks with SSD − it should not have any impact on the logical data or schemas.
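The layering described above can be sketched in code. The following is a minimal, hypothetical illustration (the class names `InMemoryStorage` and `LogicalTable` are invented for this sketch, not part of any real DBMS): the logical layer owns the schema, while the storage backend can be swapped, for example for a file-backed or SSD-backed store, without touching the logical layer.

```python
class InMemoryStorage:
    """Physical layer: how rows are actually kept (swappable)."""
    def __init__(self):
        self._rows = []

    def write(self, row):
        self._rows.append(row)

    def read_all(self):
        return list(self._rows)


class LogicalTable:
    """Logical layer: schema and constraints, independent of storage."""
    def __init__(self, columns, storage):
        self.columns = columns
        self.storage = storage          # pluggable physical backend

    def insert(self, **values):
        if set(values) != set(self.columns):
            raise ValueError("row does not match schema")
        self.storage.write(values)

    def select(self, predicate=lambda r: True):
        return [r for r in self.storage.read_all() if predicate(r)]


# The logical schema stays the same even if InMemoryStorage is replaced.
customers = LogicalTable(["id", "name"], InMemoryStorage())
customers.insert(id=1, name="John Doe")
print(customers.select(lambda r: r["id"] == 1))
# [{'id': 1, 'name': 'John Doe'}]
```

Replacing `InMemoryStorage` with a different backend that offers the same `write`/`read_all` interface would leave `LogicalTable` and every query against it unchanged, which is the essence of physical data independence.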
DATA REDUNDANCY
In computer main memory, auxiliary storage and computer buses, data redundancy is the
existence of data that is additional to the actual data and permits correction of errors in stored or
transmitted data. The additional data can simply be a complete copy of the actual data, or only
select pieces of data that allow detection of errors and reconstruction of lost or damaged data up
to a certain level.
Data redundancy occurs when the same piece of data is stored in two or more separate places.
Suppose you create a database to store sales records, and in the records for each sale, you enter
the customer address. Yet, you have multiple sales to the same customer so the same address is
entered multiple times. The address that is repeatedly entered is redundant data.
Data redundancy can be deliberate; for example, suppose you want to back up your company’s
data nightly. This creates a redundancy. Data redundancy can also occur by mistake. For
example, the database designer who created a system with a new record for each sale may not
have realized that his design caused the same address to be entered repeatedly. You may also end
up with redundant data when you store the same information in multiple systems. For instance,
suppose you store the same basic employee information in Human Resources records and in
records maintained for your local site office.
When data redundancy is unplanned, it can be a problem. For example, if you have a Customers
table which includes the address as one of the data fields, and the John Doe family conducts
business with you and all live at the same address, you will have multiple entries of the same
address in your database. If the John Doe family moves, you'll need to update the address for
each family member, which can be time-consuming, and introduce the possibility of entering a
mistake or a typo for one of the addresses. In addition, each entry of the address that is
unnecessary takes up additional space that becomes costly over time. Lastly, the more
redundancy, the greater difficulty in maintaining the data. These problems — inconsistent data,
wasted space, and effort to maintain data — can become a major headache for companies with
lots of data.
It’s not possible or practical to have zero data redundancy, and many database administrators
consider it acceptable to have a certain amount of data redundancy if there is a central master
field. Master data is a single source of common business data used across multiple systems or
applications. It is usually non-transactional data, such as a list of customers and their contact
information. The master data ensures that if a piece of data changes, you update the data only
one time, which allows you to prevent data inconsistencies.
In addition, the process of normalization is commonly used to remove redundancies. When you
normalize the data, you organize the columns (attributes) and tables (relations) of a database to
ensure that their dependencies are correctly enforced by database integrity constraints. The set of
rules for normalizing data is called a normal form, and a database is considered "normalized" if it
meets the third normal form, meaning that it is free of insert, delete, and update anomalies.
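The repeated-address problem described above can be made concrete with a small sketch (the data and table layout are invented for illustration): normalizing splits the flat sales records into a customers table and a sales table that references customers by id, so an address change becomes a single update.

```python
# Redundant, un-normalized form: the address repeats in every sale record.
flat_sales = [
    {"sale_id": 1, "customer": "John Doe", "address": "12 Oak St", "item": "lamp"},
    {"sale_id": 2, "customer": "John Doe", "address": "12 Oak St", "item": "desk"},
]

# Normalized form: each customer's address is stored exactly once.
customers = {1: {"name": "John Doe", "address": "12 Oak St"}}
sales = [
    {"sale_id": 1, "customer_id": 1, "item": "lamp"},
    {"sale_id": 2, "customer_id": 1, "item": "desk"},
]

# When the family moves, one update suffices instead of one per sale,
# and no sale can end up with a stale or mistyped address.
customers[1]["address"] = "34 Elm St"
for s in sales:
    assert customers[s["customer_id"]]["address"] == "34 Elm St"
print(customers[1]["address"])
# 34 Elm St
```

This is the same idea the master-data approach relies on: a single authoritative copy, referenced everywhere else.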
DATA ADMINISTRATION
Data administration is the process by which data is monitored, maintained and managed by a
data administrator and/or an organization. Data administration allows an organization to control
its data assets, as well as their processing and interactions with different applications and
business processes. Data administration ensures that the entire life cycle of data use and
processing is on par with the enterprise’s objective.
Data consistency
Data consistency refers to whether copies of the same data kept in different places match.
Data consistency is the process of keeping information uniform as it moves across a network and
between various applications on a computer. There are typically three types of data consistency:
point in time consistency, transaction consistency, and application consistency. Ensuring that a
computer network has all three elements of data consistency covered is the best way to ensure
that data is not lost or corrupted as it travels throughout the system. In the absence of data
consistency, there are no guarantees that any piece of information on the system is uniform
across the breadth of the computer network.
Data consistency helps ensure that information on a crashing computer can be restored to its pre-
crash state.
Point in time consistency deals with ensuring that all elements of a system are uniform at a
specific moment in time. This prevents loss of data during system crashes, improper shutdowns,
and other problems on the network. It functions by referencing pieces of data on the system via
timestamps and other markers of consistency, allowing the system to be restored to a specific
moment in time with each piece of data in its original place. Without point in time consistency,
there would be no guarantee that all information on a crashing computer could be restored to its
pre-crash state.
Transaction consistency means that each transaction's related updates are applied together, so a
value entered into the system remains correct across operations. Application consistency is, in
essence, transaction consistency between programs. For
example, if the banking program communicates with a tax program on the computer, application
consistency means that the information moving between the programs will remain in its original
state. Without application consistency, the same problems arise here as do under flawed
transaction consistency: there will be no way to tell whether a value entered into the system
remains correct over time.
The primary advantage to ensuring data consistency is maintaining the integrity of the
information stored on the computer or across the network. Without all three types of consistency
working together, one cannot tell whether the data stored on the computer today will be the same
following a crash, installation, or other major system event. That is why maintaining consistency
is one of the primary goals for all data-based computer programs.
• A database management system stores data in such a way that it becomes easier to retrieve,
manipulate, and produce information.
• Data manipulation consists of operations such as storing, modifying, removing, and retrieving
data.
• A database system provides basic functionalities such as the storage, manipulation, and use of
data.
• A database system has four major components, which together form the database system
environment.
1. Data
2. Hardware
3. Software
4. Users
1. Data
• Data is the most important component of a database system.
• Data means known facts that can be recorded and that have implicit meaning.
2. Hardware
• All physical modules of a computer are referred to as hardware, such as memory, mouse, hard
disk, printer, etc.
3. Software
• Software provides the interconnection between users and the database stored on physical
devices.
• The operating system manages all the hardware of the computer.
• The DBMS software and the operating system together form the software component here.
4. Users
• Any person who interacts with a database in any form is considered a database user.
• Categories of database users: 1. Database administrator
2. Database designers
3. Application programmers
4. End users
What is Data?
• Data means known facts that can be recorded and that have implicit meaning. Data is also a
collection of facts and figures.
What is Information?
• Information means processed or organized data, which can be derived from data and facts.
What is a record?
• A record is a collection of logically related fields; equivalently, a record consists of a value for
each field.
What is a file?
• A file is a collection of related records arranged in a specific sequence.
What is metadata?
• Metadata is a set of data that describes and gives information about other data.
• In other words, data about data is called metadata.
Data Warehouse
The term "Data Warehouse" was first coined by Bill Inmon in 1990. According to Inmon, a data
warehouse is a subject oriented, integrated, time-variant, and non-volatile collection of data.
This data helps analysts to take informed decisions in an organization.
An operational database undergoes frequent changes on a daily basis on account of the
transactions that take place. If a business executive wants to analyze previous data about a
product, a supplier, or a consumer, there will be no such data available to analyze, because the
older values have been overwritten by subsequent transactions.
A data warehouse provides us with generalized and consolidated data in a multidimensional
view. Along with this generalized and consolidated view of data, a data warehouse also provides
us with Online Analytical Processing (OLAP) tools. These tools help us in the interactive and
effective analysis of data in a multidimensional space. This analysis results in data
generalization and data mining.
Data mining functions such as association, clustering, classification, and prediction can be
integrated with OLAP operations to enhance the interactive mining of knowledge at multiple
levels of abstraction. That is why the data warehouse has become an important platform for data
analysis and online analytical processing.
The process of digging through data to discover hidden connections and predict future trends has
a long history. Sometimes referred to as "knowledge discovery in databases," the term "data
mining" wasn’t coined until the 1990s. But its foundation comprises three intertwined scientific
disciplines: statistics (the numeric study of data relationships), artificial intelligence (human-like
intelligence displayed by software and/or machines) and machine learning (algorithms that can
learn from data to make predictions). What was old is new again, as data mining technology
keeps evolving to keep pace with the limitless potential of big data and affordable computing
power.
Over the last decade, advances in processing power and speed have enabled us to move beyond
manual, tedious and time-consuming practices to quick, easy and automated data analysis. The
more complex the data sets collected, the more potential there is to uncover relevant insights.
Retailers, banks, manufacturers, telecommunications providers and insurers, among others, are
using data mining to discover relationships among everything from price optimization,
promotions and demographics to how the economy, risk, competition and social media are
affecting their business models, revenues, operations and customer relationships.
The data mining process involves a number of steps from data collection to visualization to
extract valuable information from large data sets. As mentioned above, data mining techniques
are used to generate descriptions and predictions about a target data set. Data scientists describe
data through their observations of patterns, associations, and correlations. They also classify and
cluster data through classification and regression methods, and identify outliers for use cases,
like spam detection.
Data mining usually consists of four main steps: setting objectives, data gathering and
preparation, applying data mining algorithms, and evaluating results.
1. Set the business objectives: This can be the hardest part of the data mining process, and
many organizations spend too little time on this important step. Data scientists and business
stakeholders need to work together to define the business problem, which helps inform the data
questions and parameters for a given project. Analysts may also need to do additional research to
understand the business context appropriately.
2. Data preparation: Once the scope of the problem is defined, it is easier for data scientists to
identify which set of data will help answer the pertinent questions to the business. Once they
collect the relevant data, the data will be cleaned, removing any noise, such as duplicates,
missing values, and outliers. Depending on the dataset, an additional step may be taken to reduce
the number of dimensions as too many features can slow down any subsequent computation.
Data scientists will look to retain the most important predictors to ensure optimal accuracy
within any models.
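The cleaning described in this preparation step can be sketched in a few lines. This is only an illustrative pass over an invented dataset (the field names `id`, `age`, `income` are made up): it drops exact duplicates and rows with missing values, the two kinds of noise named above.

```python
raw = [
    {"id": 1, "age": 34, "income": 52000},
    {"id": 1, "age": 34, "income": 52000},    # exact duplicate
    {"id": 2, "age": None, "income": 61000},  # missing value
    {"id": 3, "age": 45, "income": 58000},
]

seen, clean = set(), []
for row in raw:
    key = tuple(sorted(row.items()))          # hashable fingerprint of the row
    if key in seen:
        continue                              # skip duplicates
    if any(v is None for v in row.values()):
        continue                              # skip rows with missing values
    seen.add(key)
    clean.append(row)

print(len(clean))
# 2
```

A real preparation step would go further (outlier handling, dimensionality reduction), but the shape is the same: each rule filters the raw rows down to data the models can trust.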
3. Model building and pattern mining: Depending on the type of analysis, data scientists may
investigate any interesting data relationships, such as sequential patterns, association rules, or
correlations. While high frequency patterns have broader applications, sometimes the deviations
in the data can be more interesting, highlighting areas of potential fraud.
Deep learning algorithms may also be applied to classify or cluster a data set depending on the
available data. If the input data is labelled (i.e. supervised learning), a classification model may
be used to categorize data, or alternatively, a regression may be applied to predict the likelihood
of a particular assignment. If the dataset isn’t labelled (i.e. unsupervised learning), the individual
data points in the training set are compared with one another to discover underlying similarities,
clustering them based on those characteristics.
4. Evaluation of results and implementation of knowledge: Once the data is aggregated, the
results need to be evaluated and interpreted. Finalized results should be valid, novel, useful, and
understandable. When these criteria are met, organizations can use this knowledge to implement
new strategies, achieving their intended objectives.
Data mining is the process of looking at large banks of information to generate new information.
Intuitively, you might think that data “mining” refers to the extraction of new data, but this isn’t
the case; instead, data mining is about extrapolating patterns and new knowledge from the data
you’ve already collected.
Relying on techniques and technologies from the intersection of database management, statistics,
and machine learning, specialists in data mining have dedicated their careers to better
understanding how to process and draw conclusions from vast amounts of information. But what
are the techniques they use to make this happen?
Data Mining Techniques
Data mining is highly effective, so long as it draws upon one or more of these techniques:
1. Tracking patterns. One of the most basic techniques in data mining is learning to recognize
patterns in your data sets. This is usually a recognition of some aberration in your data happening
at regular intervals, or an ebb and flow of a certain variable over time. For example, you might
see that your sales of a certain product seem to spike just before the holidays, or notice that
warmer weather drives more people to your website.
2. Classification. Classification is a more complex data mining technique that forces you to
collect various attributes together into discernable categories, which you can then use to draw
further conclusions, or serve some function. For example, if you’re evaluating data on individual
customers’ financial backgrounds and purchase histories, you might be able to classify them as
“low,” “medium,” or “high” credit risks. You could then use these classifications to learn even
more about those customers.
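The credit-risk example above can be sketched as a toy classifier. The rule and thresholds here are invented for illustration; a real system would learn them from labelled historical data rather than hard-code them.

```python
def credit_risk(missed_payments, debt_ratio):
    """Bucket a customer into low/medium/high risk from two attributes."""
    # Invented scoring rule: missed payments weigh heavily, and a debt
    # ratio above 0.5 adds a fixed penalty.
    score = missed_payments * 2 + (3 if debt_ratio > 0.5 else 0)
    if score >= 5:
        return "high"
    if score >= 2:
        return "medium"
    return "low"

print(credit_risk(0, 0.2))  # low
print(credit_risk(1, 0.3))  # medium
print(credit_risk(1, 0.6))  # high
```

The classification technique itself is exactly this mapping from attributes to discrete categories; the sophistication lies in how the mapping is derived.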
3. Association. Association is related to tracking patterns, but is more specific to dependently
linked variables. In this case, you’ll look for specific events or attributes that are highly
correlated with another event or attribute; for example, you might notice that when your
customers buy a specific item, they also often buy a second, related item. This is usually what’s
used to populate “people also bought” sections of online stores.
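The "people also bought" idea can be sketched by counting how often pairs of items appear in the same basket (the baskets below are invented). The most frequent pairs are the candidate associations.

```python
from collections import Counter
from itertools import combinations

baskets = [
    {"printer", "paper", "ink"},
    {"printer", "ink"},
    {"paper", "pens"},
    {"printer", "ink", "pens"},
]

pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

print(pair_counts.most_common(1))
# [(('ink', 'printer'), 3)]
```

Full association-rule mining (e.g. the Apriori approach) extends this counting with support and confidence thresholds, but the co-occurrence count is the core signal.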
4. Outlier detection. In many cases, simply recognizing the overarching pattern can’t give you a
clear understanding of your data set. You also need to be able to identify anomalies, or outliers in
your data. For example, if your purchasers are almost exclusively male, but during one strange
week in July, there’s a huge spike in female purchasers, you’ll want to investigate the spike and
see what drove it, so you can either replicate it or better understand your audience in the process.
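A simple way to flag a spike like that July week is a z-score check: mark any week that sits more than two standard deviations from the mean. The weekly counts below are invented.

```python
from statistics import mean, stdev

# Weekly counts of female purchasers; week 7 is the suspicious spike.
weekly = [12, 15, 11, 14, 13, 12, 95, 14]

mu = mean(weekly)
sigma = stdev(weekly)
outliers = [x for x in weekly if abs(x - mu) / sigma > 2]
print(outliers)
# [95]
```

In practice more robust detectors (median-based scores, isolation forests) are used, since a single extreme value inflates the mean and standard deviation it is measured against.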
5. Clustering. Clustering is very similar to classification, but involves grouping chunks of data
together based on their similarities. For example, you might choose to cluster different
demographics of your audience into different packets based on how much disposable income
they have, or how often they tend to shop at your store.
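Clustering by disposable income can be sketched with a bare-bones one-dimensional k-means (two clusters; the incomes and the cluster count are illustrative, and the initialisation here is deliberately naive).

```python
from statistics import mean

incomes = [200, 220, 250, 900, 950, 1000]
centers = [incomes[0], incomes[-1]]       # naive starting centers

for _ in range(10):                       # a few refinement passes
    groups = [[], []]
    for x in incomes:
        nearest = min((0, 1), key=lambda i: abs(x - centers[i]))
        groups[nearest].append(x)
    centers = [mean(g) for g in groups]   # move each center to its group mean

print(centers)
```

Each pass assigns every customer to the nearest center and then moves each center to the mean of its group; here the centers settle on roughly 223 and 950, splitting the audience into a low-income and a high-income packet.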
6. Regression. Regression, used primarily as a form of planning and modeling, is used to
identify the likelihood of a certain variable, given the presence of other variables. For example,
you could use it to project a certain price, based on other factors like availability, consumer
demand, and competition. More specifically, regression’s main focus is to help you uncover the
exact relationship between two (or more) variables in a given data set.
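The "exact relationship between two variables" that regression uncovers is, in the simplest case, a least-squares line. The sketch below fits one by hand on invented data relating demand (x) to price (y).

```python
xs = [10, 20, 30, 40]        # e.g. consumer demand
ys = [15, 25, 35, 45]        # e.g. price; here price = demand + 5 by construction

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
intercept = my - slope * mx

print(slope, intercept)
# 1.0 5.0
```

Recovering slope 1.0 and intercept 5.0 confirms the fit: the model can then project a price for any demand level, which is the planning use described above.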
7. Prediction. Prediction is one of the most valuable data mining techniques, since it’s used to
project the types of data you’ll see in the future. In many cases, just recognizing and
understanding historical trends is enough to chart a somewhat accurate prediction of what will
happen in the future. For example, you might review consumers’ credit histories and past
purchases to predict whether they’ll be a credit risk in the future.
Business intelligence
Business intelligence (BI) combines business analytics, data mining, data visualization, data
tools and infrastructure, and best practices to help organizations to make more data-driven
decisions. In practice, you know you’ve got modern business intelligence when you have a
comprehensive view of your organization’s data and use that data to drive change, eliminate
inefficiencies, and quickly adapt to market or supply changes.
It’s important to note that this is a very modern definition of BI—and BI has had a strangled
history as a buzzword. Traditional Business Intelligence, capital letters and all, originally
emerged in the 1960s as a system of sharing information across organizations. It further
developed in the 1980s alongside computer models for decision-making and turning data into
insights before becoming a specific offering from BI teams with IT-reliant service solutions.
Modern BI solutions prioritize flexible self-service analysis, governed data on trusted platforms,
empowered business users, and speed to insight.
Much more than a specific “thing,” business intelligence is rather an umbrella term that covers
the processes and methods of collecting, storing, and analyzing data from business operations or
activities to optimize performance. All of these things come together to create a comprehensive
view of a business to help people make better, actionable decisions.
Over the past few years, business intelligence has evolved to include more processes and
activities to help improve performance. These processes include:
Data mining: Using databases, statistics and machine learning to uncover trends in large
datasets.
Reporting: Sharing data analysis to stakeholders so they can draw conclusions and make
decisions.
Performance metrics and benchmarking: Comparing current performance data to
historical data to track performance against goals, typically using customized dashboards.
Descriptive analytics: Using preliminary data analysis to find out what happened.
Querying: Asking the data specific questions, with BI pulling the answers from the datasets.
Statistical analysis: Taking the results from descriptive analytics and further exploring
the data using statistics such as how this trend happened and why.
Data visualization: Turning data analysis into visual representations such as charts,
graphs, and histograms to more easily consume data.
Visual analysis: Exploring data through visual storytelling to communicate insights on
the fly and stay in the flow of analysis.
Data preparation: Compiling multiple data sources, identifying the dimensions and
measurements, and preparing the data for analysis.