| Operational Database | Data Warehouse |
| --- | --- |
| Operational systems are designed to support high-volume transaction processing. | Data warehousing systems are typically designed to support high-volume analytical processing (i.e., OLAP). |
| Operational systems are usually concerned with current data. | Data warehousing systems are usually concerned with historical data. |
| Data within operational systems are mainly updated regularly according to need. | Non-volatile: new data may be added regularly, but once added it is rarely changed. |
| It is designed for real-time business transactions and processes. | It is designed for analysis of business measures by subject area, categories, and attributes. |
| It is optimized for a simple set of transactions, generally adding or retrieving a single row at a time per table. | It is optimized for bulk loads and large, complex, unpredictable queries that access many rows per table. |
| It is optimized for validation of incoming information during transactions; it uses validation data tables. | It is loaded with consistent, valid information and requires no real-time validation. |
| It supports thousands of concurrent clients. | It supports few concurrent clients relative to OLTP. |
| Operational systems are largely process-oriented. | Data warehousing systems are largely subject-oriented. |
| Operational systems are usually optimized to perform fast inserts and updates of relatively small volumes of data. | Data warehousing systems are usually optimized to perform fast retrievals of relatively large volumes of data. |
| Relational databases are created for online transaction processing (OLTP). | Data warehouses are designed for online analytical processing (OLAP). |
| Feature | OLTP | OLAP |
| --- | --- | --- |
| Data contents | An OLTP system manages current data that are typically too detailed to be easily used for decision making. | An OLAP system manages a large amount of historical data, provides facilities for summarization and aggregation, and stores and manages data at different levels of granularity. This makes the data easier to use in informed decision making. |
| Database design | An OLTP system usually uses an entity-relationship (ER) data model and an application-oriented database design. | An OLAP system typically uses either a star or snowflake model and a subject-oriented database design. |
| Volume of data | Not very large. | Because of their large volume, OLAP data are stored on multiple storage media. |
| Access patterns | The access patterns of an OLTP system consist mainly of short, atomic transactions. Such a system requires concurrency control and recovery techniques. | Accesses to OLAP systems are mostly read-only operations, because these data warehouses store historical data. |
| Inserts and updates | Short and fast inserts and updates initiated by end users. | Periodic, long-running batch jobs refresh the data. |
DATA MART
A data mart (as noted above) is a focused version of a data warehouse that contains a smaller subset of
data important to and needed by a single team or a select group of users within an organization. A
data mart is built from an existing data warehouse (or other data sources) through a complex
procedure that involves multiple technologies and tools to design and construct a physical database,
populate it with data, and set up intricate access and management protocols.
Benefits
Cost-efficiency
Simplified data access
Quicker access to insights
Simpler data maintenance
Easier and faster implementation
Implementation steps
Document essential requirements
Identify the data sources
Determine the data subset
Design the logical layout
DATA TABLES
Data tables display information in a grid-like format of rows and columns. They
organize information in a way that’s easy to scan so that users can look for patterns
and develop insights from data.
DML resembles simple English and enables efficient user interaction with the
system. The functional capability of DML is organized in manipulation commands such as
SELECT, UPDATE, INSERT INTO and DELETE FROM, as described below:
SELECT: This command is used to retrieve rows from a table. The syntax is
SELECT [column name(s)] FROM [table name] WHERE [conditions]. SELECT is the
most widely used DML command in SQL.
UPDATE: This command modifies the data of one or more records. The update command
syntax is UPDATE [table name] SET [column name = value] WHERE [condition].
INSERT: This command adds one or more records to a database table. The insert
command syntax is INSERT INTO [table name] [column(s)] VALUES [value(s)].
DELETE: This command removes one or more records from a table according to
specified conditions. The delete command syntax is DELETE FROM [table name] WHERE
[condition].
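A minimal sketch of these four commands in action, using Python's built-in sqlite3 module; the employees table and its values are invented for illustration:

```python
import sqlite3

# In-memory database with a hypothetical employees table (illustrative only).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, salary REAL)")

# INSERT INTO: add one or more records to the table.
cur.execute("INSERT INTO employees (name, salary) VALUES (?, ?)", ("Alice", 50000))
cur.execute("INSERT INTO employees (name, salary) VALUES (?, ?)", ("Bob", 45000))

# UPDATE: modify the data of records matching a condition.
cur.execute("UPDATE employees SET salary = ? WHERE name = ?", (52000, "Alice"))

# SELECT: retrieve rows that satisfy a condition.
cur.execute("SELECT name, salary FROM employees WHERE salary > ?", (46000,))
print(cur.fetchall())  # [('Alice', 52000.0)]

# DELETE FROM: remove records matching a condition.
cur.execute("DELETE FROM employees WHERE name = ?", ("Bob",))
conn.commit()
conn.close()
```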
Meta Data
Metadata is data about data: documentation about the information that is
required by the users. In data warehousing, metadata is one of the essential aspects.
DRILL DOWN
Drill down is a capability that takes the user from a more general view of the data to a more
specific one at the click of a mouse. For example, a drill down report that shows sales
revenue by state can allow the user to select a state, click on it and see sales revenue by
county or city within that state. It is called “drill down” because it is a feature that allows the
user to go deeper into more specific layers of the data or information being analyzed.
Gain instant knowledge of different depths of the data – A drill down report
gives the user deeper insight into the data by letting him see what makes up
the figures he’s analyzing. For example, in mere seconds, data drill-down
answers questions such as: of my national sales figure, which states are
performing better? Which states are underperforming? And within each state,
which territories are driving revenue?
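A minimal sketch of this national-to-state-to-county drill-down using pandas; the sales figures are invented for illustration:

```python
import pandas as pd

# Hypothetical sales records; column names and values are illustrative.
sales = pd.DataFrame({
    "state":   ["CA", "CA", "CA", "NY", "NY"],
    "county":  ["Alameda", "Alameda", "Fresno", "Kings", "Queens"],
    "revenue": [120.0, 80.0, 60.0, 200.0, 150.0],
})

# General view: sales revenue by state.
print(sales.groupby("state")["revenue"].sum())

# Drill down into one state: revenue by county within CA.
ca = sales[sales["state"] == "CA"]
print(ca.groupby("county")["revenue"].sum())
```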
DRILL THROUGH
Instead of taking the user to a more granular level of the data, drill through takes him to a
report that is relevant to the data being analyzed, also at the click of a mouse. For example,
a tabular report that shows sales revenue by state can allow the user to click on it and reveal
an analysis grid of the same data, or a heat map representing the data in visual form. It is
called “drill through” because it is a feature that allows the user to pass from one report to
another while still analyzing the same set of data.
See data from different points of view – Drill through reports allow users to
analyze the same data through different formats, analyze it with different
features, and even display it through different visualization methods. This
greatly enhances the users’ understanding of the data and of the reasons
behind the figures.
DISADVANTAGES
Monetary cost
Assumption of relevance
Transfer of power
Unanticipated effects
False belief in objectivity
Status reduction
Information overload
GDSS
In a group decision support system (GDSS) electronic meeting, each
participant is provided with a computer. The computers are
connected to each other, to the facilitator’s computer and to the file
server. A projection screen is available at the front of the room. The
facilitator and the participants can both project digital text and
images onto this screen.
A group decision support system (GDSS) meeting comprises
different phases, such as idea generation, discussion, voting, vote
counting and so on. The facilitator manages and controls the
execution of these phases. The use of various software tools in the
meeting is also controlled by the facilitator.
Components
Hardware: It includes electronic hardware like the
computer, equipment used for networking, electronic
display boards and audiovisual equipment. It also
includes the conference facility, including the physical set
up – the room, the tables, and the chairs – laid out in
such a manner that they can support group
discussion and teamwork.
Software Tools: It includes various tools and techniques,
such as electronic questionnaires, electronic
brainstorming tools, idea organizers, tools for setting
priority, policy formation tool, etc. The use of these
software tools in a group meeting helps the group
decision-makers to plan, organize ideas, gather
information, establish priorities, take decisions and
document the meeting proceedings. As a result,
meetings become more productive.
People: It comprises the members participating in the
meeting, a trained facilitator who helps with the
proceedings of the meeting, and an expert staff to
support the hardware and software. The GDSS
components together provide a favorable environment
for carrying out group meetings.
Features
Ease of Use
Better Decision Making
Emphasis on Semi-structured and Unstructured Decisions
Specific and General Support
Supports all Phases of the Decision Making
Supports Positive Group Behavior
Tools
Electronic Questionnaire
Electronic Brainstorming Tools
Idea Organizer
Tools for Setting Priority
Policy Formation Tool
What is Groupware
Definition: Groupware refers to software that allows multiple users
to work together on one project, whether they are sitting locally or
remotely from each other, in real time; it is also known as
“Collaboration Software“.
With the help of groupware, multiple users can exchange emails
and documents, share database access, organize online meetings
between various users in which they are able to see each other,
share information with other users, write collectively, and handle
calendaring, task management, scheduling, and more.
Example: Microsoft Word, Excel, Lotus WordPro, Microsoft
Explorer, Net Meeting
EXPERT SYSTEM
An expert system is a computer program that uses artificial
intelligence (AI) technologies to simulate the judgment and behavior of a
human or an organization that has expert knowledge and experience in a
particular field.
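As a toy illustration (not any particular product), an expert system can be sketched as a set of if-then rules plus a forward-chaining engine that applies them to known facts; the rules and facts below are invented:

```python
# Invented knowledge base: (conditions, conclusion) pairs elicited from an "expert".
rules = [
    ({"fever", "cough"}, "flu_suspected"),
    ({"flu_suspected", "short_of_breath"}, "refer_to_doctor"),
]

def forward_chain(facts):
    """Repeatedly fire any rule whose conditions are all satisfied by the facts."""
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

print(forward_chain({"fever", "cough", "short_of_breath"}))
# -> includes 'flu_suspected' and 'refer_to_doctor'
```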
Advantages
1. Increased availability and reliability: Expertise can be accessed on any computer
hardware and the system always completes responses on time.
2. Multiple expertise: Several expert systems can be run simultaneously to solve a
problem and gain a higher level of expertise than a human expert.
3. Explanation: Expert systems can always describe how the problem was solved.
4. Fast response: The expert systems are fast and able to solve a problem in real-
time.
5. Reduced cost: The cost of expertise for each user is significantly reduced.
Disadvantages
1. Expert systems have superficial knowledge, and a simple task can potentially
become computationally expensive.
2. Expert systems require knowledge engineers to input the data, and knowledge
acquisition is very hard.
3. The expert system may choose the most inappropriate method for solving a
particular problem.
4. Problems of ethics in the use of any form of AI are very relevant at present.
5. It is a closed world with specific knowledge, in which there is no deep perception
of concepts and their interrelationships until an expert provides them.
DATA SOURCE
A data source is the location where data that is being used originates from.
A data source may be the initial location where data is born or where physical information
is first digitized.
Here’s an example of a data source in action. Imagine a fashion brand selling products
online. To display whether an item is out of stock, the website gets information from an
inventory database. In this case, the inventory tables are a data source, accessed by the
web application which serves the website to customers.
Databases remain the most common data sources, as the primary stores for data in
ubiquitous relational database management systems (RDBMS). In this context, an
important concept is the Data Source Name (DSN). The DSN is defined within destination
databases or applications as a pointer to the actual data, whether it exists locally or is
found on a remote server (and whether it sits in a single physical location or is
virtualized). The DSN is not necessarily the same as the relevant database name or file
name; rather, it is an address or label used to easily reach the data at its source.
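For example, with the third-party pyodbc library a client connects via a DSN rather than a concrete server address or file path; the DSN name, credentials, and table below are hypothetical:

```python
import pyodbc  # third-party ODBC bridge: pip install pyodbc

# "SalesDW" is a hypothetical DSN configured in the ODBC driver manager; it hides
# whether the data lives locally, on a remote server, or behind a virtualized layer.
conn = pyodbc.connect("DSN=SalesDW;UID=report_user;PWD=secret")
cursor = conn.cursor()
cursor.execute("SELECT COUNT(*) FROM inventory")  # hypothetical inventory table
print(cursor.fetchone()[0])
conn.close()
```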
EXTRACTION/PROPAGATION
Data extraction is the process of obtaining data from a database or SaaS platform so that it
can be replicated to a destination — such as a data warehouse — designed to support online
analytical processing (OLAP).
Suppose an organization wants to monitor its reputation in the marketplace. It may have data
from many sources, including online reviews, social media mentions, and online
transactions. An ETL tool can extract data from these sources and load it into a data
warehouse where it can be analyzed and mined for insights into brand perception.
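A minimal extract-and-load sketch, assuming a hypothetical reviews table in a source SQLite database and a staging table in the warehouse (file names are illustrative):

```python
import sqlite3

# Hypothetical source (operational store) and destination (warehouse).
source = sqlite3.connect("shop.db")
warehouse = sqlite3.connect("warehouse.db")
warehouse.execute(
    "CREATE TABLE IF NOT EXISTS stg_reviews (review_id INTEGER, rating INTEGER, text TEXT)"
)

# Extract: pull the rows to be replicated from the source system.
rows = source.execute("SELECT review_id, rating, text FROM reviews").fetchall()

# Load: write them into the warehouse staging area for later OLAP analysis.
warehouse.executemany("INSERT INTO stg_reviews VALUES (?, ?, ?)", rows)
warehouse.commit()
source.close()
warehouse.close()
```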
Data Propagation is the distribution of data from one or more source data
warehouses to one or more local access databases, according to propagation rules. Data
warehouses need to manage large volumes of data every day. A data warehouse may start
with only a small amount of data and grow day by day through constant sharing and
receiving of data from various sources.
DATA REFINEMENT
Data refinement means ensuring the data put into a data analytics platform is relevant, homogenized
and categorized so the users can get meaningful results and pinpoint discrepancies. The data
refinement process is a key part of establishing a data-driven company and maintaining good habits.
Data mining methods may be classified by the function they perform or according
to the class of applications they can be used in:
Predictive modeling
o Classification
Stored data is used to place data items into predetermined groups. Data
mining tools have to infer a model from the database; this requires
one or more predefined classes. (A minimal classification sketch appears
after this list.)
discriminative
generative
o Regression, curve fitting
Segmentation (Clustering)
Data items are grouped together according to logical relationships or
consumer preferences. It involves creating a partition so that all the
members of each set of the partition are similar according to some metric
o distance based methods
o model-based clustering
o partition-based methods
Dependency Modeling
o Probabilistic Graphical Modeling (Bayesian networks)
Summarization
o Association rules: data can be mined to identify associations
Change and deviation detection
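A minimal classification sketch (scikit-learn is assumed to be installed; the toy features and class labels are invented): a model is inferred from stored, labeled records and then used to place a new record into one of the predetermined groups.

```python
from sklearn.tree import DecisionTreeClassifier

# Toy training data: [age, income] -> buys product (1) or not (0); values invented.
X_train = [[25, 30000], [40, 80000], [35, 60000], [22, 20000], [50, 90000]]
y_train = [0, 1, 1, 0, 1]

# Infer a model from the stored data, then classify a new record.
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print(model.predict([[30, 70000]]))  # e.g. [1]
```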
2. Clustering:
Clustering is the division of data into groups of related objects. Describing
the data by a few clusters inevitably loses certain fine details, but achieves
simplification: the data are modeled by their clusters. Viewed historically,
clustering is rooted in statistics, mathematics, and numerical analysis.
From a machine learning point of view, clusters correspond to hidden patterns, the search
for clusters is unsupervised learning, and the resulting framework represents a
data concept. From a practical point of view, clustering plays an outstanding role in
data mining applications: for example, scientific data exploration, text mining,
information retrieval, spatial database applications, CRM, Web analysis,
computational biology, medical diagnostics, and much more.
In other words, we can say that clustering analysis is a data mining technique for
identifying similar data. The technique helps to recognize the differences and similarities
between data items. Clustering is similar to classification, but it involves
grouping chunks of data together based on their similarities.
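A minimal sketch of distance-based, partition-based clustering using scikit-learn's k-means; the 2-D points are invented:

```python
from sklearn.cluster import KMeans

# Toy 2-D points forming two visually separated groups (values invented).
points = [[1, 2], [1, 4], [2, 3], [8, 8], [9, 10], [8, 9]]

# Partition the data into 2 clusters so members of each cluster are close
# under the Euclidean distance metric.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(kmeans.labels_)           # cluster assignment for each point
print(kmeans.cluster_centers_)  # the two cluster centroids
```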
3. Regression:
Regression analysis is a data mining process used to identify and analyze the
relationship between variables in the presence of other factors. It is used
to predict the value of a specific variable. Regression is primarily a form of
planning and modeling. For example, we might use it to project certain costs,
depending on other factors such as availability, consumer demand, and competition.
Primarily, it quantifies the relationship between two or more variables in the given
data set.
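A minimal sketch with NumPy, fitting a straight line that projects cost from consumer demand; all numbers are invented for illustration:

```python
import numpy as np

# Hypothetical observations: consumer demand vs. cost.
demand = np.array([10, 20, 30, 40, 50])
cost   = np.array([105, 198, 305, 398, 502])

# Least-squares fit of the line cost = a * demand + b.
a, b = np.polyfit(demand, cost, deg=1)
print(f"cost ~= {a:.1f} * demand + {b:.1f}")  # roughly 9.9 * demand + 3.4
print(a * 35 + b)                             # projected cost at demand = 35
```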
4. Association Rules:
This data mining technique helps to discover a link between two or more items. It
finds a hidden pattern in the data set.
Association rules are if-then statements that help to show the probability of
interactions between data items within large data sets in different types of databases.
Association rule mining has several applications and is commonly used to discover sales
correlations in transactional data or patterns in medical data sets.
The way the algorithm works is that you have various data, for example, a list of
grocery items that you have been buying for the last six months. It calculates the
percentage of items being purchased together.
o Lift:
This measure indicates the strength of the rule: how much more often
item B is purchased with item A than it is purchased overall.
Lift = Confidence / (Item B / Entire dataset)
o Support:
This measure indicates how often the items are purchased together,
relative to the overall dataset.
Support = (Item A + Item B) / (Entire dataset)
o Confidence:
This measure indicates how often item B is purchased when
item A is purchased as well.
Confidence = (Item A + Item B) / (Item A)
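A minimal sketch computing all three measures for a hypothetical rule bread -> milk over invented grocery baskets:

```python
# Hypothetical transactions; each set is one shopping basket.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

def rule_metrics(a, b):
    n = len(transactions)
    count_a  = sum(a in t for t in transactions)
    count_b  = sum(b in t for t in transactions)
    count_ab = sum(a in t and b in t for t in transactions)
    support    = count_ab / n                 # (Item A + Item B) / (Entire dataset)
    confidence = count_ab / count_a           # (Item A + Item B) / (Item A)
    lift       = confidence / (count_b / n)   # Confidence / (Item B / Entire dataset)
    return support, confidence, lift

print(rule_metrics("bread", "milk"))  # (0.6, 0.75, 0.9375)
```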
5. Outlier detection:
This type of data mining technique refers to the observation of data items in a
data set which do not match an expected pattern or expected behavior. It can be
used in various domains such as intrusion detection, fraud detection, etc., and is
also known as Outlier Analysis or Outlier Mining. An outlier is a data point that
diverges too much from the rest of the dataset; the majority of real-world
datasets contain outliers. Outlier detection plays a significant role in the data
mining field and is valuable in numerous areas like network intrusion
identification, credit or debit card fraud detection, detecting outlying values in
wireless sensor network data, etc.
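A minimal sketch of outlier detection using a simple z-score rule; the readings are invented, and real systems typically use more robust methods:

```python
import numpy as np

# Hypothetical sensor readings with one obvious outlier (900).
values = np.array([10, 12, 11, 13, 12, 900, 11, 12])

# Flag points more than 2 standard deviations from the mean.
z = (values - values.mean()) / values.std()
print(values[np.abs(z) > 2])  # [900]
```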
6. Sequential Patterns:
The sequential pattern is a data mining technique specialized for evaluating
sequential data in order to discover sequential patterns. It consists of finding
interesting subsequences in a set of sequences, where the value of a sequence
can be measured in terms of different criteria like length, occurrence frequency, etc.
In other words, this data mining technique helps to discover or recognize similar
patterns in transaction data over some period of time.
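A minimal sketch that counts how often a candidate subsequence occurs, in order but not necessarily contiguously, across a hypothetical set of purchase sequences:

```python
def contains(sequence, pattern):
    """True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)

# Hypothetical purchase histories, one sequence per customer.
sequences = [
    ["phone", "case", "charger"],
    ["phone", "charger", "headphones"],
    ["case", "phone", "charger"],
]

pattern = ["phone", "charger"]
freq = sum(contains(s, pattern) for s in sequences)
print(f"{pattern} occurs in {freq}/{len(sequences)} sequences")  # 3/3
```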
7. Prediction:
Prediction uses a combination of the other data mining techniques, such as trend
analysis, clustering, classification, etc. It analyzes past events or instances in the
right sequence to predict a future event.
DATA VISUALIZATION
Data visualization is the representation of data through use of common graphics, such as charts, plots,
infographics, and even animations. These visual displays of information communicate complex data
relationships and data-driven insights in a way that is easy to understand.
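As a minimal sketch, a matplotlib bar chart of invented monthly sales figures:

```python
import matplotlib.pyplot as plt

# Hypothetical monthly sales figures (values invented for illustration).
months = ["Jan", "Feb", "Mar", "Apr"]
sales  = [120, 135, 90, 160]

fig, ax = plt.subplots()
ax.bar(months, sales)  # the month-to-month pattern is easier to scan visually
ax.set_xlabel("Month")
ax.set_ylabel("Sales (units)")
ax.set_title("Monthly sales")
plt.show()
```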
Data visualization is commonly used to spur idea generation across teams. Visualizations are
frequently leveraged during brainstorming or Design Thinking sessions at the start of a project,
supporting the collection of different perspectives and highlighting the common concerns of the
collective.
Transport
The transport and travel industry uses business analytics in many areas.
First, governments employ business analytics to monitor and control
traffic. It helps them optimise route planning and develop intelligent
transport systems to manage general traffic effectively. As
governments implement more technologies, the demand for
business analytics in traffic management will increase further.
On the other hand, travel companies use business analytics to create
customer profiles and find ways to optimise their user experiences.
For example, a travel agency might use surveys and customer feedback
to figure out why a specific tourism package fails to generate sales, and
then optimise that package accordingly.
MANUFACTURING
Business intelligence gives manufacturers the ability to take
information and data that might once have been siloed across
the factory floor, and assimilate it into one convenient access
source. There, individual departments can access the
information they need to make decisions pertinent to their area.
At the same time, supervisors and management can check to
ensure the overall systems are performing at peak productivity.
While manufacturers might pay a great deal of attention to
maintaining equipment and training personnel, they frequently
fail
to apply the same dedication to their business intelligence
processes. This lack of commitment to amassing and assessing
data can lead to production delays and cost overruns.
Pharmaceutical Industry
The pharmaceutical industry is an essential and fast-growing industry that uses
business intelligence (BI) solutions to foster effective decision-making, minimize
operational costs, enhance sales management, and much more. Nowadays, BI plays
a vital role in helping pharmaceutical companies maintain a competitive edge as data
becomes more prominent. Analyzing data with BI helps companies identify
marketing strategies and sales opportunities, explore market aspects, predict
research operations around clinical trials, and much more. This empowers business
leaders to make appropriate, data-driven decisions.
Telecommunications
With the rise of IoT and the widespread use of wireless sensor networks, telecom companies
have boatloads of data to handle, presenting obvious challenges but also big opportunities.
Yet, most business intelligence or data analytics platforms aren’t agile enough to make this
data accessible and useful. At least until ThoughtSpot’s business intelligence for
telecommunications platform came along. Here are four ways ThoughtSpot helps telecom
providers put their data to use.
BI telecom platforms like ThoughtSpot could tell a company the most common types of
service interruption, the areas that experience interruption most often or even the devices
most likely to encounter issues.
More digital content being created, and more devices to consume it from, ultimately
means a greater reliance on quality service. But no matter how evolved technology
becomes, customer complaints will always occur, which is why issues need to be resolved
ahead of time to reduce customer wait times and unsatisfactory support.
ThoughtSpot’s business intelligence platform helps telecom providers use their wealth of data
to inform their decisions. Whether you’re forecasting how many support reps you’ll need for
the upcoming holiday season or determining how many Android versus iOS technical support
employees to staff, ThoughtSpot clarifies the picture. When extensive data is presented so
cleanly, customer trends also become easier to grasp, such as comparing service outages on
a particular day by customer segment or device. On a more granular level, customer care
reps can improve their service by searching for the most common customer complaints,
finding stats on the complaints that customers waited on the phone longest about, or even
identifying email support questions that took reps the longest to respond to.
Having a robust subscriber base is great, but without a platform to better understand
customers, opportunities for cross/upselling and more personalized customer service are lost.
A search and AI-powered telecom business intelligence tool can provide anything from
granular subscription-by-subscription insights to long-term trends by region or subscriber
growth rates by zip code while pointing out data anomalies and drawing causal relationships
that are moving business forward.
Of course, all these benefits contribute to any business’ central goal of reducing customer churn!
IoT continues to grow and accelerate. Between 2016–2017 alone, IoT devices increased 31
percent to reach 8.4 billion. By 2020, it’s estimated 30 billion objects will comprise the IoT.
One way telecom companies are making up for diminishing data revenue is by forming IoT
partnerships, like in connected cars.
As other opportunities emerge, like connected vehicle fleets, workforce training and media
deals around mobile streaming, there’ll be more IoT subscriptions for providers to manage
and more data to collect. For telecom companies to properly capitalize on IoT subscriptions,
the vast amounts of data they collect must be accessible, timely and easy to digest. Since
ThoughtSpot works off in-memory calculations, scales to as many users as have questions
to ask, and relays findings through easy-to-digest visualizations, telecom companies have
massive unrealized revenue sitting there for the taking.