
The Structure of ERP

The essence of ERP is its fundamental approach, which takes a holistic view of the business. The traditional application systems that organizations generally employ treat each transaction separately; they are built around the strong boundaries of the specific function that a given application is meant to serve. ERP, in contrast, considers transactions to be part of the interlinked processes that make up the total business and its financial impact. Almost all typical application systems are essentially data manipulation tools: they store data, process them, and present them in the appropriate form whenever requested by the user. The problem is that there is no link between the application systems used by different departments. An ERP system also stores and processes data generated by diverse transactions, but the data are not confined to any departmental or functional boundaries; rather, they are integrated to deliver speedy and accurate results to multiple users, for multiple purposes, at multiple sites, and at multiple times. Thus an ERP solution implies that it be:

Flexible: An ERP system should be flexible enough to respond to the changing needs of an enterprise. Client-server technology enables ERP to run across various database back ends through Open Database Connectivity (ODBC).

Modular: The ERP system has to have a modular application architecture. This means that the various functionalities are logically clubbed into different business processes and structured into modules that can be interfaced or detached whenever required without affecting the other modules. It should support multiple hardware platforms for companies having a heterogeneous collection of systems, and it must also support third-party add-ons.

Comprehensive: It should be able to support a variety of organizational functions and must be suitable for a wide range of business organizations.

Beyond the company: It should not be confined to the organizational boundaries; rather, it should support on-line connectivity to the other business entities of the organization. This feature is a recent development, and such an ERP setup is referred to as web-enabled ERP.

Belonging to the best business practices: It must have a collection of the best business processes applicable worldwide.

To exploit these advantages, the ERP architecture must be designed using advanced information technologies and environments. ERP is therefore typically implemented in a client-server environment. This technology divides the application fundamentally into two or more components, called servers and clients. The client portion uses the functions of the server. Servers are centralized, while clients tend to be spread out across multiple locations.

Evolution of ERP
ERP is an outcome of 40 years of trial and error. It has evolved as a strategic tool because of continuous improvement in the available techniques to manage business and the fast growth of information technology.

Prior to the 1960s, businesses had to rely on traditional ways of inventory management to ensure the smooth functioning of the organization. These techniques are called classical inventory management or scientific inventory control methods. The most popular among them is EOQ (Economic Order Quantity). In this method, each item in stock is analyzed for its ordering cost and its inventory carrying cost, and a trade-off is established against the expected demand phased over one year; in this way the most economic ordering quantity can be decided (a small numerical sketch appears at the end of this section). This technique is, in principle, a deterministic way of managing inventory. Along with EOQ, various inventory models such as the fixed order quantity method, the periodic order method, and the optional replenishment method were also in practice. These techniques were very popular in the pre-MRP era.

In the 1960s, a new technique called Material Requirements Planning, popularly known as MRP, evolved. This was a proactive manner of inventory management. The technique fundamentally explodes the end-product demand obtained from the Master Production Schedule (MPS), for a specified product structure (taken from the Bill of Material), into a detailed schedule of purchase orders or production orders, taking into account the inventory on hand. MRP is a simple logic, but the magnitude of data involved in a realistic situation makes it computationally cumbersome; if undertaken manually, the entire process is highly time-consuming. MRP successfully demonstrated its effectiveness in reducing inventory, production, and delivery lead times by improving coordination and avoiding delays, thus making commitments more realistic. MRP proved to be a very good technique for managing inventory, but it did not take into account other resources of an organization.

In the 1970s, this limitation gave birth to a modified MRP logic, popularly known as closed-loop MRP. In this technique, the capacity of the organization to produce a particular product is also taken into account by incorporating a module called Capacity Requirements Planning (CRP). In the 1980s, the need was felt to integrate financial resources with manufacturing activities. From this evolved an integrated manufacturing management system called Manufacturing Resource Planning (MRP II). The transition from MRP II to ERP happened during 1980-90. The basic MRP II system design suffered from a few inherent drawbacks, such as a limited focus on manufacturing activities, the assumption of mass or repetitive production set-ups, and poor budgetary and costing controls. The shortcomings of MRP II and the need to integrate new techniques led to the development of a total integrated solution called ERP, which attempts to integrate the transactions of the organization to produce the best possible plan. Today we see further development of the ERP concept and the evolution of web-based ERP.
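As a simple illustration of the EOQ trade-off described above, the following sketch computes the classical formula EOQ = sqrt(2DS/H), where D is annual demand, S is the cost per order and H is the annual carrying cost per unit. The figures used are purely hypothetical.

import math

def economic_order_quantity(annual_demand, ordering_cost, holding_cost_per_unit):
    # Classical EOQ: the order quantity that balances ordering cost
    # against inventory carrying cost over one year.
    return math.sqrt(2 * annual_demand * ordering_cost / holding_cost_per_unit)

# Hypothetical figures: 12,000 units/year demand, 500 per order, 20 per unit per year to carry.
eoq = economic_order_quantity(12000, 500, 20)
print(round(eoq))  # about 775 units per order

Ordering roughly 775 units at a time minimizes the combined ordering and carrying cost under these assumed figures; any change in demand or cost parameters shifts the result.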

Evolution of ERP: ERP (Enterprise Resource Planning) is the evolution of Manufacturing Resource Planning (MRP II). From a business perspective, ERP has expanded from the coordination of manufacturing processes to the integration of enterprise-wide back-end processes. From a technological aspect, ERP has evolved from legacy implementations to a more flexible, tiered client-server architecture.

Table 1.1 The Evolution of ERP from the 1960s to the 1990s

1960s - Inventory Management & Control: Inventory management and control is the combination of information technology and business processes for maintaining the appropriate level of stock in a warehouse. The activities of inventory management include identifying inventory requirements, setting targets, providing replenishment techniques and options, monitoring item usage, reconciling inventory balances, and reporting inventory status.

1970s - Material Requirements Planning (MRP): MRP utilizes software applications for scheduling production processes. MRP generates schedules for operations and raw-material purchases based on the production requirements of finished goods, the structure of the production system, current inventory levels, and the lot-sizing procedure for each operation.

1980s - Manufacturing Resource Planning (MRP II): MRP II utilizes software applications for coordinating manufacturing processes, from product planning and parts purchasing through inventory control to product distribution.

1990s - Enterprise Resource Planning (ERP): ERP uses multi-module application software for improving the performance of internal business processes. ERP systems integrate business activities across functional departments, from product planning, parts purchasing, inventory control, product distribution, and fulfillment to order tracking. ERP software systems may include application modules for supporting marketing, finance, accounting, and human resources.

Business process re-engineering is the analysis and design of workflows and processes within an organization. According to Davenport (1990), a business process is a set of logically related tasks performed to achieve a defined business outcome. Re-engineering is the basis for many recent developments in management. The cross-functional team, for example, has become popular because of the desire to re-engineer separate functional tasks into complete cross-functional processes. Also, many recent management information systems developments aim to integrate a wide number of business functions; examples include enterprise resource planning, supply chain management, knowledge management systems, groupware and collaborative systems, human resource management systems, and customer relationship management.

Definition

Different definitions can be found. This section contains definitions provided in notable publications in the field:

"... the fundamental rethinking and radical redesign of business processes to achieve dramatic improvements in critical modern measures of performance, such as cost, quality, service, and speed." "encompasses the envisioning of new work strategies, the actual process design activity, and the implementation of the change in all its complex technological, human, and organizational dimensions." Business process re-engineering is also known as business process redesign, business transformation, or business process change management.

[Figure: Reengineering guidance and the relationship of mission and work processes to information technology.]

Business Process Re-engineering (BPR) is basically the fundamental re-thinking and radical re-design made to an organization's existing resources. It is more than just business improvement. Re-engineering recognizes that an organization's business processes are usually fragmented into sub-processes and tasks that are carried out by several specialized functional areas within the organization. Often, no one is responsible for the overall performance of the entire process. Re-engineering maintains that optimizing the performance of sub-processes can result in some benefits, but cannot yield dramatic improvements if the process itself is fundamentally inefficient and outmoded. For that reason, re-engineering focuses on re-designing the process as a whole in order to achieve the greatest possible benefits to the organization and its customers. This drive for realizing dramatic improvements by fundamentally re-thinking how the organization's work should be done distinguishes re-engineering from process improvement efforts that focus on functional or incremental improvement.

The role of information technology

Information technology (IT) has historically played an important role in the re-engineering concept [9]. It is considered by some as a major enabler for new forms of working and collaborating within an organization and across organizational borders. Early BPR literature identified several so-called disruptive technologies that were supposed to challenge traditional wisdom about how work should be performed:

Shared databases, making information available at many places
Expert systems, allowing generalists to perform specialist tasks
Telecommunication networks, allowing organizations to be centralized and decentralized at the same time
Decision-support tools, allowing decision-making to be a part of everybody's job
Wireless data communication and portable computers, allowing field personnel to work independently of the office
Interactive videodisk, to get in immediate contact with potential buyers
Automatic identification and tracking, allowing things to tell where they are, instead of needing to be found
High-performance computing, allowing on-the-fly planning and revision

Data warehouse
In computing, a data warehouse (DW) is a database used for reporting and analysis. The data stored in the warehouse is uploaded from the operational systems. The data may pass through an operational data store for additional operations before it is used in the DW for reporting. A data warehouse maintains its functions in three layers: staging, integration, and access. Staging is used to store raw data for use by developers. The integration layer is used to integrate data and to have a level of abstraction from users. The access layer is for getting data out for users. Data warehouses can be subdivided into data marts. Data marts store subsets of data from a warehouse. This definition of the data warehouse focuses on data storage. The main source of the data is cleaned, transformed, catalogued and made available for use by managers and other business professionals for data mining, online analytical processing, market research and decision support (Marakas & O'Brien 2009). However, the means to retrieve and analyze data, to extract, transform and load data, and to manage the data dictionary are also considered essential components of a data warehousing system. Many references to data warehousing use this broader context. Thus, an expanded definition for data warehousing includes business intelligence tools, tools to extract, transform and load data into the repository, and tools to manage and retrieve metadata.

Benefits of a data warehouse


A data warehouse maintains a copy of information from the source transaction systems. This architectural complexity provides the opportunity to:

Maintain data history, even if the source transaction systems do not.
Integrate data from multiple source systems, enabling a central view across the enterprise. This benefit is always valuable, but particularly so when the organization has grown by merger.
Improve data quality, by providing consistent codes and descriptions, flagging or even fixing bad data.
Present the organization's information consistently.
Provide a single common data model for all data of interest regardless of the data's source.
Restructure the data so that it makes sense to the business users.
Restructure the data so that it delivers excellent query performance, even for complex analytic queries, without impacting the operational systems.
Add value to operational business applications, notably customer relationship management (CRM) systems.

Normalized versus dimensional approach for storage of data


There are two leading approaches to storing data in a data warehouse: the dimensional approach and the normalized approach. The dimensional approach, whose supporters are referred to as "Kimballites", follows Ralph Kimball's view that the data warehouse should be modeled using a dimensional model/star schema. The normalized approach, also called the 3NF model, whose supporters are referred to as "Inmonites", follows Bill Inmon's view that the data warehouse should be modeled using an E-R model/normalized model.

In the dimensional approach, transaction data are partitioned into either "facts", which are generally numeric transaction data, or "dimensions", which are the reference information that gives context to the facts. For example, a sales transaction can be broken up into facts such as the number of products ordered and the price paid for the products, and into dimensions such as order date, customer name, product number, order ship-to and bill-to locations, and the salesperson responsible for receiving the order. A key advantage of the dimensional approach is that the data warehouse is easier for the user to understand and to use. Also, the retrieval of data from the data warehouse tends to operate very quickly. Dimensional structures are easy for business users to understand, because the structure is divided into measurements/facts and context/dimensions. Facts are related to the organization's business processes and operational system, whereas the dimensions surrounding them contain context about the measurement (Kimball, Ralph 2008). The main disadvantages of the dimensional approach are: 1. in order to maintain the integrity of facts and dimensions, loading the data warehouse with data from different operational systems is complicated, and 2. it is difficult to modify the data warehouse structure if the organization adopting the dimensional approach changes the way in which it does business.

In the normalized approach, the data in the data warehouse are stored following, to a degree, database normalization rules. Tables are grouped together by subject areas that reflect general data categories (e.g., data on customers, products, finance, etc.). The normalized structure divides data into entities, which creates several tables in a relational database. When applied in large enterprises, the result is dozens of tables that are linked together by a web of joins. Furthermore, each of the created entities is converted into a separate physical table when the database is implemented (Kimball, Ralph 2008). The main advantage of this approach is that it is straightforward to add information into the database. A disadvantage is that, because of the number of tables involved, it can be difficult for users both to: 1. join data from different sources into meaningful information and then 2. access the information without a precise understanding of the sources of data and of the data structure of the data warehouse.

It should be noted that both normalized and dimensional models can be represented in entity-relationship diagrams, as both contain joined relational tables. The difference between the two models is the degree of normalization. These approaches are not mutually exclusive, and there are other approaches. Dimensional approaches can involve normalizing data to a degree (Kimball, Ralph 2008). In Information-Driven Business (Wiley 2010),[5] Robert Hillard proposes an approach to comparing the two approaches based on the information needs of the business problem. The technique shows that normalized models hold far more information than their dimensional equivalents (even when the same fields are used in both models), but this extra information comes at the cost of usability. The technique measures information quantity in terms of information entropy and usability in terms of the Small Worlds data transformation measure.[6]
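To make the dimensional (star schema) layout concrete, here is a minimal sketch that creates one fact table surrounded by dimension tables in SQLite. The table and column names are purely illustrative and not taken from any particular product.

import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dimension tables hold the descriptive context for the facts.
cur.execute("CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, full_date TEXT, year INTEGER)")
cur.execute("CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT)")
cur.execute("CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT)")

# The fact table holds the numeric measures plus foreign keys to each dimension.
cur.execute("""CREATE TABLE fact_sales (
    date_id     INTEGER REFERENCES dim_date(date_id),
    product_id  INTEGER REFERENCES dim_product(product_id),
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    units_ordered INTEGER,
    price_paid    REAL)""")

A normalized (3NF) design would instead split customers, addresses, product categories and so on into many narrower tables joined by keys; the star schema deliberately denormalizes the dimensions so that business users query through a single level of joins.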

Top-down versus bottom-up design methodologies


Bottom-up design

Ralph Kimball, a well-known author on data warehousing,[7] is a proponent of an approach to data warehouse design which he describes as bottom-up.[8] In the bottom-up approach, data marts are first created to provide reporting and analytical capabilities for specific business processes. It is important to note, though, that in the Kimball methodology the bottom-up process is the result of an initial business-oriented top-down analysis of the relevant business processes to be modelled. (Data marts: a data mart is the access layer of the data warehouse environment that is used to get data out to the users. The data mart is a subset of the data warehouse which is usually oriented to a specific business line or team.)


Data marts contain, primarily, dimensions and facts. Facts can contain atomic data and, if necessary, summarized data. A single data mart often models a specific business area such as "Sales" or "Production." These data marts can eventually be integrated to create a comprehensive data warehouse. The integration of data marts is managed through the implementation of what Kimball calls "a data warehouse bus architecture". The data warehouse bus architecture is primarily an implementation of "the bus", a collection of conformed dimensions and conformed facts, which are dimensions that are shared (in a specific way) between facts in two or more data marts. The integration of the data marts in the data warehouse is centered on the conformed dimensions (residing in "the bus") that define the possible integration "points" between data marts. The actual integration of two or more data marts is then done by a process known as "drill across". A drill-across works by grouping (summarizing) the data along the keys of the (shared) conformed dimensions of each fact participating in the "drill across", followed by a join on the keys of these grouped (summarized) facts.
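A rough sketch of the drill-across idea, assuming two small data marts held as pandas DataFrames that share a conformed "month" dimension; the data and column names are invented for illustration.

import pandas as pd

# Two data marts sharing the conformed dimension "month".
sales = pd.DataFrame({"month": ["2024-01", "2024-01", "2024-02"],
                      "units_sold": [100, 50, 80]})
production = pd.DataFrame({"month": ["2024-01", "2024-02", "2024-02"],
                           "units_built": [120, 60, 40]})

# Step 1: summarize each fact along the shared conformed dimension.
sales_by_month = sales.groupby("month", as_index=False)["units_sold"].sum()
production_by_month = production.groupby("month", as_index=False)["units_built"].sum()

# Step 2: join the summarized facts on the conformed dimension keys.
drill_across = sales_by_month.merge(production_by_month, on="month")
print(drill_across)

The result places sales and production figures side by side per month, which is only possible because both marts describe "month" in exactly the same (conformed) way.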

Maintaining tight management over the data warehouse bus architecture is fundamental to maintaining the integrity of the data warehouse. The most important management task is making sure the dimensions among data marts are consistent. In Kimball's words, this means that the dimensions "conform". Some consider it an advantage of the Kimball method that the data warehouse ends up being "segmented" into a number of logically self-contained (up to and including the bus) and consistent data marts, rather than a big and often complex centralized model. Business value can be returned as quickly as the first data marts can be created, and the method lends itself well to an exploratory and iterative approach to building data warehouses. If integration via the bus is achieved, the data warehouse, through its data marts, can deliver not only the specific information that each individual data mart was designed to provide, but also information integrated across those marts.

Top-down design

Bill Inmon, one of the first authors on the subject of data warehousing, has defined a data warehouse as a centralized repository for the entire enterprise. Inmon is one of the leading proponents of the top-down approach to data warehouse design, in which the data warehouse is designed using a normalized enterprise data model. "Atomic" data, that is, data at the lowest level of detail, are stored in the data warehouse. Dimensional data marts containing data needed for specific business processes or specific departments are created from the data warehouse. In the Inmon vision, the data warehouse is at the center of the "Corporate Information Factory" (CIF), which provides a logical framework for delivering business intelligence (BI) and business management capabilities. Inmon states that the data warehouse is:

Subject-oriented: The data in the data warehouse is organized so that all the data elements relating to the same real-world event or object are linked together.

Non-volatile: Data in the data warehouse are never over-written or deleted; once committed, the data are static, read-only, and retained for future reporting.

Integrated: The data warehouse contains data from most or all of an organization's operational systems, and these data are made consistent.

Time-variant: Whereas an operational system stores only the current value of a data item, the data warehouse retains its history over time.

The top-down design methodology generates highly consistent dimensional views of data across data marts, since all data marts are loaded from the centralized repository. Top-down design has also proven to be robust against business changes. Generating new dimensional data marts against the data stored in the data warehouse is a relatively simple task. The main disadvantage of the top-down methodology is that it represents a very large project with a very broad scope. The up-front cost for implementing a data warehouse using the top-down methodology is significant, and the duration of time from the start of the project to the point that end users experience initial benefits can be substantial. In addition, the top-down methodology can be inflexible and unresponsive to changing departmental needs during the implementation phases.[9]

Hybrid design

Data warehouse (DW) solutions often resemble a hub-and-spoke architecture. Legacy systems feeding the DW/BI solution often include customer relationship management (CRM) and enterprise resource planning (ERP) solutions, generating large amounts of data. To consolidate these various data models, and facilitate the extract, transform, load (ETL) process, DW solutions often make use of an operational data store (ODS). The information from the ODS is then parsed into the actual DW. To reduce data redundancy, larger systems will often store the data in a normalized way. Data marts for specific reports can then be built on top of the DW solution. It is important to note that the DW database in a hybrid solution is kept in third normal form to eliminate data redundancy. A normal relational database, however, is not efficient for business intelligence reports where dimensional modelling is prevalent. Small data marts can shop for data from the consolidated warehouse and use the filtered, specific data for the fact tables and dimensions required. The DW effectively provides a single source of information from which the data marts can read, creating a highly flexible solution from a BI point of view. The hybrid architecture allows a DW to be replaced with a master data management solution where operational (not static) information could reside. The Data Vault modeling components follow a hub-and-spoke architecture. This modeling style is a hybrid design, consisting of best-of-breed practices from both third normal form and the star schema. The Data Vault model is not a true third normal form, and breaks some of the rules that 3NF dictates be followed. It is, however, a top-down architecture with a bottom-up design. The Data Vault model is geared to be strictly a data warehouse. It is not geared to be end-user accessible; when built, it still requires the use of a data mart or star-schema-based release area for business purposes.

Dimensional model

Pros:
1. Fast data retrieval performance.
2. Good for analysis: slice and dice, roll-up, drill-down.
3. Easy for administrators to maintain and interpret.

Cons:
1. Data loading time is increased.
2. Flexibility is reduced or diminished in case of a business change or dimension change.
3. Storage increases due to denormalization; the same information may be multiplied considerably during storage.

Normalized model

Pros:
1. Easy to plug and play in case of changes in the business or the model.
2. Thoroughly componentized down to the lowest level, so what you see is what it is.
3. Reusability of common entities: for example, if one entity caters to different levels in multiple dimensions (e.g., an "area" in the geography dimension could be used in both the business and the geographical hierarchy), the data need not be duplicated.
4. Data loading is made easy, as the structure tends to be similar to the source data in many cases.

Cons:
1. Data retrieval performance is lower.
2. Maintenance effort goes up with the increase in the number of entities and relationships.
3. Interpretation is not so easy: you always have to open up a bag of entities when you discuss any subject area, which is not slide-friendly.

Data Mining: What is Data Mining?


Overview
Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information - information that can be used to increase revenue, cut costs, or both. Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases.
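As a minimal illustration of "finding correlations among fields", the sketch below computes pairwise correlations across the columns of a small table with pandas; the figures are invented.

import pandas as pd

# Hypothetical point-of-sale summary: one row per store-week.
data = pd.DataFrame({
    "promotion_spend": [100, 200, 300, 400, 500],
    "units_sold":      [120, 150, 200, 260, 300],
    "returns":         [10, 9, 12, 11, 10],
})

# Pairwise Pearson correlations between every pair of fields.
print(data.corr())

In a real warehouse the same idea runs over millions of rows and dozens of fields, but the principle of quantifying relationships between columns is identical.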

Data Mining Architecture


The technological objective in the KDD process is to design an architecture for Data Mining. In addition to the architecture, it is also intended to address the process-related issues. It is assumed that the implementation of the Data Mining technology would be a processing-, memory- and data-intensive task, as against one that requires continuous interaction with the database.

It is also assumed that Data Preparation (data extraction, transformation, cleansing and loading) is outside the scope of the Data Mining architecture. To preserve the accuracy of the data mining results, the Data Preparation process must be addressed before the Data Mining process, as explained in the earlier topic. The following diagram depicts a generic 3-tier architecture for Data Mining.

The first tier is the database tier, where data and metadata are prepared and stored. The second tier is the Data Mining Application, where the algorithms process the data and store the results in the database. The third tier is the front-end layer, which facilitates the parameter settings for the Data Mining Application and the visualization of the results in an interpretable form.

Database
It is not necessary that the Database tier be hosted on an RDBMS. It can be a mixture of an RDBMS and a file system, or a file system only. For example, the data from source systems may be staged on a file system and then loaded into an RDBMS. The Database tier consists of various layers. The data in these layers interface with multiple systems based on the activities in which they participate. The following diagram represents the various layers in the Database tier.

Metadata Layer

The Metadata layer is the common and most frequently used layer. It contains information about sources, transformations and cleansing rules, and the Data Mining results. It forms the backbone for the data in the entire Data Mining architecture.

Data Layer
This layer comprises the Staging Area, the Prepared/Processed Data, and the Data Mining Results.

The Staging Area is used for temporarily holding the data sourced from various source systems. It can be held in any form, e.g. flat files or tables in an RDBMS. This data is transformed, cleansed, consolidated and loaded into a structured schema during the Data Preparation process. This prepared data is used as the input data for Data Mining. The base data may undergo summarization or derivation, based on the business case, before it is presented to the Data Mining Application. The Data Mining output can be captured in the Data Mining Results layer so that it can be made available to the users for visualization and analysis.

Data, Information, and Knowledge

Data

Data are any facts, numbers, or text that can be processed by a computer. Today, organizations are accumulating vast and growing amounts of data in different formats and different databases. This includes:

operational or transactional data, such as sales, cost, inventory, payroll, and accounting
non-operational data, such as industry sales, forecast data, and macroeconomic data
metadata - data about the data itself, such as logical database design or data dictionary definitions

Information

The patterns, associations, or relationships among all this data can provide information. For example, analysis of retail point-of-sale transaction data can yield information on which products are selling and when.

Knowledge

Information can be converted into knowledge about historical patterns and future trends. For example, summary information on retail supermarket sales can be analyzed in light of promotional efforts to provide knowledge of consumer buying behavior. Thus, a manufacturer or retailer could determine which items are most susceptible to promotional efforts.

Data Warehouses

Dramatic advances in data capture, processing power, data transmission, and storage capabilities are enabling organizations to integrate their various databases into data warehouses. Data warehousing is defined as a process of centralized data management and retrieval. Data warehousing, like data mining, is a relatively new term, although the concept itself has been around for years. Data warehousing represents an ideal vision of maintaining a central repository of all organizational data. Centralization of data is needed to maximize user access and analysis. Dramatic technological advances are making this vision a reality for many companies. And, equally dramatic advances in data analysis software are allowing users to access this data freely. The data analysis software is what supports data mining.

How does data mining work?

While large-scale information technology has been evolving separate transaction and analytical systems, data mining provides the link between the two. Data mining software analyzes relationships and patterns in stored transaction data based on open-ended user queries. Several types of analytical software are available: statistical, machine learning, and neural networks. Generally, any of four types of relationships are sought:

Classes: Stored data is used to locate data in predetermined groups. For example, a restaurant chain could mine customer purchase data to determine when customers visit and what they typically order. This information could be used to increase traffic by having daily specials.

Clusters: Data items are grouped according to logical relationships or consumer preferences. For example, data can be mined to identify market segments or consumer affinities.

Associations: Data can be mined to identify associations. The beer-diaper example is an example of associative mining (a small sketch follows this list).

Sequential patterns: Data is mined to anticipate behavior patterns and trends. For example, an outdoor equipment retailer could predict the likelihood of a backpack being purchased based on a consumer's purchase of sleeping bags and hiking shoes.
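A toy sketch of association mining in the spirit of the beer-and-diapers example. The handful of transactions is invented; the code simply counts co-occurrence and reports support and confidence for one rule.

# Hypothetical market-basket transactions.
transactions = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"beer", "diapers", "milk"},
]

def rule_stats(antecedent, consequent, baskets):
    # Support: fraction of baskets containing both items.
    # Confidence: fraction of baskets with the antecedent that also contain the consequent.
    both = sum(1 for b in baskets if antecedent in b and consequent in b)
    ante = sum(1 for b in baskets if antecedent in b)
    return both / len(baskets), both / ante

support, confidence = rule_stats("diapers", "beer", transactions)
print(f"diapers -> beer: support={support:.2f}, confidence={confidence:.2f}")

Real association rule algorithms (such as Apriori) do the same counting systematically over every candidate item combination rather than for a single hand-picked rule.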

Data mining consists of five major elements:


1. Extract, transform, and load transaction data onto the data warehouse system.
2. Store and manage the data in a multidimensional database system.
3. Provide data access to business analysts and information technology professionals.
4. Analyze the data by application software.
5. Present the data in a useful format, such as a graph or table.

Different levels of analysis are available:

Artificial neural networks: Non-linear predictive models that learn through training and resemble biological neural networks in structure.

Genetic algorithms: Optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of natural evolution.

Decision trees: Tree-shaped structures that represent sets of decisions. These decisions generate rules for the classification of a dataset. Specific decision tree methods include Classification and Regression Trees (CART) and Chi-Square Automatic Interaction Detection (CHAID). CART and CHAID are decision tree techniques used for classification of a dataset. They provide a set of rules that you can apply to a new (unclassified) dataset to predict which records will have a given outcome. CART segments a dataset by creating 2-way splits, while CHAID segments using chi-square tests to create multi-way splits. CART typically requires less data preparation than CHAID.

Nearest neighbor method: A technique that classifies each record in a dataset based on a combination of the classes of the k record(s) most similar to it in a historical dataset (where k >= 1). Sometimes called the k-nearest neighbor technique (a small sketch follows this list).

Rule induction: The extraction of useful if-then rules from data based on statistical significance.

Data visualization: The visual interpretation of complex relationships in multidimensional data. Graphics tools are used to illustrate data relationships.
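A bare-bones sketch of the nearest neighbor idea described above, written in plain Python with invented two-feature records and class labels; no particular data mining library is assumed.

from collections import Counter
import math

def knn_classify(record, training_set, k=3):
    # Rank historical records by Euclidean distance to the new record,
    # then vote among the classes of the k closest ones.
    by_distance = sorted(training_set,
                         key=lambda row: math.dist(record, row[0]))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical historical dataset: (features, class label).
history = [((1.0, 1.2), "low_risk"), ((0.9, 1.0), "low_risk"),
           ((3.1, 2.9), "high_risk"), ((3.0, 3.2), "high_risk")]

print(knn_classify((2.8, 3.0), history))  # expected: "high_risk"

The new record is assigned the class held by the majority of its k most similar historical records, which is exactly the combination of neighboring classes the definition refers to.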

What technological infrastructure is required?

Today, data mining applications are available on all sizes of systems for mainframe, client/server, and PC platforms. System prices range from several thousand dollars for the smallest applications up to $1 million a terabyte for the largest. Enterprise-wide applications generally range in size from 10 gigabytes to over 11 terabytes. NCR has the capacity to deliver applications exceeding 100 terabytes. There are two critical technological drivers:

Size of the database: the more data being processed and maintained, the more powerful the system required. Query complexity: the more complex the queries and the greater the number of queries being processed, the more powerful the system required.

Relational database storage and management technology is adequate for many data mining applications less than 50 gigabytes. However, this infrastructure needs to be significantly enhanced to support larger applications. Some vendors have added extensive indexing capabilities to improve query performance. Others use new hardware architectures such as Massively Parallel Processors (MPP) to achieve order-of-magnitude improvements in query time. For example, MPP systems from NCR link hundreds of high-speed Pentium processors to achieve performance levels exceeding those of the largest supercomputers.

Process
The knowledge discovery in databases (KDD) process is commonly defined with the stages (1) Selection, (2) Pre-processing, (3) Transformation, (4) Data Mining, and (5) Interpretation/Evaluation. There exist, however, many variations on this theme, such as the CRoss Industry Standard Process for Data Mining (CRISP-DM), which defines six phases: (1) Business Understanding, (2) Data Understanding, (3) Data Preparation, (4) Modeling, (5) Evaluation, and (6) Deployment, or a simplified process such as (1) Pre-processing, (2) Data mining, and (3) Results validation.

Pre-processing

Before data mining algorithms can be used, a target data set must be assembled. As data mining can only uncover patterns actually present in the data, the target dataset must be large enough to contain these patterns while remaining concise enough to be mined in an acceptable timeframe. A common source for data is a data mart or data warehouse. Pre-processing is essential to analyze the multivariate datasets before data mining.

The target set is then cleaned. Data cleaning removes observations with noise and observations with missing data.
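A small illustration of the cleaning step, assuming the target set is a pandas DataFrame; the missing and implausible values are invented.

import pandas as pd

raw = pd.DataFrame({
    "age":    [34, None, 29, 120, 41],      # 120 is an implausible outlier
    "income": [52000, 48000, None, 61000, 58000],
})

# Drop records with missing values, then filter out obvious noise.
cleaned = raw.dropna()
cleaned = cleaned[cleaned["age"].between(18, 100)]
print(cleaned)

In practice the noise filter would be driven by domain rules (valid ranges, code lists) rather than a single hard-coded bound, but the mechanics are the same.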

Data mining

Data mining involves six common classes of tasks:


Anomaly detection (outlier/change/deviation detection): The identification of unusual data records that might be interesting, or of data errors that require further investigation.

Association rule learning (dependency modeling): Searches for relationships between variables. For example, a supermarket might gather data on customer purchasing habits. Using association rule learning, the supermarket can determine which products are frequently bought together and use this information for marketing purposes. This is sometimes referred to as market basket analysis.

Clustering: The task of discovering groups and structures in the data that are in some way or another "similar", without using known structures in the data.

Classification: The task of generalizing known structure to apply to new data. For example, an e-mail program might attempt to classify an e-mail as legitimate or spam.

Regression: Attempts to find a function which models the data with the least error.

Summarization: Providing a more compact representation of the data set, including visualization and report generation.

Results validation


Reliability

Data mining can be misused, and can unintentionally produce results which appear significant but do not actually predict future behavior and cannot be reproduced on a new sample of data. See Data snooping, Data dredging.

Product lifecycle management


In industry, product lifecycle management (PLM) is the process of managing the entire lifecycle of a product from its conception, through design and manufacture, to service and disposal. PLM integrates people, data, processes and business systems, and provides a product information backbone for companies and their extended enterprise. Product lifecycle management (PLM) should be distinguished from product life cycle management (marketing) (PLCM). PLM describes the engineering aspect of a product, from managing descriptions and properties of a product through its development and useful life; PLCM, on the other hand, refers to the commercial management of the life of a product in the business market with respect to costs and sales measures. Product lifecycle management is one of the four cornerstones of a corporation's information technology structure.[3] All companies need to manage communications and information with their customers (CRM - customer relationship management), their suppliers (SCM - supply chain management), their resources within the enterprise (ERP - enterprise resource planning) and their planning (SDLC - systems development life cycle). In addition, manufacturing engineering companies must also develop, describe, manage and communicate information about their products.

One form of PLM is called people-centric PLM. While traditional PLM tools have been deployed only on release or during the release phase, people-centric PLM targets the design phase. As of 2009, ICT development (the EU-funded PROMISE project 2004-2008) has allowed PLM to extend beyond traditional PLM and integrate sensor data and real-time 'lifecycle event data' into PLM, as well as allowing this information to be made available to different players in the total lifecycle of an individual product (closing the information loop). This has resulted in the extension of PLM into closed-loop lifecycle management (CL2M).

Benefits

Documented benefits of product lifecycle management include:[4][5]


Reduced time to market
Improved product quality
Reduced prototyping costs
More accurate and timely request-for-quote generation
Ability to quickly identify potential sales opportunities and revenue contributions
Savings through the re-use of original data
A framework for product optimization
Reduced waste
Savings through the complete integration of engineering workflows
Documentation that can assist in proving compliance for RoHS or Title 21 CFR Part 11
Ability to provide contract manufacturers with access to a centralized product record

Areas of PLM

Within PLM there are five primary areas:
1. Systems engineering (SE)
2. Product and portfolio management (PPM)
3. Product design (CAx)
4. Manufacturing process management (MPM)
5. Product data management (PDM)

Note: While application software is not required for PLM processes, the business complexity and rate of change require that organizations execute as rapidly as possible.

Systems engineering is focused on meeting all requirements, primarily meeting customer needs, and coordinating the systems design process by involving all relevant disciplines. Product and portfolio management is focused on managing resource allocation and tracking progress against plan for new product development projects that are in process (or in a holding status). Portfolio management is a tool that assists management in tracking progress on new products and making trade-off decisions when allocating scarce resources. Product data management is focused on capturing and maintaining information on products and/or services through their development and useful life.

The four main stages of a product's life cycle and the accompanying characteristics are:

1. Market introduction stage
   1. costs are very high
   2. slow sales volumes to start
   3. little or no competition
   4. demand has to be created
   5. customers have to be prompted to try the product
   6. makes no money at this stage

2. Growth stage
   1. costs reduced due to economies of scale
   2. sales volume increases significantly
   3. profitability begins to rise
   4. public awareness increases
   5. competition begins to increase with a few new players establishing themselves in the market
   6. increased competition leads to price decreases

3. Maturity stage
   1. costs are lowered as a result of increasing production volumes and experience curve effects
   2. sales volume peaks and market saturation is reached
   3. an increase in competitors entering the market
   4. prices tend to drop due to the proliferation of competing products
   5. brand differentiation and feature diversification are emphasized to maintain or increase market share
   6. industrial profits go down

4. Saturation and decline stage
   1. costs become counter-optimal
   2. sales volume declines
   3. prices and profitability diminish
   4. profit becomes more a challenge of production/distribution efficiency than of increased sales

Introduction to development process


The core of PLM (product lifecycle management) is the creation and central management of all product data, and the technology used to access this information and knowledge. PLM as a discipline emerged from tools such as CAD, CAM and PDM, but it can be viewed as the integration of these tools with methods, people and processes through all stages of a product's life. It is not just about software technology; it is also a business strategy.

For simplicity, the stages described are shown in a traditional sequential engineering workflow. The exact order of events and tasks will vary according to the product and industry in question, but the main processes are:

Conceive
  o Specification
  o Concept design
Design
  o Detailed design
  o Validation and analysis (simulation)
  o Tool design
Realize
  o Plan manufacturing
  o Manufacture
  o Build/Assemble
  o Test (quality check)
Service
  o Sell and deliver
  o Use
  o Maintain and support
  o Dispose

The major key point events are:


Order
Idea
Kick-off
Design freeze
Launch

The reality is however more complex, people and departments cannot perform their tasks in isolation and one activity cannot simply finish and the next activity start. Design is an iterative process, often designs need to be modified due to manufacturing constraints or conflicting requirements. Where exactly a customer order fits into the time line depends on the industry type, whether the products are for example build to order, engineer to order, or assemble to order.

Supply Chain Management


A supply chain is a network of facilities and distribution options that performs the functions of procurement of materials, transformation of these materials into intermediate and finished products, and distribution of these finished products to customers. Supply chains exist in both service and manufacturing organizations, although the complexity of the chain may vary greatly from industry to industry and firm to firm.

Supply Chain Decisions

Supply chain management and materials management are a competitive business edge today. A supply chain is a system of organizations, people, technologies, activities, information and resources involved in moving a product or service from supplier to customer. In many organizations, materials form the largest single expenditure item, accounting for nearly 50 to 65% of the total expenditure. With competition growing by the day, cost reduction in business operations, while still making various products available to customers as per their requirements, comes into sharp focus.

Maintaining a flawless supply chain across all its operations thus becomes absolutely necessary for any business. The importance of supply chain management can hardly be over-emphasized, as it has become the cutting edge of business, after the product quality and manufacturing capabilities of a firm. Supply chain activities transform natural resources, raw materials and components into a finished product that is delivered to the end user. In sophisticated supply chain systems, used products may re-enter the supply chain at any point where residual value is recyclable. We classify the decisions for supply chain management into two broad categories: strategic and operational. As the term implies, strategic decisions are made typically over a longer time horizon. These are closely linked to the corporate strategy (they sometimes are the corporate strategy), and guide supply chain policies from a design perspective. On the other hand, operational decisions are short term, and focus on activities on a day-to-day basis. The effort in these types of decisions is to effectively and efficiently manage the product flow in the "strategically" planned supply chain. There are four major decision areas in supply chain management: 1) location, 2) production, 3) inventory, and 4) transportation (distribution), and there are both strategic and operational elements in each of these decision areas.

Location Decisions

The geographic placement of production facilities, stocking points, and sourcing points is the natural first step in creating a supply chain. The location of facilities involves a commitment of resources to a long-term plan. Once the size, number, and location of these are determined, so are the possible paths by which the product flows through to the final customer. These decisions are of great significance to a firm since they represent the basic strategy for accessing customer markets, and will have a considerable impact on revenue, cost, and level of service. These decisions should be determined by an optimization routine that considers production costs, taxes, duties and duty drawback, tariffs, local content, distribution costs, production limitations, etc. (See Arntzen, Brown, Harrison and Trafton [1995] for a thorough discussion of these aspects.) Although location decisions are primarily strategic, they also have implications on an operational level.

Production Decisions

The strategic decisions include what products to produce, which plants to produce them in, the allocation of suppliers to plants, plants to DCs, and DCs to customer markets. As before, these decisions have a big impact on the revenues, costs and customer service levels of the firm. These decisions assume the existence of the facilities, but determine the exact path(s) through which a product flows to and from these facilities. Another critical issue is the capacity of the manufacturing facilities, and this largely depends on the degree of vertical integration within the firm. Operational decisions focus on detailed production scheduling. These decisions include the construction of the master production schedules, scheduling production on machines, and equipment maintenance. Other considerations include workload balancing and quality control measures at a production facility.

Inventory Decisions

These refer to the means by which inventories are managed. Inventories exist at every stage of the supply chain as raw materials, semi-finished or finished goods. They can also be in process between locations. Their primary purpose is to buffer against any uncertainty that might exist in the supply chain. Since holding inventories can cost anywhere between 20 and 40 percent of their value, their efficient management is critical in supply chain operations. Inventory management is strategic in the sense that top management sets the goals. However, most researchers have approached the management of inventory from an operational perspective. Operational concerns include deployment strategies (push versus pull), control policies (the determination of the optimal levels of order quantities and reorder points), and the setting of safety stock levels at each stocking location (a small reorder-point sketch follows at the end of this section). These levels are critical, since they are primary determinants of customer service levels.

Transportation Decisions

The mode choice aspect of these decisions is the more strategic one. These decisions are closely linked to the inventory decisions, since the best choice of mode is often found by trading off the cost of using the particular mode of transport against the indirect cost of inventory associated with that mode. While air shipments may be fast, reliable, and warrant smaller safety stocks, they are expensive. Meanwhile, shipping by sea or rail may be much cheaper, but it necessitates holding relatively large amounts of inventory to buffer against the inherent uncertainty associated with it. Therefore customer service levels and geographic location play vital roles in such decisions. Since transportation accounts for more than 30 percent of logistics costs, operating efficiently makes good economic sense. Shipment sizes (consolidated bulk shipments versus lot-for-lot), routing and scheduling of equipment are key in the effective management of the firm's transport strategy.
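As a small illustration of the inventory control policies mentioned above (reorder points and safety stock), the sketch below uses common textbook formulas with invented demand figures; the service-level factor z is an assumption, not a value taken from the text.

import math

def reorder_point(avg_daily_demand, lead_time_days, demand_std_dev, z=1.65):
    # Safety stock covers demand variability over the replenishment lead time;
    # z = 1.65 corresponds roughly to a 95% service level.
    safety_stock = z * demand_std_dev * math.sqrt(lead_time_days)
    return avg_daily_demand * lead_time_days + safety_stock, safety_stock

rop, ss = reorder_point(avg_daily_demand=40, lead_time_days=9, demand_std_dev=6)
print(f"reorder point = {rop:.0f} units (safety stock = {ss:.0f})")

When the stock position falls to the reorder point, a replenishment order is placed; the safety stock portion is what buffers against the uncertainty the section describes.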

Online Analytical Processing


OLAP (Online Analytical Processing) is an approach to swiftly answering multi-dimensional analytical (MDA) queries. OLAP is part of the broader category of business intelligence, which also encompasses relational reporting and data mining. Typical applications of OLAP include business reporting for sales, marketing, management reporting, business process management (BPM), budgeting and forecasting, financial reporting and similar areas, with new applications coming up, such as agriculture. The term OLAP was created as a slight modification of the traditional database term OLTP (Online Transaction Processing). OLAP tools enable users to interactively analyze multidimensional data from multiple perspectives. OLAP consists of three basic analytical operations:

Consolidation, drill-down, and slicing and dicing. Consolidation involves the aggregation of data that can be accumulated and computed in one or more dimensions. For example, all sales offices are rolled up to the sales department or sales division to anticipate sales trends. In contrast, drill-down is a technique that allows users to navigate through the details. For instance, users can access the sales of the individual products that make up a region's sales. Slicing and dicing is a feature whereby users can take out (slice) a specific set of data of the cube and view (dice) the slices from different viewpoints. Databases configured for OLAP use a multidimensional data model, allowing for complex analytical and ad hoc queries with a rapid execution time. They borrow aspects of navigational databases, hierarchical databases and relational databases. The core of any OLAP system is an OLAP cube (also called a 'multidimensional cube' or a hypercube). It consists of numeric facts called measures which are categorized by dimensions. The cube metadata is typically created from a star schema or snowflake schema of tables in a relational database. Measures are derived from the records in the fact table and dimensions are derived from the dimension tables. Each measure can be thought of as having a set of labels, or metadata, associated with it. A dimension is what describes these labels; it provides information about the measure. A simple example would be a cube that contains a store's sales as a measure, and Date/Time as a dimension. Each sale has a Date/Time label that describes more about that sale. Any number of dimensions can be added to the structure, such as Store, Cashier, or Customer, by adding a foreign key column to the fact table. This allows an analyst to view the measures along any combination of the dimensions. For example:
Sales Fact Table                              Time Dimension
+-------------+---------+                     +---------+-------------------+
| sale_amount | time_id |                     | time_id | timestamp         |
+-------------+---------+                     +---------+-------------------+
|     2008.10 |    1234 |-------------------->|    1234 | 20080902 12:35:43 |
+-------------+---------+                     +---------+-------------------+

Another example:

Fact_Sales is the fact table, and there are three dimension tables: Dim_Date, Dim_Store and Dim_Product. Each dimension table has a primary key on its Id column, relating to one of the columns (viewed as rows in the example schema) of the Fact_Sales table's three-column (compound) primary key (Date_Id, Store_Id, Product_Id). The non-primary-key Units_Sold column of the fact table in this example represents a measure or metric that can be used in calculations and analysis. The non-primary-key columns of the dimension tables represent additional attributes of the dimensions (such as the Year of the Dim_Date dimension). An OLAP cube is a set of data organized in a way that facilitates non-predetermined queries for aggregated information, or in other words, online analytical processing. OLAP is one of the computer-based techniques for analyzing business data that are collectively called business intelligence.
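To make the Fact_Sales example concrete, the following sketch joins a tiny fact table to date and product dimensions with pandas and sums the Units_Sold measure by year. The column names follow the example above; the data values and the omission of Dim_Store are my own simplifications.

import pandas as pd

dim_date = pd.DataFrame({"Date_Id": [1, 2], "Year": [2023, 2024]})
dim_product = pd.DataFrame({"Product_Id": [10, 11],
                            "Product_Name": ["Brake pad", "Oil filter"]})
fact_sales = pd.DataFrame({
    "Date_Id":    [1, 1, 2, 2],
    "Product_Id": [10, 11, 10, 11],
    "Units_Sold": [100, 40, 120, 55],
})

# Join the fact table to its dimensions, then aggregate the measure by Year and Product.
joined = fact_sales.merge(dim_date, on="Date_Id").merge(dim_product, on="Product_Id")
print(joined.groupby(["Year", "Product_Name"])["Units_Sold"].sum())

This is exactly the pattern an OLAP tool automates: measures live in the fact table, descriptive attributes live in the dimensions, and analysis is a join followed by an aggregation along chosen dimensions.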

Multidimensional databases

A multidimensional structure is defined as a variation of the relational model that uses multidimensional structures to organize data and express the relationships between data. The structure is broken into cubes, and the cubes are able to store and access data within the confines of each cube. Each cell within a multidimensional structure contains aggregated data related to the elements along each of its dimensions. Even when data is manipulated, it remains easy to access and continues to constitute a compact database format. The data still remains interrelated. The multidimensional structure is quite popular for analytical databases that use online analytical processing (OLAP) applications (O'Brien & Marakas, 2009). Analytical databases use these structures because of their ability to deliver answers to complex business queries swiftly. Data can be viewed from different angles, which gives a broader perspective of a problem, unlike other models.

Aggregations

It has been claimed that for complex queries OLAP cubes can produce an answer in around 0.1% of the time required for the same query on OLTP relational data. The most important mechanism in OLAP which allows it to achieve such performance is the use of aggregations. Aggregations are built from the fact table by changing the granularity on specific dimensions and aggregating data along these dimensions. The number of possible aggregations is determined by every possible combination of dimension granularities. The combination of all possible aggregations and the base data contains the answers to every query which can be answered from the data.
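A rough sketch of how pre-computed aggregations cover every combination of dimension granularities, using itertools and pandas on an invented two-dimension fact table.

from itertools import combinations
import pandas as pd

fact = pd.DataFrame({
    "store":   ["A", "A", "B", "B"],
    "product": ["x", "y", "x", "y"],
    "units":   [5, 3, 7, 2],
})

dimensions = ["store", "product"]
aggregations = {}

# One aggregate per subset of dimensions: (), (store,), (product,), (store, product).
for r in range(len(dimensions) + 1):
    for dims in combinations(dimensions, r):
        if dims:
            aggregations[dims] = fact.groupby(list(dims))["units"].sum()
        else:
            aggregations[dims] = fact["units"].sum()   # the grand total

print(aggregations[("store",)])

With two dimensions there are four possible aggregates; with ten dimensions there are over a thousand, which is why real OLAP engines pre-compute only a chosen subset and derive the rest on demand.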

Types
OLAP systems have traditionally been categorized using the following taxonomy. In the OLAP world, there are mainly two different types: Multidimensional OLAP (MOLAP) and Relational OLAP (ROLAP). Hybrid OLAP (HOLAP) refers to technologies that combine MOLAP and ROLAP.

MOLAP

This is the more traditional way of OLAP analysis. In MOLAP, data is stored in a multidimensional cube. The storage is not in the relational database, but in proprietary formats. Advantages:

Excellent performance: MOLAP cubes are built for fast data retrieval and are optimal for slicing and dicing operations.
Can perform complex calculations: All calculations have been pre-generated when the cube is created. Hence, complex calculations are not only doable, but they return quickly.

Disadvantages:

Limited in the amount of data it can handle: Because all calculations are performed when the cube is built, it is not possible to include a large amount of data in the cube itself. This is not to say that the data in the cube cannot be derived from a large amount of data. Indeed, this is possible. But in this case, only summary-level information will be included in the cube itself.
Requires additional investment: Cube technology is often proprietary and may not already exist in the organization. Therefore, to adopt MOLAP technology, chances are that additional investments in human and capital resources are needed.

ROLAP

This methodology relies on manipulating the data stored in the relational database to give the appearance of traditional OLAP's slicing and dicing functionality. In essence, each action of slicing and dicing is equivalent to adding a "WHERE" clause in the SQL statement. Advantages:

Can handle large amounts of data: The data size limitation of ROLAP technology is the limitation on data size of the underlying relational database. In other words, ROLAP itself places no limitation on data amount.
Can leverage functionalities inherent in the relational database: Often, the relational database already comes with a host of functionalities. ROLAP technologies, since they sit on top of the relational database, can therefore leverage these functionalities.

Disadvantages:

Performance can be slow: Because each ROLAP report is essentially a SQL query (or multiple SQL queries) against the relational database, the query time can be long if the underlying data size is large.
Limited by SQL functionalities: Because ROLAP technology mainly relies on generating SQL statements to query the relational database, and SQL statements do not fit all needs (for example, it is difficult to perform complex calculations using SQL), ROLAP technologies are traditionally limited by what SQL can do. ROLAP vendors have mitigated this risk by building complex out-of-the-box functions into their tools, as well as the ability for users to define their own functions.
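To illustrate the earlier point that a ROLAP "slice" is essentially a WHERE clause, the sketch below runs a query against an in-memory SQLite table; the table, columns and data are invented for illustration.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, year INTEGER, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                 [("North", 2023, 100.0), ("North", 2024, 150.0),
                  ("South", 2023, 80.0),  ("South", 2024, 90.0)])

# "Slicing" on the year dimension is just an extra WHERE condition;
# the roll-up along region is an ordinary GROUP BY aggregation.
for row in conn.execute("""SELECT region, SUM(amount)
                           FROM sales
                           WHERE year = 2024
                           GROUP BY region"""):
    print(row)

A ROLAP engine generates statements of this shape automatically from the user's point-and-click slice, dice and drill operations, which is why its performance and expressiveness track those of the underlying SQL database.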

HOLAP HOLAP technologies attempt to combine the advantages of MOLAP and ROLAP. For summary-type information, HOLAP leverages cube technology for faster performance. When detail information is needed, HOLAP can "drill through" from the cube into the underlying relational data.
