
DESIGN AND IMPLEMENTATION OF DATAWAREHOUSE FOR AUTOMAKER SALES

BY NEHA J. PANDEY (C-439) SUVARNA PATTAYIL (C-443) SAPNA A. RANA (C-447) EKATA S. SAHARE (C-450)

Abstract
Manufacturing companies such as automakers face several data-handling problems. These problems cause a drop in revenue in the marketing and retailing departments, a loss that can be attributed to poor planning and a lack of transactional data on individuals in the market. The mission of the Automaker Sales Data Warehouse project is to provide strategic and tactical support to all departments and divisions of a manufacturing company through the acquisition and analysis of data pertaining to their customers and markets. The project helps to identify problem areas, particularly in marketing, through the creation of a data warehouse that will give the company a better understanding of its customers and markets.

Overall, the project has identified three basic needs:
- Acquiring and maintaining core data about the automobiles, individuals, and businesses within the market area.
- Acquiring and maintaining transactional data on the automobiles, individuals, and businesses within the market area.
- Acquiring and implementing the tools needed to effectively manipulate and access the core and transactional data.

This project provides a wide variety of benefits to a number of units within an automobile company. These benefits are expected to help drive marketing and retailing, as well as improve productivity and increase revenue.

Introduction

A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing. It usually contains historical data derived from transaction data, but it can include data from other sources. It separates analysis workload from transaction workload and enables an organization to consolidate data from several sources. In addition to a relational database, a data warehouse environment includes an extraction, transportation, transformation, and loading (ETL) solution, an online analytical processing (OLAP) engine, client analysis tools, and other applications that manage the process of gathering data and delivering it to business users.

The automotive industry designs, develops, manufactures, markets, and sells motor vehicles. It is one of the world's most important economic sectors by revenue. The term automotive industry usually does not include industries dedicated to automobiles after delivery to the customer, such as repair shops and motor fuel filling stations.

The objectives are to design out overburden and inconsistency and to eliminate waste. The most significant effects on process value delivery are achieved by designing a process that delivers the required results smoothly, that is, by designing out inconsistency. It is also crucial to ensure that the process is as flexible as necessary without being overburdened, since overburden generates waste. The objective of this project is to develop a prototype sales data warehouse that allows the auto manufacturer to monitor product sales and dealer performance.

Problem Statement

This data warehouse project deals with the sales subject area of an auto manufacturer. In this project, it is assumed that several car dealers pre-purchase inventory from the auto manufacturer and act as sales agents on behalf of the manufacturer. Due to the distributed nature of the sales transactions, the manufacturer does not have a centralized and integrated view of its product sales. Furthermore, because the manufacturer does not have direct control over the sales of its products, it is crucial that its sales and marketing divisions be able to frequently monitor, assess and compare the performance of their dealers. A sales data warehouse offers a solution to these problems by providing the manufacturer with answers to questions such as:

- Which models are profitable or losing money?
- Which features in their products are very popular among customers?
- How are their sales changing over time and geography?
- Who is buying their products?
- How are they financing their purchases?
- How are their dealers faring?

Literature survey
Data Warehouse Overview

A data warehouse maintains its functions in three layers: staging, integration, and access. Staging is used to store raw data for use by developers (analysis and support). The integration layer is used to integrate data and to have a level of abstraction from users. The access layer is for getting data out for users. This definition of the data warehouse focuses on data storage. The main source of the data is cleaned, transformed, catalogued and made available for use by managers and other business professionals for data mining, online analytical processing, market research and decision support. However, the means to retrieve and analyze data, to extract, transform and load data, and to manage the data dictionary are also considered essential components of a data warehousing system. Many references to data warehousing use this broader context. Thus, an expanded definition for data warehousing includes business intelligence tools, tools to extract, transform and load data into the repository, and tools to manage and retrieve metadata.

The concept of data warehousing dates back to the late 1980s when IBM researchers Barry Devlin and Paul Murphy developed the "business data warehouse". In essence, the data warehousing concept was intended to provide an architectural model for the flow of data from operational systems to decision support environments. The concept attempted to address the various problems associated with this flow, mainly the high costs associated with it. In the absence of a data warehousing architecture, an enormous amount of redundancy was required to support multiple decision support environments. In larger corporations it was typical for multiple decision support environments to operate independently. Though each environment served different users, they often required much of the same stored data. The process of gathering, cleaning and integrating data from various sources, usually from long-term existing operational systems (usually referred to as legacy systems), was typically in part replicated for each environment. Moreover, the operational systems were frequently reexamined as new decision support requirements emerged. Often new requirements necessitated gathering, cleaning and integrating new data from "data marts" that were tailored for ready access by users.

Benefits
Some of the benefits that a data warehouse provides are as follows:
- A data warehouse provides a common data model for all data of interest regardless of the data's source.
- Prior to loading data into the data warehouse, inconsistencies are identified and resolved. This greatly simplifies reporting and analysis.
- Information in the data warehouse is under the control of data warehouse users so that, even if the source system data are purged over time, the information in the warehouse can be stored safely for extended periods of time.
- Because they are separate from operational systems, data warehouses provide retrieval of data without slowing down operational systems.
- Data warehouses can work in conjunction with and, hence, enhance the value of operational business applications, notably customer relationship management (CRM) systems.
- Data warehouses can record historical information for data source tables that are not set up to save an update history.

Design
The principal characteristic of a dimensional model is a set of detailed business facts surrounded by multiple dimensions that describe those facts. When realized in a database, the schema for a dimensional model contains a central fact table and multiple dimension tables. A dimensional model may produce a star schema or a snowflake schema.

Star Schemas
A schema is called a star schema if all dimension tables can be joined directly to the fact table.

Dimension Tables
Dimension tables encapsulate the attributes associated with facts and separate these attributes into logically distinct groupings, such as time, geography, products, customers, and so forth. A dimension table may be used in multiple places if the data warehouse contains multiple fact tables or contributes data to data marts. For example, a product dimension may be used with a sales fact table and an inventory fact table in the data warehouse, and also in one or more departmental data marts. A dimension such as customer, time, or product that is used in multiple schemas is called a conforming dimension if all copies of the dimension are the same. Summarization data and reports will not correspond if different schemas use different versions of a dimension table. Using conforming dimensions is critical to successful data warehouse design.
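To make the star schema idea concrete, the following is a minimal sketch using Python's built-in sqlite3 module rather than the project's target database. The table names (time_dim, product_dim, sales_fact), their columns, and the sample rows are assumptions chosen for illustration only. Each dimension table joins directly to the fact table, which is what makes the layout a star.

```python
import sqlite3

# In-memory database used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim    (time_key INTEGER PRIMARY KEY, sale_date TEXT, quarter TEXT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, model TEXT, category TEXT);
-- The fact table holds the measures plus one foreign key per dimension.
CREATE TABLE sales_fact (
    time_key    INTEGER REFERENCES time_dim(time_key),
    product_key INTEGER REFERENCES product_dim(product_key),
    units_sold  INTEGER,
    sale_amount REAL
);
""")

# Hypothetical sample rows.
conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?)",
                 [(1, "2023-01-15", "Q1"), (2, "2023-04-20", "Q2")])
conn.executemany("INSERT INTO product_dim VALUES (?, ?, ?)",
                 [(10, "Model A", "Car"), (11, "Model B", "Truck")])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
                 [(1, 10, 2, 40000.0), (1, 11, 1, 35000.0), (2, 10, 3, 60000.0)])


def sales_by_category_and_quarter(connection):
    """Join the fact table directly to each dimension and aggregate."""
    return connection.execute("""
        SELECT p.category, t.quarter, SUM(f.sale_amount)
        FROM sales_fact f
        JOIN product_dim p ON p.product_key = f.product_key
        JOIN time_dim t    ON t.time_key = f.time_key
        GROUP BY p.category, t.quarter
        ORDER BY p.category, t.quarter
    """).fetchall()


star_rows = sales_by_category_and_quarter(conn)
```

Because every dimension is one join away from the fact table, adding another dimension to the analysis only adds one more JOIN clause.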

Implementation
ETL
Extract, transform and load (ETL) is a process in database usage, and especially in data warehousing, that involves:
- Extracting data from outside sources
- Transforming it to fit operational needs (which can include quality levels)
- Loading it into the end target (database or data warehouse)
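The three ETL stages listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the CSV layout, the dealer names, the sales_staging table, and the cleaning rules (trimming names, normalizing dates to ISO format, removing thousands separators) are all hypothetical and are not taken from the project itself.

```python
import csv
import io
import sqlite3

# Hypothetical raw extract from one dealer system, with inconsistent formatting.
RAW_DEALER_CSV = """dealer,model,sale_date,price
 north motors ,Model A,15/01/2023,"20,000"
North Motors,Model B,2023-01-16,35000
"""


def extract(text):
    """Extract: read rows from the outside source (here, CSV text)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize dealer names, dates (to YYYY-MM-DD) and prices."""
    cleaned = []
    for row in rows:
        date = row["sale_date"].strip()
        if "/" in date:  # assumed DD/MM/YYYY input format
            day, month, year = date.split("/")
            date = f"{year}-{month}-{day}"
        cleaned.append((
            row["dealer"].strip().title(),
            row["model"].strip(),
            date,
            float(row["price"].replace(",", "")),
        ))
    return cleaned


def load(connection, rows):
    """Load: write the cleaned rows into the target table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS sales_staging "
        "(dealer TEXT, model TEXT, sale_date TEXT, price REAL)"
    )
    connection.executemany("INSERT INTO sales_staging VALUES (?, ?, ?, ?)", rows)
    connection.commit()


etl_conn = sqlite3.connect(":memory:")
load(etl_conn, transform(extract(RAW_DEALER_CSV)))
loaded = etl_conn.execute(
    "SELECT * FROM sales_staging ORDER BY sale_date"
).fetchall()
```

Keeping the three stages as separate functions mirrors the definition above and lets each stage be replaced independently, for example when a new dealer delivers data in a different format.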

In reality, however, the source data must originate from the different transactional and operational systems of each dealer. The source systems may also include some of the manufacturer's own systems, which supply product data. The data from these heterogeneous and distributed systems will have to be extracted, cleansed, integrated and transformed into a format suitable for loading into the database. A nightly refresh of the warehouse may be recommended to keep the data reasonably up to date. Oracle's Data Warehouse Builder may be a suitable choice for an ETL tool since the target database is Oracle and the integration and transformation activities are relatively simple. Informatica's PowerCenter may prove to be a better choice if the auto manufacturer wants to collect data from a globally distributed sales network. The following tables will be created:

- Time: The Time table stores information pertaining to the date of sale.
- Cust_Demographics: The Customer_Demographics table stores information pertaining to the customers' demographics.
- Product: The Product table stores information pertaining to the manufacturer's products.
- Dealer: The Dealer table stores information pertaining to the dealers.
- Method_of_payment: The Method_of_Payment table stores information pertaining to the method of payment.
- Sales_Fact: The Sales_Fact table stores information pertaining to the sales facts.

Processes and steps
A process contains a series of steps that perform a transformation and movement of data for a specific warehouse use. In general, a process moves source data into the warehouse. Then, the data is aggregated and summarized for warehouse use. A process can produce a single flat table or a set of summary tables. A process might also perform some specific type of data transformation. A step is the definition of a single operation within the warehouse. By using SQL statements or calling programs, steps define how you move data and transform data. When you run a step, a transfer of data between the warehouse source and the warehouse target, or any transformation of that data, can take place.
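As a rough illustration of a step defined by an SQL statement, the sketch below aggregates detail rows into a summary table using Python's sqlite3 module. The table names sales_detail and dealer_sales_summary, their columns, and the sample rows are assumptions made for this example, and the function is not part of any warehouse product.

```python
import sqlite3


def run_summary_step(connection):
    """One step: a single SQL statement that summarizes source rows into a target."""
    # Clear the previous summary so the step can be rerun safely.
    connection.execute("DELETE FROM dealer_sales_summary")
    connection.execute("""
        INSERT INTO dealer_sales_summary (dealer, total_sales, vehicles_sold)
        SELECT dealer, SUM(price), COUNT(*)
        FROM sales_detail
        GROUP BY dealer
    """)
    connection.commit()


step_conn = sqlite3.connect(":memory:")
step_conn.executescript("""
CREATE TABLE sales_detail (dealer TEXT, model TEXT, price REAL);
CREATE TABLE dealer_sales_summary (dealer TEXT, total_sales REAL, vehicles_sold INTEGER);
INSERT INTO sales_detail VALUES
    ('North Motors', 'Model A', 20000),
    ('North Motors', 'Model B', 35000),
    ('City Autos',   'Model A', 21000);
""")

run_summary_step(step_conn)
summary = step_conn.execute(
    "SELECT * FROM dealer_sales_summary ORDER BY dealer"
).fetchall()
```

Running the step a second time produces the same summary, since the target is cleared before it is repopulated.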

A step is a logical entity in the Data Warehouse Center that defines:
- A link to its source data.
- The definition of and a link to the output table or file.
- The mechanism (either an SQL statement or a program) and definition for populating the output table or file.
- The processing options and schedule by which the output table or file is populated.

Suppose that you want Data Warehouse Center to perform the following tasks:
- Extract data from different databases.
- Convert the data to a single format.
- Write the data to a table in a data warehouse.

You would create a process that contains several steps. Each step performs a separate task, such as extracting the data from a database or converting it to the correct format. You might need to create several steps to completely transform and format the data and put it into its final table.

When a step or a process runs, it can affect the target in the following ways:
- Replace all the data in the warehouse target with new data
- Append the new data to the existing data
- Append a separate edition of data
- Update existing data

You can run a step on demand, or you can schedule a step to run:
- At a set time
- Only one time
- Repeatedly, such as every Friday
- In sequence, so that when one step finishes running, the next step begins running
- Upon completion, either successful or not successful, of another step
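The extract, convert and write example above can be sketched as a process made of three steps run in sequence. This is a plain Python illustration and does not use or imitate the Data Warehouse Center interface. The function names, the target_sales table, and the two population modes shown ("replace" and "append") are assumptions for the example.

```python
import sqlite3


def populate(connection, rows, mode):
    """Write rows to the target, either replacing or appending to existing data."""
    if mode == "replace":
        connection.execute("DELETE FROM target_sales")
    elif mode != "append":
        raise ValueError(f"unknown mode: {mode}")
    connection.executemany("INSERT INTO target_sales VALUES (?, ?)", rows)
    connection.commit()


def run_process(steps, data=None):
    """Run steps in sequence; each step receives the previous step's output."""
    for step in steps:
        data = step(data)
    return data


def extract_step(_):
    # Hypothetical source rows, all values still text.
    return [("model a", "20000"), ("model b", "35000")]


def convert_step(rows):
    # Convert to a single format: title-cased model name, numeric price.
    return [(model.title(), float(price)) for model, price in rows]


def make_write_step(connection, mode):
    def write_step(rows):
        populate(connection, rows, mode)
        return rows
    return write_step


proc_conn = sqlite3.connect(":memory:")
proc_conn.execute("CREATE TABLE target_sales (model TEXT, price REAL)")

# Two runs in append mode accumulate rows in the target.
for _ in range(2):
    run_process([extract_step, convert_step, make_write_step(proc_conn, "append")])
count_after_append = proc_conn.execute(
    "SELECT COUNT(*) FROM target_sales").fetchone()[0]

# A run in replace mode leaves only the new rows.
run_process([extract_step, convert_step, make_write_step(proc_conn, "replace")])
count_after_replace = proc_conn.execute(
    "SELECT COUNT(*) FROM target_sales").fetchone()[0]
```

The difference between the two counts shows why the population mode matters when a process is scheduled to run repeatedly.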

If you schedule a process, the first step in the process runs at the scheduled time. The following sections describe the various types of steps that you will find in the Data Warehouse Center.

SQL steps
The Data Warehouse Center provides two types of SQL steps. The SQL Select and Insert step uses an SQL SELECT statement to extract data from a warehouse source and generates an INSERT statement to insert the data into the warehouse target table. The SQL Select and Update step uses an SQL SELECT statement to extract data from a warehouse source and update existing data in the warehouse target table.

Program steps
These steps run predefined programs and utilities. The warehouse programs for a particular operating system are packaged with the agent for that operating system. You install the warehouse programs when you install the agent code.

Transformer steps
Transformer steps are stored procedures and user-defined functions that specify statistical or warehouse transformers that you can use to transform data. You can use transformers to clean, invert, and pivot data; generate primary keys and period tables; and calculate various statistics.

User-defined program steps
A user-defined program step is a logical entity within the Data Warehouse Center that represents a business-specific transformation that you want the Data Warehouse Center to start. Because every business has unique data transformation requirements, businesses can choose to write their own program steps or to use tools provided by other companies, such as ETI or Vality. For example, you can write a user-defined program that will perform the following functions:
- Export data from a table.
- Manipulate that data.
- Write the data to a temporary output resource or a warehouse target.

Queries
Business questions
The sales analysts are concerned with the following business questions:
- What is the total quarterly sales revenue generated by each dealer for the past year?
- How has the MSRP base price of vehicles belonging to the Car category changed over the past year? Perform analysis by month.
- What is the current distribution of sales by financing type by dealer?
- How many vehicles were sold by product category?
- Which car model has been in most demand in the last year? Perform analysis by month.
- What percentage of the actual sales price was paid as down payment in a sale? Perform analysis by the income level of customer demographics.
- How many vehicles were sold by product, by gender and by demographic age?
- How many transactions were made in the last month?
- What is the total number of cars or other vehicles manufactured to date?
- What is the average number of customers visiting every year?
- How many vehicles were sold during a sale between two given dates?
- How many vehicles were repaired between two given dates?
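Two of the business questions above (quarterly sales revenue by dealer, and vehicles sold by product category) could be answered with queries along the following lines. This is a sketch only: it reuses the table names Time, Dealer, Product and Sales_Fact from the design, but every column name and sample row is an assumption, and SQLite stands in for the target database.

```python
import sqlite3

q_conn = sqlite3.connect(":memory:")
q_conn.executescript("""
-- "Time" is quoted defensively because it resembles a built-in name.
CREATE TABLE "Time"  (time_key INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
CREATE TABLE Dealer  (dealer_key INTEGER PRIMARY KEY, dealer_name TEXT);
CREATE TABLE Product (product_key INTEGER PRIMARY KEY, model TEXT, category TEXT);
CREATE TABLE Sales_Fact (
    time_key INTEGER, dealer_key INTEGER, product_key INTEGER, sale_price REAL
);
-- Hypothetical sample data.
INSERT INTO "Time"  VALUES (1, 2023, 1), (2, 2023, 2);
INSERT INTO Dealer  VALUES (1, 'North Motors'), (2, 'City Autos');
INSERT INTO Product VALUES (1, 'Model A', 'Car'), (2, 'Model B', 'Truck');
INSERT INTO Sales_Fact VALUES
    (1, 1, 1, 20000), (1, 1, 2, 35000), (2, 1, 1, 21000), (2, 2, 1, 19500);
""")

# Total quarterly sales revenue generated by each dealer.
quarterly_revenue = q_conn.execute("""
    SELECT d.dealer_name, t.year, t.quarter, SUM(f.sale_price)
    FROM Sales_Fact f
    JOIN Dealer d ON d.dealer_key = f.dealer_key
    JOIN "Time" t ON t.time_key = f.time_key
    GROUP BY d.dealer_name, t.year, t.quarter
    ORDER BY d.dealer_name, t.year, t.quarter
""").fetchall()

# Number of vehicles sold by product category (one fact row per vehicle assumed).
sold_by_category = q_conn.execute("""
    SELECT p.category, COUNT(*)
    FROM Sales_Fact f
    JOIN Product p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
```

The remaining questions follow the same pattern: join Sales_Fact to the relevant dimension tables, filter on the Time dimension where a period is given, and group by the attributes named in the question.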
