
Data Mining

Q.1. What is Operational Intelligence?
1. Operational Intelligence relates to finance, operations, manufacturing, distribution, logistics and human resource information viewed along time periods, location/geography, product, project, supplier, carrier and employee.

Q.2. What is Business Intelligence? Explain the components of BI architecture.
1. Business Intelligence refers to systems and technologies that provide decision-makers with the means to extract personalized, meaningful information about their business and industry.

a. Business Intelligence Infrastructure:
1) Based on the overall requirements of business intelligence, the data integration layer is required to extract, cleanse and transform data into load files for the information warehouse (a minimal sketch in Python follows this list).
2) This layer begins with transaction-level operational data and metadata about these operational systems.
3) The product of a good data staging layer is high-quality data, a reusable infrastructure and metadata supporting both business and technical users.
4) The information warehouse is usually developed incrementally over time and is architected to include key business variables and business metrics in a structure that meets all business analysis questions required by the business groups.
5) The information warehouse layer consists of relational and/or OLAP cube services that allow business users to gain insight into their areas of responsibility in the organization.
6) Customer Intelligence relates to customer, service, sales and marketing information viewed along time periods, location/geography, product and customer variables.
7) Business decisions that can be supported with customer intelligence range from pricing, forecasting, promotion strategy and competitive analysis to up-sell strategy and customer service resource allocation.
8) Operational Intelligence relates to finance, operations, manufacturing, distribution, logistics and human resource information viewed along time periods, location/geography, product, project, supplier, carrier and employee.
9) The most visible layer of the business intelligence infrastructure is the applications layer, which delivers the information to business users.
10) Business intelligence requirements include scheduled report generation and distribution, query and analysis capabilities to pursue special investigations, and graphical analysis permitting trend identification.
11) This layer should enable business users to interact with the information to gain new insight into the underlying business variables to support business decisions.
12) Presenting business intelligence on the Web through a portal is gaining considerable momentum. Portals are usually organized by communities of users, such as suppliers, customers, employees and partners.
13) Portals can reduce the overall infrastructure costs of an organization as well as deliver strong self-service and information access capabilities.
14) Web-based portals are becoming commonplace as a single personalized point of access for key business information.
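The data integration layer described in point 1) can be pictured with a short, hedged Python sketch: it extracts records from an operational export, cleanses them, and transforms them into a load file for the warehouse. The file names, column names and the monthly aggregation grain are illustrative assumptions, not part of any particular BI product.

# Minimal sketch of the extract-cleanse-transform step of the data integration
# layer. File names and column names (orders.csv, order_date, amount) are
# hypothetical placeholders, not part of any specific BI product.
import csv
from datetime import datetime

def extract(path):
    """Extract: read raw operational records from a CSV export."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def cleanse(rows):
    """Cleanse: drop rows with missing keys and normalise field formats."""
    clean = []
    for row in rows:
        if not row.get("customer_id") or not row.get("amount"):
            continue  # reject incomplete records
        row["amount"] = float(row["amount"])
        row["order_date"] = datetime.strptime(row["order_date"], "%Y-%m-%d").date()
        clean.append(row)
    return clean

def transform(rows):
    """Transform: aggregate to the grain used by the information warehouse
    (here: sales per customer per month)."""
    facts = {}
    for row in rows:
        key = (row["customer_id"], row["order_date"].strftime("%Y-%m"))
        facts[key] = facts.get(key, 0.0) + row["amount"]
    return facts

def write_load_file(facts, path):
    """Produce the load file consumed by the warehouse loader."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["customer_id", "month", "total_amount"])
        for (cust, month), total in sorted(facts.items()):
            writer.writerow([cust, month, total])

if __name__ == "__main__":
    write_load_file(transform(cleanse(extract("orders.csv"))), "sales_load.csv")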

Q.3. Differentiate between Database Management Systems (DBMS) and Data Mining.
Data Mining encompasses a number of technical approaches, such as clustering, data summarization, classification, finding dependency networks, analyzing changes, and detecting anomalies.

Area: DBMS
Task: Extraction of detailed and summary data.
Type of Result: Information.
Method: Deduction (ask the question, verify with the data).
Example: Who purchased mutual funds in the last 3 years?
Usage: Software that manages data on physical storage devices; it provides the ability to store, access and modify the data.

Area: Data Mining
Task: Knowledge discovery of hidden patterns and insights.
Type of Result: Insight and prediction.
Method: Induction (build the model, apply it to new data, get the result).
Example: Who will buy a mutual fund in the next 6 months and why?
Usage: Refers to the finding of relevant and useful information from databases.
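To make the deduction/induction contrast concrete, the following hedged Python sketch asks the DBMS-style question with a SQL query and answers the data-mining-style question with a tiny model learned from past data. The table, column names and the in-memory records are invented purely for illustration.

# Illustrative contrast between the deductive DBMS query and the inductive
# data-mining task from the comparison above. The table, column names and the
# tiny in-memory dataset are invented purely for illustration.
import sqlite3

# --- Deduction (DBMS): ask a precise question, verify it against stored data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (customer TEXT, product TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [("alice", "mutual_fund", 2023), ("bob", "savings", 2022),
     ("carol", "mutual_fund", 2021), ("dave", "mutual_fund", 2018)],
)
recent_buyers = conn.execute(
    "SELECT DISTINCT customer FROM purchases "
    "WHERE product = 'mutual_fund' AND year >= 2022"
).fetchall()
print("Deduction - who purchased mutual funds recently:", recent_buyers)

# --- Induction (data mining): build a model from past data, apply it to new data.
# Here the "model" is just the purchase rate per age band, learned from history.
history = [({"age_band": "30-40"}, 1), ({"age_band": "30-40"}, 1),
           ({"age_band": "50-60"}, 0), ({"age_band": "50-60"}, 1)]
rates = {}
for features, bought in history:
    band = features["age_band"]
    seen, positives = rates.get(band, (0, 0))
    rates[band] = (seen + 1, positives + bought)

def predict_will_buy(customer):
    """Score a new customer with the learned per-band purchase rate."""
    seen, positives = rates.get(customer["age_band"], (0, 0))
    return (positives / seen) if seen else 0.0

print("Induction - probability a new 30-40 customer buys:",
      predict_will_buy({"age_band": "30-40"}))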

Q.4. What is a Neural Network? Explain in detail.
1. An Artificial Neural Network (ANN) is an information-processing paradigm that is inspired by the way biological nervous systems, such as the brain, process information.
2. The key element of this paradigm is the novel structure of the information processing system.
3. It is composed of a large number of highly interconnected processing elements working in unison to solve specific problems.
4. An ANN is configured for a specific application, such as pattern recognition or data classification, through a learning process.
5. Neural networks are made up of many artificial neurons.
6. An artificial neuron is simply an electronically modeled biological neuron. How many neurons are used depends on the task at hand.
7. It could be as few as three or as many as several thousand.
8. There are different types of neural networks, each of which has different strengths particular to its applications.
9. The abilities of different networks can be related to their structure, dynamics and learning methods.

Basic structure of Artificial Neurons
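The structure above can be expressed as a short computational sketch of a single artificial neuron: a weighted sum of inputs plus a bias, passed through an activation function. The weights, bias, inputs and the choice of a sigmoid activation are illustrative assumptions, not a specific network from the text.

# Minimal sketch of a single artificial neuron (a perceptron-style unit):
# weighted sum of inputs plus a bias, passed through an activation function.
# The weights, bias and inputs below are arbitrary illustrative values.
import math

def sigmoid(x):
    """Common activation function squashing the weighted sum into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def artificial_neuron(inputs, weights, bias):
    """Compute the neuron's output from its inputs."""
    weighted_sum = sum(i * w for i, w in zip(inputs, weights)) + bias
    return sigmoid(weighted_sum)

# Example: a neuron with three inputs, as mentioned above.
print(artificial_neuron(inputs=[0.5, 0.9, -0.2],
                        weights=[0.8, -0.4, 0.3],
                        bias=0.1))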

Q.5. What is the partition algorithm? Explain with the help of suitable examples.
1. The partition algorithm is based on the observation that the frequent sets are normally very few in number compared to the set of all item sets.
2. The partition algorithm uses two scans of the database to discover all frequent sets.
3. The partition algorithm is based on the premise that the size of the global candidate set is considerably smaller than the set of all possible item sets.
4. The intuition behind this is that the size of the global candidate set is bounded by n times the size of the largest of the sets of locally frequent sets.
5. The algorithm executes in two phases. In the first phase, the partition algorithm logically divides the database into a number of non-overlapping partitions. The partitions are considered one at a time and all frequent item sets for that partition are generated.
6. In phase II, the actual supports for these item sets are counted over the whole database and the globally frequent item sets are identified.

Partition Algorithm
P = partition_database(T);
n = Number of partitions
// Phase I
for i = 1 to n do begin
    read_in_partition(Ti in P)
    Li = generate all frequent item sets of Ti using the apriori method in main memory
end
// Merge Phase
for (k = 2; Li^k is non-empty, i = 1, 2, ..., n; k++) do begin
    C_G^k = union over i = 1 to n of Li^k
end
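The two phases can also be illustrated with a small, runnable Python sketch. The toy transaction database, the number of partitions and the minimum support threshold are invented for illustration, and the brute-force local miner stands in for the Apriori-based generation used in the pseudocode above.

# Runnable sketch of the two-phase partition algorithm described above.
from itertools import combinations
import math

def frequent_itemsets(transactions, min_support_count, max_size=3):
    """Brute-force local miner: count every item set up to max_size and keep
    those meeting the support count (a real miner would use Apriori pruning)."""
    counts = {}
    for t in transactions:
        items = sorted(t)
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] = counts.get(itemset, 0) + 1
    return {s for s, c in counts.items() if c >= min_support_count}

def partition_algorithm(database, num_partitions, min_support):
    size = math.ceil(len(database) / num_partitions)
    partitions = [database[i:i + size] for i in range(0, len(database), size)]

    # Phase I (first scan): locally frequent item sets of each partition are
    # merged into the global candidate set.
    global_candidates = set()
    for part in partitions:
        local_min_count = math.ceil(min_support * len(part))
        global_candidates |= frequent_itemsets(part, local_min_count)

    # Phase II (second scan): count the actual support of every global
    # candidate over the whole database and keep the globally frequent ones.
    counts = {c: 0 for c in global_candidates}
    for t in database:
        t = set(t)
        for cand in global_candidates:
            if set(cand) <= t:
                counts[cand] += 1
    min_count = min_support * len(database)
    return {c: n for c, n in counts.items() if n >= min_count}

# Toy transaction database, two partitions, 50% minimum support.
transactions = [{"bread", "milk"}, {"bread", "butter"}, {"milk", "butter"},
                {"bread", "milk", "butter"}, {"bread", "milk"}, {"milk"}]
print(partition_algorithm(transactions, num_partitions=2, min_support=0.5))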

Q.6. Describe the following with respect to Web Mining.
A. Categories of Web Mining:
1. Web mining can be broadly defined as the discovery and analysis of useful information from the World Wide Web.
2. Web mining can be broadly divided into three categories:

I. Web Content Mining
It targets knowledge discovery in which the main objects are the traditional collections of multimedia documents, such as images, video and audio, which are embedded in or linked to the web pages. It is also quite different from data mining because web data are mainly semi-structured and/or unstructured, while data mining deals primarily with structured data. It requires creative applications of data mining and/or text mining techniques, and also its own unique approaches. Web content mining can be approached from two points of view: the agent-based approach or the database approach. (A minimal text-mining sketch follows the list of challenges below.)
Web Content Mining Challenges:
o Data/information extraction
o Web information integration and schema matching
o Opinion extraction from online sources
o Knowledge synthesis
o Segmenting web pages and detecting noise
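As a hedged illustration of web content mining, the sketch below strips the markup from a made-up HTML page and counts term frequencies; real content mining would add crawling, noise removal and far more robust parsing.

# Minimal text-mining sketch for web content: strip the markup from a page and
# count term frequencies. The HTML snippet is an invented stand-in for a
# fetched web page.
from html.parser import HTMLParser
from collections import Counter
import re

class TextExtractor(HTMLParser):
    """Collect the text nodes of an HTML document, ignoring scripts and styles."""
    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)

html_page = "<html><body><h1>Web mining</h1><p>Web content mining analyses web content.</p></body></html>"
parser = TextExtractor()
parser.feed(html_page)
text = " ".join(parser.chunks).lower()
terms = re.findall(r"[a-z]+", text)
print(Counter(terms).most_common(5))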

II. Web Structure Mining
It focuses on analysis of the link structure of the web, and one of its purposes is to identify the more preferable documents. It helps in discovering similarities between web sites, discovering important sites for a particular topic or discipline, or discovering web communities. The goal of web structure mining is to generate a structural summary of the web site and web page. While web content mining focuses on the structure within a document, web structure mining tries to discover the link structure of the hyperlinks at the inter-document level. This type of structure mining facilitates introducing database techniques for accessing information in web pages by providing a reference schema. (A small link-analysis sketch follows this paragraph.)
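As a hedged illustration of link-structure analysis, the sketch below runs a basic PageRank-style iteration over a small, invented hyperlink graph to rank pages by their link importance; the pages and links are not from any real site.

# Small link-analysis sketch for web structure mining: a basic PageRank-style
# iteration over a hand-made hyperlink graph.
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, outlinks in links.items():
            if not outlinks:          # dangling page: spread its rank evenly
                share = damping * rank[page] / len(pages)
                for p in pages:
                    new_rank[p] += share
            else:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Invented hyperlink graph for illustration.
link_graph = {
    "home": ["products", "blog"],
    "products": ["home"],
    "blog": ["home", "products"],
    "orphan": [],
}
for page, score in sorted(pagerank(link_graph).items(), key=lambda x: -x[1]):
    print(f"{page}: {score:.3f}")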

III. Web Usage Mining
It focuses on techniques that can predict the behavior of users while they are interacting with the WWW. It discovers user navigation patterns from web data. It tries to discover useful information from the secondary data derived from the interactions of users while surfing the Web. It collects data from web log records to discover user access patterns of web pages (a small web-log sketch follows the list below). There are mainly four kinds of data mining techniques applied to the web mining domain to discover user navigation patterns:
o Association rule mining
o Sequential pattern mining
o Clustering
o Classification
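As a hedged illustration of web usage mining, the sketch below groups a few made-up log records by user and counts the most common page-to-page transitions, a very small sequential-pattern view of navigation; the "user,timestamp,page" log format shown is invented, not a real server log format.

# Sketch of web usage mining over a simplified access log: group page requests
# by user, then count the most common page-to-page transitions.
from collections import Counter, defaultdict

log_lines = [
    "u1,2024-01-01T10:00,/home",
    "u1,2024-01-01T10:01,/products",
    "u1,2024-01-01T10:03,/cart",
    "u2,2024-01-01T11:00,/home",
    "u2,2024-01-01T11:02,/products",
    "u2,2024-01-01T11:05,/products/item42",
]

# Group each user's requests in time order.
sessions = defaultdict(list)
for line in log_lines:
    user, timestamp, page = line.split(",")
    sessions[user].append((timestamp, page))

# Count page-to-page transitions across all users.
transitions = Counter()
for visits in sessions.values():
    pages = [page for _, page in sorted(visits)]
    for current_page, next_page in zip(pages, pages[1:]):
        transitions[(current_page, next_page)] += 1

print(transitions.most_common(3))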

B. Applications of Web Mining: A few applications of web mining are as follows:
o E-commerce customer behavior analysis
o E-commerce transaction analysis
o E-commerce website design
o E-banking
o M-commerce
o Web advertisement
o Search engines
o Online auctions

C. Web Mining Software:
1. Open source software for web mining includes RapidMiner, which provides modules for text clustering, text categorization, information extraction, named entity recognition, and sentiment analysis.
2. RapidMiner is used, for example, in applications like automated news filtering for personalized news surveys. It is also used in automated content-based document and e-mail routing, and in sentiment analysis of web blogs and product reviews in internet discussion groups.
3. Information extraction from web pages also uses RapidMiner to create mash-ups which combine information from various web services and web pages, and to perform web log mining and web usage mining.
4. SAS Data Quality Solution provides an enterprise solution for profiling, cleansing, augmenting and integrating data to create consistent, reliable information.
5. With SAS Data Quality Solution you can automatically incorporate data quality into data integration and business intelligence projects to dramatically improve returns on your organization's strategic initiatives.
6. Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization.
7. It is also well-suited for developing new machine learning schemes. Weka is open source software issued under the GNU General Public License.
