From Data Mining To Knowledge Discovery

The remaining chapters focus on particular issues and methods. We believe this selection gives a broad overview of the state of the art in the research and application of knowledge discovery and data mining in the context of large databases.

Putting together a book on an exciting and rapidly evolving area of new technology is not an easy matter. The KDD workshops have served to bring together researchers interested in this area from many fields and many countries, and that community was our base for sampling the chapters to include in this book. This collection is the result of the outstanding cooperation of almost a hundred people on several continents. First and foremost we thank the authors of the chapters of this book, who performed multiple revisions under tight time constraints and who put up with our demands and suggested changes. We are also grateful to the members of the program committee of the 1994 AAAI Workshop on KDD. We thank Mike Hamilton, who ably coordinated the typesetting and timely production of this volume. For encouraging this effort throughout the last two years we are grateful to our management at our respective institutions: NASA and the Jet Propulsion Laboratory, California Institute of Technology; GTE Laboratories; and General Motors Research and Development Center. Finally, this work would not have been possible without the patience and understanding of our families.

Usama Fayyad, Gregory Piatetsky-Shapiro, Padhraic Smyth, & Ramasamy Uthurusamy

1 From Data Mining to Knowledge Discovery: An Overview

Usama M. Fayyad, Jet Propulsion Laboratory, California Institute of Technology
Gregory Piatetsky-Shapiro, GTE Laboratories
Padhraic Smyth, Jet Propulsion Laboratory, California Institute of Technology

We are drowning in information, but starving for knowledge. -- John Naisbitt

Abstract

The explosive growth of many business, government, and scientific databases has far outpaced our ability to interpret and digest this data, creating a need for a new generation of tools and techniques for automated and intelligent database analysis. These tools and techniques are the subject of the rapidly emerging field of knowledge discovery in databases (KDD) and are the subject of this book. This chapter presents an overview of the state of the art in this field. We begin with our view of the relation between knowledge discovery and data mining, followed by a definition of the KDD process and basic data mining methods. We then proceed to application issues in KDD, including guidelines for selecting an application and current challenges facing practitioners in the field. The discussion relates methods and problems to applicable chapters in the book, with the goal of providing a unifying vision of the common overall goals shared by the chapters.

1.1 What Is this Book About?

In the last decade, we have seen an explosive growth in our capabilities to both generate and collect data. Advances in scientific data collection (e.g., from remote sensors or from space satellites), the widespread introduction of bar codes for almost all commercial products, and the computerization of many business (e.g., credit card purchases) and government transactions (e.g., tax returns) have generated a flood of data. Advances in data storage technology, such as faster, higher-capacity, and cheaper storage devices (e.g., magnetic disks, CD-ROMs), better database management systems, and data warehousing technology, have allowed us to transform this data deluge into "mountains" of stored data. Representative examples are easy to find.
In the business world, one of the largest databases in the world was created by Wal-Mart (a U.S. retailer), which handles over 20 million transactions a day (Babcock 1994). Most health-care transactions in the U.S. are being stored in computers, yielding multi-gigabyte databases, which many large companies are beginning to analyze in order to control costs and improve quality (see Matheus, Piatetsky-Shapiro, & McNeill, this volume). Mobil Oil Corporation is developing a data warehouse capable of storing over 100 terabytes of data related to oil exploration (Harrison 1993).

There are huge scientific databases as well. The human genome database project (Fasman, Cuticchia, & Kingsbury 1994) has collected gigabytes of data on the human genetic code, and much more is expected. A database housing a sky object catalog from a major astronomy sky survey (e.g., see Fayyad, Djorgovski, & Weir, this volume) consists of billions of entries, with raw image data sizes measured in terabytes. The NASA Earth Observing System (EOS) of orbiting satellites and other spaceborne instruments is projected to generate on the order of 50 gigabytes of remotely sensed image data per hour when operational in the late 1990s and early in the next century (Way & Smith 1991).

Such volumes of data clearly overwhelm traditional manual methods of data analysis, such as spreadsheets and ad-hoc queries. Those methods can create informative reports from data, but cannot analyze the contents of those reports to focus on important knowledge. A significant need exists for a new generation of techniques and tools with the ability to intelligently and automatically assist humans in analyzing the mountains of data for nuggets of useful knowledge.
These techniques and tools are the subject of the emerging field of knowledge discovery in databases (KDD).

The interest in KDD has been increasing, as evidenced by the number of recent workshops (Piatetsky-Shapiro 1991, Piatetsky-Shapiro 1993, Ziarko 1994, Fayyad & Uthurusamy 1994), which culminated in the First International Conference on Knowledge Discovery and Data Mining (Fayyad & Uthurusamy 1995). A growing number of publications have been devoted to the topic, including Inmon & Osterfelt (1991), Piatetsky-Shapiro (1992), Parsaye & Chignell (1995), Cercone & Tsuchiya (1994), Piatetsky-Shapiro et al. (1994), and Piatetsky-Shapiro (1995). These various publications and special issues document some of the many KDD applications which have been reported across diverse fields in business, government, medicine, and science (see Section 1.6). This book brings together the most recent relevant research in the field, continuing in the tradition of the first Knowledge Discovery in Databases book (Piatetsky-Shapiro and Frawley 1991).

The chapter begins by discussing the historical context of KDD and data mining and the choice of title for this book. We begin by explaining the distinction between the terms data mining and knowledge discovery, and how they fit together. The basic view we adopt is one where data mining refers to a class of methods that are used in some of the steps comprising the overall KDD process. We then provide a definition of KDD in Section 1.2. The typical steps involved in the KDD process are outlined and discussed in Section 1.3. We then focus in particular on data mining methods in the context of the overall KDD process.
Section 1.4 covers the general issues involved in data mining, while Section 1.5 discusses specific data mining methods. Having defined the basic terms and introduced some of the methods, we turn our attention to the practical application issues of KDD in Section 1.6. Section 1.7 concludes the chapter with a preview of the rest of the chapters in this volume. Throughout, we relate the discussion of particular methods and techniques to applicable chapters, with the goal of providing a unifying vision of the common overall goals shared by the chapters constituting this book.

1.1.1 About this Book's Title

Historically, the notion of finding useful patterns (or nuggets of knowledge) in raw data has been given various names, including knowledge discovery in databases, data mining, knowledge extraction, information discovery, information harvesting, data archaeology, and data pattern processing. The term knowledge discovery in databases, or KDD for short, was coined in 1989 to refer to the broad process of finding knowledge in data, and to emphasize the "high-level" application of particular data mining methods. The term data mining has been commonly used by statisticians, data analysts, and the MIS (Management Information Systems) community, while KDD has been mostly used by artificial intelligence and machine learning researchers.

In this overview chapter we adopt the view that KDD refers to the overall process of discovering useful knowledge from data, while data mining refers to the application of algorithms for extracting patterns from data without the additional steps of the KDD process (such as incorporating appropriate prior knowledge and proper interpretation of the results). These additional steps are essential to ensure that useful information (knowledge) is derived from the data.
Blind application of data mining methods (rightly criticized as "fishing" or "dredging," and sometimes as "mining," in the statistical literature) can be a dangerous activity, in that invalid patterns can be discovered without proper interpretation. Thus, the overall process of finding and interpreting patterns from data is referred to as the KDD process: typically interactive and iterative, involving the repeated application of specific data mining methods or algorithms and the interpretation of the patterns generated by these algorithms. In sections to follow we provide a more detailed definition of the overall KDD process and a more detailed look at specific data mining methods.

In combining the two terms "data mining" and "knowledge discovery" in the title of the book, we are attempting to build bridges between the statistical, database, and machine learning communities and appeal to a wider audience of information systems developers. The dual nature of the title reflects the contents of the book and the direction of the field, namely a focus on both types of issues: (i) the overall knowledge discovery process, which includes preprocessing and postprocessing of data as well as interpretation of the discovered patterns as knowledge, and (ii) particular data mining methods and algorithms aimed solely at extracting patterns from raw data.

1.1.2 Links Between KDD and Related Fields

KDD is of interest to researchers in machine learning, pattern recognition, databases, statistics, artificial intelligence, knowledge acquisition for expert systems, and data visualization. KDD systems typically draw upon methods, algorithms, and techniques from these diverse fields. In the fields of machine learning and pattern recognition, the overlap with KDD lies in the study of theories and algorithms for systems which extract patterns and models from data (mainly data mining methods).
KDD focuses on the extension of these theories and algorithms to the problem of finding special patterns (ones that may be interpreted as useful or interesting knowledge; see the next section) in large sets of real-world data. KDD also has much in common with statistics, particularly exploratory data analysis (EDA). KDD systems often embed particular statistical procedures for modeling data and handling noise within an overall knowledge discovery framework.

Machine discovery, which targets the discovery of empirical laws from observation and experimentation (Shrager & Langley 1990), and causal modeling, for the inference of causal models from data (Spirtes, Glymour, & Scheines 1993), are related research areas. Kloesgen & Zytkow (this volume) provide a glossary of terms common to KDD and machine discovery.

Another related area is data warehousing, which refers to the recently popular MIS trend for collecting and cleaning transactional data and making them available for on-line retrieval. A popular approach for analysis of data warehouses has been called OLAP (on-line analytical processing), after a new set of principles proposed by Codd (1993). OLAP tools focus on providing multi-dimensional data analysis, which is superior to SQL (standard query language) in computing summaries and breakdowns along many dimensions. We view both knowledge discovery and OLAP as related facets of a new generation of intelligent information extraction and management tools.

1.1.3 A Simple Illustrative Example

In the discussion of KDD and data mining methods in this chapter, we shall make use of a simple example to make some of the notions more concrete. Figure 1.1 shows a simple two-dimensional artificial data set consisting of 23 cases. Each point on the graph represents a person who has been given a loan by a particular bank at some time in the past.
The horizontal axis represents the income of the person; the vertical axis represents the total personal debt of the person (mortgage, car payments, etc.). The data has been classified into two classes: the x's represent persons who have defaulted on their loans, while the o's represent persons whose loans are in good status with the bank. Thus, this simple artificial data set could represent historical data which may contain useful knowledge from the point of view of the bank making the loans. Note that in actual KDD applications there are typically many more dimensions (up to several hundreds) and many more data points (many thousands or even millions). The purpose here is to illustrate basic ideas on a small problem in 2-dimensional space.

Figure 1.1: A simple data set with 2 classes, used for illustrative purposes.

1.2 A Definition of Knowledge Discovery in Databases

To reflect the recent development and growth in KDD, we have revised the definition of KDD given in Frawley, Piatetsky-Shapiro, & Matheus (1991). We first start with a general statement of this definition in words:

Knowledge discovery in databases is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data.

Let us examine these terms in more detail.

- Data is a set of facts F (e.g., cases in a database). In our simple example of Figure 1.1, F is the collection of 23 cases with three fields each, containing the values of debt, income, and loan status.

- Pattern is an expression E in a language L describing facts in a subset F_E of F. E is called a pattern if it is simpler (in some sense, see below) than the enumeration of all facts in F_E. For example, the pattern "If income < $t, then person has defaulted on the loan" would be one such pattern for an appropriate choice of t.
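To make the pattern notion concrete, the following is a minimal Python sketch of our own (not part of the chapter's apparatus) that evaluates such a threshold pattern on a small invented data set; the numeric values and the use of classification accuracy as the certainty measure are assumptions for illustration only, not the 23 cases of Figure 1.1.

```python
# Each hypothetical case: (income, debt, status), status "x" = defaulted, "o" = good.
cases = [
    (18, 30, "x"), (22, 45, "x"), (25, 20, "x"),
    (40, 25, "o"), (55, 10, "o"), (60, 35, "o"), (75, 15, "o"),
]

def pattern_certainty(t, data):
    """Fraction of cases the pattern classifies correctly:
    income < t  ->  predict 'x' (default); income >= t  ->  predict 'o' (good)."""
    correct = sum(
        (income < t and status == "x") or (income >= t and status == "o")
        for income, debt, status in data
    )
    return correct / len(data)

# Different choices of t yield different certainty values for the same pattern form.
for t in (20, 30, 50):
    print(t, pattern_certainty(t, cases))
```

On this toy data the threshold t = 30 separates the two classes exactly, while t = 20 or t = 50 misclassifies some cases, which is precisely the sense in which the choice of t determines how good the pattern is.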
This pattern is illustrated graphically in Figure 1.2.

Figure 1.2: Using a simple threshold on the income variable to try to classify the loan data set.

- Process: Usually in KDD, process is a multi-step process, which involves data preparation, search for patterns, knowledge evaluation, and refinement involving iteration after modification. The process is assumed to be non-trivial, that is, to have some degree of search autonomy. For example, computing the mean income of persons in the loan example, while producing a useful result, does not qualify as discovery.

- Validity: The discovered patterns should be valid on new data with some degree of certainty. A measure of certainty is a function C mapping expressions in L to a partially or totally ordered measurement space M_C. An expression E in L about a subset F_E of F can be assigned a certainty measure c = C(E, F). For example, if the boundary for the pattern shown in Figure 1.2 is moved to the right, its certainty measure would drop, since more good loans would be admitted into the shaded region (no loan).

- Novel: The patterns are novel (at least to the system). Novelty can be measured with respect to changes in data (by comparing current values to previous or expected values) or knowledge (how a new finding is related to old ones). In general, we assume this can be measured by a function N(E, F), which can be a boolean function or a measure of degree of novelty or unexpectedness.

- Potentially Useful: The patterns should potentially lead to some useful actions, as measured by some utility function. Such a function U maps expressions in L to a partially or totally ordered measure space M_U; hence, u = U(E, F). For example, in the loan data set, this function could be the expected increase in profits to the bank (in dollars) associated with the decision rule shown in Figure 1.2.

- Ultimately Understandable: A goal of KDD is to make patterns understandable to humans in order to facilitate a better understanding of the underlying data.
While this is difficult to measure precisely, one frequent substitute is the simplicity measure. Several measures of simplicity exist, and they range from the purely syntactic (e.g., the size of a pattern in bits) to the semantic (e.g., easy for humans to comprehend in some setting). We assume this is measured, if possible, by a function S mapping expressions in L to a partially or totally ordered measure space M_S; hence, s = S(E, F).

An important notion, called interestingness, is usually taken as an overall measure of pattern value, combining validity, novelty, usefulness, and simplicity. Some KDD systems have an explicit interestingness function i = I(E, F, C, N, U, S) which maps expressions in L to a measure space M_I. Other systems define interestingness indirectly via an ordering of the discovered patterns.

Given the notions listed above, we may state our definition of knowledge as viewed from the narrow perspective of KDD as used in this book. This is by no means an attempt to define "knowledge" in the philosophical or even the popular view. The purpose of this definition is to specify what an algorithm used in a KDD process may consider knowledge:

- Knowledge: A pattern E in L is called knowledge if, for some user-specified threshold i in M_I, I(E, F, C, N, U, S) > i.

Note that this definition of knowledge is by no means absolute. As a matter of fact, it is purely user-oriented, and determined by whatever functions and thresholds the user chooses. For example, one instantiation of this definition is to select some thresholds c in M_C, s in M_S, and u in M_U, and call a pattern E knowledge if and only if

C(E, F) > c and S(E, F) > s and U(E, F) > u.

By appropriate settings of thresholds, one can emphasize accurate predictors or useful (by some cost measure) patterns over others. Clearly, there is an infinite space of how the mapping I can be defined.
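As an illustration, the threshold instantiation can be sketched in a few lines of Python; the particular measure functions and threshold values below are invented placeholders, since the chapter deliberately leaves the choice of C, S, and U to the user and the domain.

```python
def is_knowledge(E, F, C, S, U, c, s, u):
    """A pattern E counts as knowledge iff C(E,F) > c and S(E,F) > s and U(E,F) > u."""
    return C(E, F) > c and S(E, F) > s and U(E, F) > u

# Hypothetical stand-in measures for a threshold pattern E = ("income <", t):
certainty  = lambda E, F: 0.9    # e.g. fraction of F the pattern classifies correctly
simplicity = lambda E, F: 1.0    # e.g. inverse of the pattern's size in bits
utility    = lambda E, F: 5000   # e.g. expected added profit to the bank, in dollars

E, F = ("income <", 30), []      # pattern and data set (placeholders for illustration)
print(is_knowledge(E, F, certainty, simplicity, utility, c=0.8, s=0.5, u=1000))
```

Raising any one threshold (say, demanding certainty above 0.95) flips the verdict to False for the same pattern, which is the user-oriented character of the definition made explicit.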
Such decisions are left to the user and the specifics of the domain.

- Data Mining is a step in the KDD process consisting of particular data mining algorithms that, under some acceptable computational efficiency limitations, produces a particular enumeration of patterns E_j over F (see Sections 1.4 and 1.5 for more details).

Note that the space of patterns is often infinite, and the enumeration of patterns involves some form of search in this space. The computational efficiency constraints place severe limits on the subspace that can be explored by the algorithm.

- KDD Process is the process of using data mining methods (algorithms) to extract (identify) what is deemed knowledge according to the specifications of measures and thresholds, using the database F along with any required preprocessing, subsampling, and transformations of F.

The data mining component of the KDD process is mainly concerned with the means by which patterns are extracted and enumerated from the data. Knowledge discovery involves the evaluation and possibly interpretation of the patterns to make the decision of what constitutes knowledge and what does not. It also includes the choice of encoding schemes, preprocessing, sampling, and projections of the data prior to the data mining step.

1.3 The KDD Process

Figure 1.3: An overview of the steps constituting the KDD process.

The KDD process is interactive and iterative, involving numerous steps with many decisions being made by the user. Brachman & Anand (this volume) give a practical view of the KDD process, emphasizing the interactive nature of the process. Here we broadly outline some of its basic steps:

1. Developing an understanding of the application domain, the relevant prior knowledge, and the goals of the end-user.

2. Creating a target data set: selecting a data set, or focusing on a subset of variables or data samples, on which discovery is to be performed.

3.
Data cleaning and preprocessing: basic operations such as the removal of noise or outliers if appropriate, collecting the necessary information to model or account for noise, deciding on strategies for handling missing data fields, and accounting for time sequence information and known changes.

4. Data reduction and projection: finding useful features to represent the data depending on the goal of the task, and using dimensionality reduction or transformation methods to reduce the effective number of variables under consideration or to find invariant representations for the data.

5. Choosing the data mining task: deciding whether the goal of the KDD process is classification, regression, clustering, etc. The various possible tasks of a data mining algorithm are described in more detail in Section 1.4.1.

6. Choosing the data mining algorithm(s): selecting method(s) to be used for searching for patterns in the data. This includes deciding which models and parameters may be appropriate (e.g., models for categorical data are different than models on vectors over the reals) and matching a particular data mining method with the overall criteria of the KDD process (e.g., the end-user may be more interested in understanding the model than in its predictive capabilities; see Section 1.4.2).

7. Data mining: searching for patterns of interest in a particular representational form or a set of such representations: classification rules or trees, regression, clustering, and so forth (see Section 1.5 for details). The user can significantly aid the data mining method by correctly performing the preceding steps.

8. Interpreting mined patterns, possibly returning to any of steps 1-7 for further iteration.

9. Consolidating discovered knowledge: incorporating this knowledge into the performance system, or simply documenting it and reporting it to interested parties.
This also includes checking for and resolving potential conflicts with previously believed (or extracted) knowledge.

The KDD process can involve significant iteration and may contain loops between any two steps. The basic flow of steps (although not the potential multitude of iterations and loops) is illustrated in Figure 1.3. Most previous work on KDD has focused on step 7, the data mining step. However, the other steps are of considerable importance for the successful application of KDD in practice. See the chapter by Brachman & Anand, this volume, for a more elaborate account of this aspect.

Having defined the basic notions and introduced the KDD process, we now focus on the data mining component, which has by far received the most attention in the literature.

1.4 An Overview of Data Mining Methods

The data mining component of the KDD process often involves repeated iterative application of particular data mining methods. The objective of this section is to present a unified overview of some of the most popular data mining methods in current use. We use the terms patterns and models loosely throughout this chapter: a pattern can be thought of as an instantiation of a model, e.g., f(x) = 3x^2 + x is a pattern whereas f(x) = ax^2 + bx is considered a model.

Data mining involves fitting models to, or determining patterns from, observed data. The fitted models play the role of inferred knowledge; whether or not the models reflect useful or interesting knowledge is part of the overall, interactive KDD process, where subjective human judgment is usually required. There are two primary mathematical formalisms used in model fitting: the statistical approach allows for non-deterministic effects in the model (for example, f(x) = ax + e, where e could be a Gaussian random variable), whereas a logical model is purely deterministic (f(x) = ax) and does not admit the possibility of uncertainty in the modeling process.
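The contrast can be made concrete with a small Python sketch of our own, using invented numbers: the statistical form y = ax + e is fitted by least squares, which tolerates points that deviate from the line, while the purely deterministic form y = ax can only be satisfied when every observation agrees exactly.

```python
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]          # roughly y = 2x plus noise (made-up data)

# Statistical model y = a*x + e: least-squares estimate of a (no intercept term).
a_hat = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Logical model y = a*x: holds only if every point lies exactly on one line.
def deterministic_a(xs, ys):
    ratios = {y / x for x, y in zip(xs, ys)}
    return ratios.pop() if len(ratios) == 1 else None

print(round(a_hat, 3))             # close to 2
print(deterministic_a(xs, ys))     # None: the noisy data satisfy no exact line
```

The statistical fit recovers a slope near 2 despite the noise, while the deterministic model simply fails on the same data, which is why the chapter focuses on the statistical/probabilistic approach for real-world data.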
We will focus primarily on the statistical/probabilistic approach to data mining; this tends to be the most widely-used basis for practical data mining applications, given the typical uncertainty about the exact nature of real-world data-generating processes. See the chapter by Elder & Pregibon (this volume) for a perspective from the field of statistics.

Most data mining methods are based on concepts from machine learning, pattern recognition, and statistics: classification, clustering, graphical models, and so forth. The array of different algorithms for solving each of these problems can often be quite bewildering to both the experienced data analyst and the novice. In this section we offer a brief overview of data mining methods and, in particular, try to convey the notion that most (if not all) methods can be viewed as hybrids of a few basic techniques and principles.

The section begins by discussing the primary tasks of data mining and then shows that the data mining methods to address these tasks consist of three primary algorithmic components: model representation, model evaluation, and search. The section concludes by discussing particular data mining algorithms within this framework.

1.4.1 The Primary Tasks of Data Mining

The two "high-level" primary goals of data mining in practice tend to be prediction and description. Prediction involves using some variables or fields in the database to predict unknown or future values of other variables of interest. Description focuses on finding human-interpretable patterns describing the data. The relative importance of prediction and description for particular data mining applications can vary considerably. However, in the context of KDD, description tends to be more important than prediction. This is in contrast to pattern recognition and machine learning applications (such as speech recognition) where prediction is often the primary goal
(see Lehmann 1990 for a discussion from a statistical perspective). The goals of prediction and description are achieved by using the following primary data mining tasks.

Figure 1.4: A simple linear classification boundary for the loan data set; the shaded region denotes class "no loan."

- Classification is learning a function that maps (classifies) a data item into one of several predefined classes (Hand 1981; Weiss & Kulikowski 1991; McLachlan 1992). Examples of classification methods used as part of knowledge discovery applications include classifying trends in financial markets (Apte & Hong, this volume) and automated identification of objects of interest in large image databases (Fayyad, Djorgovski, & Weir, this volume). Figure 1.4 shows a simple partitioning of the loan data into two class regions; note that it is not possible to separate the classes perfectly using a linear decision boundary. The bank might wish to use the classification regions to automatically decide whether future loan applicants will be given a loan or not.

- Regression is learning a function which maps a data item to a real-valued prediction variable. Regression applications are many, e.g., predicting the amount of biomass present in a forest given remotely-sensed microwave measurements, estimating the probability that a patient will die given the results of a set of diagnostic tests, predicting consumer demand for a new product as a function of advertising expenditure, and time series prediction where the input variables can be time-lagged versions of the prediction variable. Figure 1.5 shows the result of simple linear regression where "total debt" is fitted as a linear function of "income": the fit is poor since there is only a weak correlation between the two variables.

- Clustering is a common descriptive task where one seeks to identify a finite set of categories or clusters to describe the data (Titterington, Smith & Makov 1985; Jain & Dubes 1988).
The categories may be mutually exclusive and exhaustive, or consist of a richer representation such as hierarchical or overlapping categories. Examples of clustering applications in a knowledge discovery context include discovering homogeneous sub-populations of consumers in marketing databases and identification of sub-categories of spectra from infra-red sky measurements (Cheeseman & Stutz, this volume). Figure 1.6 shows a possible clustering of the loan data set into 3 clusters; note that the clusters overlap, allowing data points to belong to more than one cluster. The original class labels (i.e., the x's and o's in the previous figures) have been replaced by +'s to indicate that the class membership is no longer assumed known. Closely related to clustering is the task of probability density estimation, which consists of techniques for estimating from data the joint multi-variate probability density function of all of the variables/fields in the database.

- Summarization involves methods for finding a compact description for a subset of data. A simple example would be tabulating the mean and standard deviations for all fields. More sophisticated methods involve the derivation of summary rules (Agrawal et al., this volume), multivariate visualization techniques, and the discovery of functional relationships between variables (Zembowicz & Zytkow, this volume). Summarization techniques are often applied to interactive exploratory data analysis and automated report generation.

- Dependency Modeling consists of finding a model which describes significant dependencies between variables. Dependency models exist at two levels: the structural level of the model specifies (often in graphical form) which variables are locally dependent on each other, whereas the quantitative level of the model specifies the strengths of the dependencies using some numerical scale. For example, probabilistic dependency networks use conditional independence to specify the structural aspect of the model and probabilities or correlations to specify the strengths of the dependencies (Heckerman, this volume; Glymour et al. 1987). Probabilistic dependency networks are increasingly finding applications in areas as diverse as the development of probabilistic medical expert systems from databases, information retrieval, and modeling of the human genome.

- Change and Deviation Detection focuses on discovering the most significant changes in the data from previously measured or normative values (Berndt & Clifford, this volume; Guyon et al., this volume; Kloesgen, this volume; Matheus et al., this volume; Basseville & Nikiforov 1993).

1.4.2 The Components of Data Mining Algorithms

Having outlined the primary tasks of data mining, the next step is to construct algorithms to solve them. One can identify three primary components in any data mining algorithm: model representation, model evaluation, and search. This reductionist view is not necessarily complete or fully encompassing; rather, it is a convenient way to express the key concepts of data mining algorithms in a relatively unified and compact manner (Cheeseman (1990) outlines a similar structure).

- Model Representation is the language L for describing discoverable patterns. If the representation is too limited, then no amount of training time or examples will produce an accurate model for the data. For example, a decision tree representation, using univariate (single-field) node-splits, partitions the input space into hyperplanes which are parallel to the attribute axes. Such a decision-tree method cannot discover from data the formula x = y no matter how much training data it is given. Thus, it is important that a data analyst fully comprehend the representational assumptions which may be inherent to a particular method.
It is equally important that an algorithm designer clearly state which representational assumptions are being made by a particular algorithm. Note that more powerful representational power for models increases the danger of overfitting the training data, resulting in reduced prediction accuracy on unseen data. In addition, the search becomes much more complex and interpretation of the model is typically more difficult.

- Model Evaluation estimates how well a particular pattern (a model and its parameters) meets the criteria of the KDD process. Evaluation of predictive accuracy (validity) is based on cross validation. Evaluation of descriptive quality involves predictive accuracy, novelty, utility, and understandability of the fitted model. Both logical and statistical criteria can be used for model evaluation. For example, the maximum likelihood principle chooses the parameters for the model which yield the best fit to the training data.

- Search Method consists of two components: Parameter Search and Model Search. In parameter search, the algorithm must search for the parameters which optimize the model evaluation criteria, given observed data and a fixed model representation. For relatively simple problems there is no search: the optimal parameter estimates can be obtained in closed form. Typically, for more general models, a closed form solution is not available; greedy iterative methods are commonly used, e.g., the gradient descent method of backpropagation for neural networks. Model Search occurs as a loop over the parameter search method: the model representation is changed so that a family of models is considered.
For each specific model representation, the parameter search method is instantiated to evaluate the quality of that particular model. Implementations of model search methods tend to use heuristic search techniques, since the size of the space of possible models often prohibits exhaustive search and closed-form solutions are not easily obtainable.

1.5 A Discussion of Popular Data Mining Methods

There exists a wide variety of data mining methods; here we focus only on a subset of popular techniques. Each method is discussed in the context of model representation, model evaluation, and search.

1.5.1 Decision Trees and Rules

Decision trees and rules that use univariate splits have a simple representational form, making the inferred model relatively easy for the user to comprehend. However, the restriction to a particular tree or rule representation can significantly restrict the functional form (and thus the approximation power) of the model. For example, Figure 1.2 illustrates the effect of a threshold split applied to the income variable for a loan data set: it is clear that using such simple threshold splits (parallel to the feature axes) severely limits the type of classification boundaries that can be induced. If one enlarges the model space to allow more general expressions (such as multivariate hyperplanes at arbitrary angles), then the model is more powerful for prediction but may be much more difficult to comprehend. There are a large number of decision tree and rule induction algorithms described in the machine learning and applied statistics literature (Breiman et al. 1984; Quinlan 1992). To a large extent they are based on likelihood-based model evaluation methods, with varying degrees of sophistication in terms of penalizing model complexity.
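The univariate ("single-field") node split these methods rely on can be sketched in a few lines. The loan-style data and the simple misclassification criterion below are illustrative assumptions; real tree-induction algorithms such as CART use more refined, likelihood-based splitting criteria.

```python
# Sketch of a univariate, axis-parallel threshold split, greedily chosen to
# minimize misclassifications when each side predicts its majority class.
# The data and criterion are illustrative, not from any algorithm in the text.

def best_threshold_split(values, labels):
    best = (None, len(labels) + 1)  # (threshold, error count)
    for t in sorted(set(values)):
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        errors = 0
        for side in (left, right):
            if side:
                majority = max(set(side), key=side.count)
                errors += sum(1 for l in side if l != majority)
        if errors < best[1]:
            best = (t, errors)
    return best

# Incomes (in $1000s) with repaid/defaulted labels; here one threshold
# happens to separate the classes perfectly.
incomes = [20, 25, 30, 60, 70, 80]
status = ["default", "default", "default", "repaid", "repaid", "repaid"]
```

A split like this is easy to read back as a rule ("if income > t then repaid"), which is exactly the comprehensibility advantage of univariate trees, but it also shows why such a representation cannot capture oblique boundaries.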
Greedy search methods, which involve growing and pruning rule and tree structures, are typically employed to explore the super-exponential space of possible models. Trees and rules are primarily used for predictive modeling, both for classification (Apte & Hong, this volume; Fayyad, Djorgovski, & Weir, this volume) and regression, although they can also be applied to summary descriptive modeling (Agrawal et al., this volume).

1.5.2 Nonlinear Regression and Classification Methods

These methods consist of a family of techniques for prediction that fit linear and non-linear combinations of basis functions (sigmoids, splines, polynomials) to combinations of the input variables. Examples include feedforward neural networks, adaptive spline methods, projection pursuit regression, and so forth (see Friedman (1989), Cheng & Titterington (1994), and Elder & Pregibon (this volume) for more detailed discussions). Consider neural networks, for example. Figure 1.7 illustrates the type of non-linear decision boundary that a neural network might find for the loan data set. In terms of model representation, while networks of the appropriate size can universally approximate any smooth function to any desired degree of accuracy, relatively little is known about the representation properties of fixed-size networks estimated from finite data sets. In terms of model evaluation, the standard squared error and cross-entropy loss functions used to train neural networks can be viewed as log-likelihood functions for regression and classification, respectively (Geman, Bienenstock, & Doursat 1992; Ripley 1994). Backpropagation is a parameter search method that performs gradient descent in parameter (weight) space to find a local maximum of the likelihood function starting from random initial conditions.
Nonlinear regression methods, though powerful in representational power, can be very difficult to interpret. For example, while the classification boundaries of Figure 1.7 may be more accurate than the simple threshold boundary of Figure 1.2, the threshold boundary has the advantage that the model can be expressed, to some degree of certainty, as a simple rule of the form "if income is greater than threshold t, then loan will have good status."

[Figure 1.7: An example of classification boundaries that a non-linear classifier (such as a neural network) might find for the loan data set.]

1.5.3 Example-based Methods

The representation is simple: use representative examples from the database to approximate a model; i.e., predictions on new examples are derived from the properties of "similar" examples in the model whose prediction is known. Techniques include nearest-neighbor classification and regression algorithms (Dasarathy 1991) and case-based reasoning systems (Kolodner 1993). Figure 1.8 illustrates the use of a nearest-neighbor classifier for the loan data set: the class at any new point in the 2-dimensional space is the same as the class of the closest point in the original training data set.

A potential disadvantage of example-based methods (compared with tree-based methods, for example) is that a well-defined distance metric for evaluating the distance between data points is required. For the loan data in Figure 1.8 this would not be a problem, since income and debt are measured in the same units; but if one wished to include a variable such as the duration of the loan, then it would require more effort to define a sensible metric between the variables.
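The nearest-neighbor idea can be sketched as follows. The training points, labels, and the plain Euclidean metric are illustrative assumptions; as noted above, such a metric is only meaningful when the variables are comparably scaled.

```python
# Minimal 1-nearest-neighbor sketch for a two-variable loan-style example.
# Points, labels, and the Euclidean metric are invented for illustration.
import math

def nearest_neighbor_class(train, query):
    """train is a list of ((income, debt), label) pairs; the class of the
    query point is the class of the closest training point."""
    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])
    _, label = min(train, key=lambda pair: dist(pair[0], query))
    return label

train = [((20.0, 9.0), "bad"), ((25.0, 8.0), "bad"),
         ((60.0, 2.0), "good"), ((75.0, 1.0), "good")]
```

Note that the "model" here is just the stored data plus the metric, which is precisely why such methods can be hard to interpret: there is no explicit formula to read off.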
Model evaluation is typically based on cross-validation estimates (Weiss & Kulikowski 1991) of a prediction error: "parameters" of the model to be estimated can include the number of neighbors to use for prediction and the distance metric itself. Like non-linear regression methods, example-based methods are often asymptotically quite powerful in terms of approximation properties, but conversely can be difficult to interpret since the model is implicit in the data and not explicitly formulated. Related techniques include kernel density estimation (Silverman 1986) and mixture modeling (Titterington, Smith, & Makov 1985).

1.5.4 Probabilistic Graphical Dependency Models

Graphical models specify the probabilistic dependencies that underlie a particular model using a graph structure (Pearl 1988; Whittaker 1990). In its simplest form, the model specifies which variables are directly dependent on each other. Typically these models are used with categorical or discrete-valued variables, but extensions to special cases, such as Gaussian densities, for real-valued variables are also possible. Within the artificial intelligence and statistical communities these models were initially developed within the framework of probabilistic expert systems: the structure of the model and the parameters (the conditional probabilities attached to the links of the graph) were elicited from experts. More recently, there has been significant work in both the AI and statistical communities on methods whereby both the structure and parameters of graphical models can be learned from databases directly (Buntine, this volume; Heckerman, this volume). Model evaluation criteria are typically Bayesian in form, and parameter estimation can be a mixture of closed-form estimates and iterative methods, depending on whether a variable is directly observed or hidden. Model search can consist of greedy hill-climbing methods over various graph structures.
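The kind of factorization a graphical model encodes can be sketched with an assumed three-variable structure. The network (Rain → WetGrass ← Sprinkler) and all of the probabilities below are invented for illustration; in the methods surveyed here, both the structure and the conditional tables would be elicited from experts or learned from data.

```python
# Sketch of a probabilistic dependency model over three binary variables with
# an assumed structure Rain -> WetGrass <- Sprinkler. All numbers are invented.

p_rain = {True: 0.2, False: 0.8}
p_sprinkler = {True: 0.3, False: 0.7}
# Conditional table P(WetGrass=True | Rain, Sprinkler), one entry per parent
# configuration -- these are the parameters attached to the links of the graph.
p_wet = {(True, True): 0.99, (True, False): 0.9,
         (False, True): 0.8, (False, False): 0.05}

def joint(rain, sprinkler, wet):
    """The graph encodes the factorization
    P(R, S, W) = P(R) * P(S) * P(W | R, S)."""
    pw = p_wet[(rain, sprinkler)]
    return p_rain[rain] * p_sprinkler[sprinkler] * (pw if wet else 1.0 - pw)
```

The graph's payoff is visible in the arithmetic: a full joint over three binary variables needs 7 free numbers, while this factorization needs only 6, and the savings grow rapidly with more variables.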
Prior knowledge, such as a partial ordering of the variables based on causal relations, can be quite useful in terms of reducing the model search space. Although still primarily in the research phase, graphical model induction methods are of particular interest to KDD since the graphical form of the model lends itself easily to human interpretation.

1.5.5 Relational Learning Models

While decision trees and rules have a representation restricted to propositional logic, relational learning (also known as inductive logic programming) uses the more flexible pattern language of first-order logic. A relational learner can easily find formulas such as X = Y. Most research to date on model evaluation methods for relational learning is logical in nature. The extra representational power of relational models comes at the price of significant computational demands in terms of search. See Dzeroski (this volume) for a more detailed discussion.

Given the broad spectrum of data mining methods and algorithms, our overview is inevitably limited in scope: there are many data mining techniques, particularly specialized methods for particular types of data and domains, that were not mentioned specifically in the discussion. We believe the general discussion of data mining tasks and components has relevance to a wide variety of methods. For example, consider time series prediction: traditionally this has been cast as a predictive regression task (autoregressive models, and so forth). Recently, more general models have been developed for time series applications, such as non-linear basis function, example-based, and kernel methods. Furthermore, there has been significant interest in descriptive graphical and local data modeling of time series rather than purely predictive modeling (Weigend & Gershenfeld 1993).
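The traditional casting of time series prediction as a regression task can be sketched with a minimal autoregressive model. The synthetic series and the AR(1) form below are illustrative assumptions, not from the text.

```python
# Sketch of time-series prediction as regression: fit an AR(1) model
# x_t ~ a * x_{t-1} by least squares over consecutive pairs. The series is
# synthetic and the model deliberately minimal.

def fit_ar1(series):
    pairs = list(zip(series[:-1], series[1:]))
    # Least-squares coefficient a = sum(x_{t-1} * x_t) / sum(x_{t-1}^2).
    return (sum(prev * cur for prev, cur in pairs)
            / sum(prev * prev for prev, _ in pairs))

def predict_next(series):
    return fit_ar1(series) * series[-1]

series = [1.0, 0.5, 0.25, 0.125, 0.0625]
```

Once the problem is framed this way, the same components apply as elsewhere in this section: the AR coefficient is found by parameter search (here in closed form), and replacing the linear term with non-linear basis functions is a model-search step.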
Thus, although different algorithms and applications may appear quite different on the surface, it is not uncommon to find that they share many common components. Understanding data mining and model induction at this component level clarifies the task of any data mining algorithm and makes it easier for the user to understand its overall contribution and applicability to the KDD process.

We would like to remind the reader that our discussion and overview of data mining methods has been both cursory and brief. There are two important points we would like to make clear:

1. Automated Search: Our brief overview has focused mainly on automated methods for extracting patterns and/or models from data. While this is consistent with the definition we gave earlier, it does not necessarily represent what other communities might refer to as data mining. For example, some use the term to designate any manual search of the data, search assisted by queries to a DBMS, or humans visualizing patterns in data. In other communities, it is used to refer to the automated correlation of data from transactions or the automated generation of transaction reports. We choose to focus only on methods that contain certain degrees of search autonomy.

2. Beware the Hype: The state of the art in automated methods in data mining is still in a fairly early stage of development. There are no established criteria for deciding which methods to use in which circumstances, and many of the approaches are based on crude heuristic approximations to avoid the expensive search required to find optimal or even good solutions.
Hence, the reader should be careful when confronted with overstated claims about the great ability of a system to mine useful information from large (or even small) databases.

1.6 Application Issues

In the business world, the most successful and widespread application of KDD is "database marketing," a method of analyzing customer databases, looking for patterns among existing customer preferences and using those patterns for more targeted selection of future customers. Business Week, a popular business magazine in the United States, carried a cover story on database marketing (Berry 1994) that estimated that over 60% of all retailers are using or planning to use database marketing. The reason is simple: significant results can be obtained using this approach, e.g., a 16-20% increase in credit-card purchases reported by American Express (Berry 1994).

Another major business use of data mining methods is the analysis and selection of stocks and other financial instruments. There are already numerous investment companies (Barr and Mani 1994) that pick stocks using a variety of advanced data mining methods.

Several successful applications have been developed for analysis and reporting on change in data. These include Coverstory from IRI (Schmitz, Armstrong, & Little 1990), Spotlight from A.C. Nielsen (Anand & Kahn 1992) for supermarket sales data, and KEFIR from GTE, for health care databases (Matheus, Piatetsky-Shapiro, & McNeill, this volume).

Fraud detection and prevention is another area where KDD plays a role. While there have been many applications, published information is, for obvious reasons, not readily available. Here we mention just a few noteworthy examples. A system for detecting health-care provider fraud in electronically submitted claims has been developed at Travelers Insurance by Major and Riedinger (1992).
The Internal Revenue Service has developed a pilot system for selecting tax returns for audits. Neural-network-based tools such as Nestor FDS (Blanchard 1994) have been developed for detecting credit-card fraud and are reportedly watching millions of accounts.

A number of interesting and important scientific applications of KDD have also been developed. Example application areas in science include:

• Astronomy: The SKICAT system from JPL/Caltech is used by astronomers to automatically identify stars and galaxies in a large-scale sky survey for cataloging and scientific analysis (Fayyad, Djorgovski, & Weir, this volume).

• Molecular Biology: Systems have been developed for finding patterns in molecular structures (Conklin, Fortier, and Glasgow 1993) and in genetic data (Holder, Cook, and Djoko 1994).

• Global Climate Change Modeling: Spatio-temporal patterns such as cyclones are automatically found from large simulated and observational datasets (Stolorz et al. 1994).

Other applications are described in (Fayyad & Uthurusamy 1994, 1995; Piatetsky-Shapiro 1993).

1.6.1 Guidelines for Selecting a Potential KDD Application

The criteria for selecting applications can be divided into practical and technical. The practical criteria for KDD projects are similar to those for other applications of advanced technology, while the technical ones are more specific to KDD.

Practical criteria include consideration of the potential for significant impact of an application. For business applications this could be measured by criteria such as greater revenue, lower costs, higher quality, or savings in time. For scientific applications the impact can be measured by the novelty and quality of the discovered knowledge and by increased access to data via automation. Another important practical consideration is that no good alternatives exist: the solution is not easily obtainable by other standard means, and hence the ultimate user has a strong vested interest in ensuring the success of the KDD venture.
Organizational support is another consideration: there should be a champion for using new technology, and a domain expert who can define a proper interestingness measure for that domain as well as participate in the KDD process. Finally, an important practical consideration is the potential for privacy/legal issues. This applies primarily to databases on people, where one needs to guard against the discovered patterns raising legal or ethical issues of invasion of privacy.

Technical criteria include considerations such as the availability of sufficient data (cases). In general, the more fields there are and the more complex the patterns being sought, the more data are needed. Another consideration is the relevance of attributes: it is important to have data attributes relevant to the discovery task, since no amount of data will allow prediction based on attributes that do not capture the required information.

Furthermore, low noise levels (few data errors) are another consideration. High amounts of noise make it hard to identify patterns unless a large number of cases can mitigate random noise and help clarify the aggregate patterns. A related consideration is whether one can attach confidence intervals to extracted knowledge. In some applications, it is crucial to attach confidence intervals to predictions produced by the KDD system; this allows the user to calibrate actions appropriately.

Finally, and perhaps one of the most important considerations, is prior knowledge. It is very useful to know something about the domain: what are the important fields, what are the likely relationships, what is the user utility function, what patterns are already known, and so forth. Prior knowledge can significantly reduce the search in the data mining step and in all the other steps of the KDD process.

1.6.2 Privacy and Knowledge Discovery

When dealing with databases of personal information, governments and businesses have to be careful to adequately address the legal and ethical issues of invasion of privacy. Ignoring this issue can be dangerous, as Lotus found in 1990, when it was planning to introduce a CD-ROM with data on about 100 million American households.
The stormy protest led to the withdrawal of that product (Rosenberg 1992).

Current discussion centers around guidelines for what constitutes proper discovery. The Organization for Economic Cooperation and Development (OECD) guidelines for data privacy (O'Leary 1995), which have been adopted by most European Union countries, suggest that data about specific living individuals should not be analyzed without their consent. They also suggest that the data should only be collected for a specific purpose; use for other purposes is possible only with the consent of the data subject or by authority of the law.

In the U.S. there is ongoing work on draft principles for fair information use related to the National Information Infrastructure (NII), commonly known as the "information superhighway." These principles permit the use of "transactional records," such as phone numbers called, credit-card payments, etc., as long as such use is compatible with the notice. The use of transactional records can be seen to also include discovery of patterns.

In many cases (e.g., medical research, socio-economic studies) the goal is to discover patterns about groups, not individuals. While group pattern discovery appears not to violate the restrictions on personal data retrieval, an ingenious combination of several group patterns, especially in small datasets, may allow identification of specific personal information. Solutions that allow group pattern discovery while avoiding the potential invasion of privacy include removal or replacement of identifying fields, performing queries on random subsets of data, and combining individuals into groups and allowing only queries on groups. These and related issues are further discussed in (Piatetsky-Shapiro 1995b).

1.6.3 Research and Application Challenges for KDD

We outline some of the current primary research and application challenges for knowledge discovery. This list is by no means exhaustive.
Its goal is to give the reader a feel for the types of problems that KDD practitioners wrestle with. We point to chapters in this book that are of relevance to the challenges we list.

• Larger databases. Databases with hundreds of fields and tables, millions of records, and multi-gigabyte size are quite commonplace, and terabyte (10^12 bytes) databases are beginning to appear. For example, Agrawal et al. (this volume) present efficient algorithms for enumerating all association rules exceeding given confidence thresholds over large databases. Other possible solutions include sampling, approximation methods, and massively parallel processing (Holsheimer et al., this volume).

• High dimensionality. Not only is there often a very large number of records in the database, but there can also be a very large number of fields (attributes, variables), so that the dimensionality of the problem is high. A high-dimensional data set creates problems in terms of increasing the size of the search space for model induction in a combinatorially explosive manner. In addition, it increases the chances that a data mining algorithm will find spurious patterns that are not valid in general. Approaches to this problem include methods to reduce the effective dimensionality of the problem and the use of prior knowledge to identify irrelevant variables.

• Overfitting. When the algorithm searches for the best parameters for one particular model using a limited set of data, it may overfit the data, resulting in poor performance of the model on test data. Possible solutions include cross-validation, regularization, and other sophisticated statistical strategies (Elder & Pregibon, this volume).

• Assessing statistical significance.
A problem (related to overfitting) occurs when the system is searching over many possible models. For example, if a system tests N models at the 0.001 significance level, then on average, with purely random data, N/1000 of these models will be accepted as significant. This point is frequently missed by many initial attempts at KDD. One way to deal with this problem is to use methods that adjust the test statistic as a function of the search, e.g., Bonferroni adjustments for independent tests.

• Changing data and knowledge. Rapidly changing (non-stationary) data may make previously discovered patterns invalid. In addition, the variables measured in a given application database may be modified, deleted, or augmented with new measurements over time. Possible solutions include incremental methods for updating the patterns and treating change as an opportunity for discovery by using it to cue the search for patterns of change only (Matheus et al., this volume).

• Missing and noisy data. This problem is especially acute in business databases. U.S. census data reportedly has error rates of up to 20%. Important attributes may be missing if the database was not designed with discovery in mind. Possible solutions include more sophisticated statistical strategies to identify hidden variables and dependencies (Heckerman, this volume; Smyth et al., this volume).

• Complex relationships between fields. Hierarchically structured attributes or values, relations between attributes, and more sophisticated means for representing knowledge about the contents of a database will require algorithms that can effectively utilize such information. Historically, data mining algorithms have been developed for simple attribute-value records, although new techniques for deriving relations between variables are being developed (Dzeroski, this volume; Han and Fu, this volume).

• Understandability of patterns.
In many applications it is important to make the discoveries more understandable by humans. Possible solutions include graphical representations (Buntine, this volume; Heckerman, this volume), rule structuring (Gaines, this volume), natural language generation (Matheus et al., this volume), and techniques for visualization of data and knowledge. Rule refinement strategies can also be used to address a related problem: the discovered knowledge may be implicitly or explicitly redundant.

• User interaction and prior knowledge. Many current KDD methods and tools are not truly interactive and cannot easily incorporate prior knowledge about a problem except in simple ways. The use of domain knowledge is important in all the steps of the KDD process. For example, Simoudis et al. (this volume) make use of deductive databases to discover knowledge that is then used to guide the data mining search.

• Integration with other systems. A stand-alone discovery system may not be very useful. Typical integration issues include integration with a DBMS (e.g., via a query interface), integration with spreadsheets and visualization tools, and accommodating real-time sensor readings. Examples of integrated KDD systems are described by Simoudis et al. (this volume) and Shen et al. (this volume).

1.7 Organization of this Book

The chapters of this book span fundamental issues of knowledge discovery: classification and clustering, trend and deviation analysis, dependency derivation, integrated discovery systems, augmented database systems, and application case studies. The contributing authors include researchers and practitioners in academia, government laboratories, and private industry, indicating the breadth of interest in the field. We have organized the book into seven parts and an appendix.

Part I deals with fundamental issues in discovery. Brachman and Anand outline the state of the practice of the KDD process. Buntine presents a unifying view of various data mining techniques under the broad area of graphical models. Elder and Pregibon provide a general statistical perspective on knowledge discovery and data mining.

Part II deals with specific techniques for data mining. Dzeroski presents an overview of recent developments relevant to KDD in inductive logic programming (ILP). Cheeseman and Stutz present a Bayesian approach to clustering and discuss the details of the AutoClass system.
AutoClass attempts to infer the most likely number of classes in the data, and the most likely parameterization of the probability distributions chosen to model the data. Guyon, Matic, and Vapnik present a novel approach for discovering informative patterns within a supervised learning framework and describe the application of these techniques to "data cleaning" of large optical character recognition databases. Gaines discusses the use of exception directed acyclic graphs (EDAGs) for efficient representation of induced knowledge.

Part III presents methods for dealing with trend and deviation analysis. Berndt and Clifford show how to adapt dynamic time warping (a dynamic programming technique used in speech recognition) to finding patterns in time series data. Kloesgen describes Explora, a multi-strategy discovery assistant, and examines the options for discovering different types of deviations and other patterns.

Part IV focuses on data mining techniques for deriving dependencies. Heckerman provides a survey of current research in the field of learning graphical models (also known as Bayesian networks) from data; graphical models provide an efficient framework for representing and reasoning with joint probability distributions over multiple variables. Agrawal, Toivonen, and Verkamo introduce a variety of novel extensions of earlier work on deriving association rules from transaction data; empirical results demonstrate that the new algorithms are much more efficient than previous versions. Zembowicz and Zytkow show how to use contingency tables to discover different types of knowledge, including dependencies and taxonomies.

Part V focuses on integrated discovery systems, which include multiple components, employ several data mining techniques, and generally address issues in solving real-world problems. Simoudis, Livezey, and Kerber discuss how rule induction, deductive databases, and data visualization can be used cooperatively to create high-quality, rule-based models by mining data stored in relational databases.
Shen, Mitbander, Ong, and Zaniolo present a framework that uses metaqueries to integrate inductive learning methods with deductive database technologies.
