
International Journal of Computer Information Systems, Vol. 3, No. 3, 2011

DUCK: Data Umpire by Cubical Kits

SarathChand P.V., B.E. (CSE), M.Tech. (CS), (Ph.D.), Professor and Research Scholar, Indur Institute of Eng. & Technology, Siddipet, Medak Dist., A.P., India. Email id:

Rajalakshmi Selvaraj, Senior Lecturer, Botho College, Gaborone, Botswana. Email id:

Laxmaiah M., Professor and Head, CSE, Tirumala Engineering College, Bogaram (V), Keesara (M), R.R. Dist. Email id:

VenuMadhav K., Senior Lecturer, Department of Applied Information Systems, University of Johannesburg, South Africa. Email id:

Parvateeswara Rao D., M.Tech. (CSE), (Ph.D.), Assistant Professor and Research Scholar, Indur Institute of Eng. & Technology, Siddipet, Medak Dist., A.P., India. Email id:

Abstract: Data cube mining is the process of accessing relevant data from databases and data warehouses. Many organizations adopt state-of-the-art technology, but if the data lack the necessary integrity and granularity, that technology can lead the organization in the wrong direction. Often the data serve as a proxy for something else that is likely to be linked to the purchasing decision; for example, an address may be associated with a particular item. However, if the data are a poor or insufficient proxy, data mining can give false results that lead to gross misinterpretation. Before mining, many companies and organizations first have to integrate, extract, transform and cleanse their data; ensuring accuracy and integrating data gathered from different entry points is both a time-intensive and a costly endeavor. The problem is most acute in older companies whose legacy systems, built for different parts of the business, have to be made to communicate with one another. The main motivation of this paper is to develop an algorithm that helps organizations change their mode of operation while sustaining these efforts. The paper walks a tightrope between personalization and respect for privacy. Moreover, the algorithm allows companies to access data in an intelligible form, although doing so can raise the very privacy concerns it is designed to appease.


Data computations can be performed by data cube and generalization methods. Retrieval mainly generalizes a large set of task-relevant data in the database from a relatively low conceptual level to higher conceptual levels. Users need the ease and flexibility of having large amounts of data summarized in concise and succinct terms, at different levels of granularity and from different angles. Such a data description provides an overall picture of the data at hand. Data warehousing and online analytical processing (OLAP) carry out data cube and generalization methods by summarizing data at varying levels of abstraction. Cube mining analyzes the data in order to construct one model, or a set of models, that attempts to predict the behavior of new data sets, and mining operations such as classification, regression and trend analysis can be supported. Data cube computation reduces response time and enhances the performance of OLAP, but it is challenging in terms of time, storage space and the multidimensional nature of the data. Once computed, data cubes are handy and readily available for query processing. A data cube is represented by a lattice of cuboids, with one cuboid computed for each combination of the dimensions; for three dimensions this gives $2^3 = 8$ cuboids. The most generalized cuboid is the apex cuboid, which contains a single value: the aggregate of the measure over all the tuples stored in the base cuboid. To retrieve data from the cube, the algorithm moves from the apex cuboid downwards in the lattice.
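The lattice of cuboids described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the fact table, the dimension names A, B and C, and the helpers `cuboid` and `lattice` are hypothetical, and the measure is assumed to be summed. Each cuboid is built by hash-based grouping, one of the access methods the paper names, and the empty grouping `()` plays the role of the apex cuboid.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact table: values of dimensions A, B, C followed by a numeric measure.
FACTS = [
    ("a1", "b1", "c1", 4),
    ("a1", "b2", "c1", 2),
    ("a2", "b1", "c2", 5),
    ("a2", "b1", "c1", 1),
]
DIMS = ("A", "B", "C")


def cuboid(facts, group_by, dims=DIMS):
    """Sum the measure for every combination of values of the `group_by` dimensions,
    using a hash table (dict) to group tuples that share the same dimension values."""
    positions = [dims.index(d) for d in group_by]
    cells = defaultdict(int)
    for row in facts:
        key = tuple(row[p] for p in positions)  # () when group_by is empty (apex)
        cells[key] += row[-1]
    return dict(cells)


def lattice(facts, dims=DIMS):
    """Compute all 2**n cuboids, from the apex cuboid () to the base cuboid (all dims)."""
    return {
        group_by: cuboid(facts, group_by, dims)
        for k in range(len(dims) + 1)
        for group_by in combinations(dims, k)
    }


cube = lattice(FACTS)
print(len(cube))         # 8
print(cube[()])          # {(): 12}
print(cube[("A",)])      # {('a1',): 6, ('a2',): 6}
```

For simplicity this sketch rescans the fact table for every cuboid; a practical cube engine would instead derive higher-level cuboids from already computed lower-level ones, as the paper notes in its literature survey.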

Keywords: fragmentation, cells, state-of-the-art, endeavor, cuboids, lattice, regression, computations, cardinalities, tuples, sparse

Special Issue I

Page 22 of 55

ISSN 2229 5208

II. LITERATURE SURVEY

In general, there are two methods of storing data in cuboids. Relational tables are used as the basic data structure for implementing relational OLAP (ROLAP), and multidimensional arrays are used for multidimensional OLAP (MOLAP). Both ROLAP and MOLAP explore different cube computation techniques with optimization mechanisms, and these computations are shared among different data retrieval operations and representations.

Definition: A database is called complete when its set of cubes consists of ordered pairs defining a binary relation. A binary relation is a convenient way to express the fact that a particular ordered pair $(x, y)$ belongs to a relation $R$, written $(x, y) \in R$ or $x \, R \, y$. Examples of such relations are $>$, $<$, $=$ and $\neq$. More precisely, the relation $>$ ("greater than") is the set of pairs $\{\langle x, y \rangle \mid x > y\}$.

To access data from the database, the methods used are sorting, hashing and grouping. In cube computation, aggregation is performed on the cells that share the same set of dimension values, so sorting, hashing and grouping operations are essential for bringing such data together to facilitate computation. More precisely, cube computation is an efficient mechanism for computing higher-level aggregates from previously computed lower-level aggregates. Simultaneously aggregating and caching intermediate computation results may reduce expensive disk I/O operations, so as many cuboids as possible are computed in each pass to amortize the cost of disk reads.

The algorithm for the cube computation operations is as follows.

Algorithm: DUCK
Inputs:
  R: relational table
  min_support: minimum support threshold
  T: cuboid tree
  C: non-null child
  cnode: current node
Output: computed values

Begin
Procedure DUCK(T, cnode)
  for each non-null child C of T's cuboid tree
    insert or aggregate cnode into the corresponding position in C;
  if (cnode.count ≥ min_support) then {
    if (cnode ≠ root) then
      output cnode.count;
    if (cnode is a leaf node) then
      output cnode.count;
    else {
      create C as a child of T's cuboid tree;
      C.root.count = cnode.count;
    }
  }
  if (cnode is not a leaf) then
    DUCK(T, cnode.first_child);
  if (C is not null) then
    remove C from T's cuboid tree;
  if (cnode has a sibling) then
    DUCK(T, cnode.sibling);
  remove T;
End.

DUCK is an algorithm that acts like an umpire, in the sense that it ensures fast online analytical processing. The number of cells in all the cuboids of a given data cube is exponential in the number of dimensions: a data cube with n dimensions has $2^n$ cuboids, and the number of cells in each cuboid grows further with the cardinality of its dimensions. Precomputing the full cube generally requires huge and often excessive amounts of memory, so individual cuboids may be stored in secondary storage and accessed whenever required. The same algorithm is used to umpire which smaller cubes should be accessed first. These smaller cubes are defined on subsets of a given set of dimensions and range over possible values of some of the dimensions; in many cases, a smaller cube acts as a full cube for the given subset of dimensions and values.

Understanding full cube computation methods helps in developing efficient methods for computing partial cubes. Partial materialization of data cubes is an interesting trade-off between storage space and OLAP response time: instead of computing the full cube, we can compute only a subset of its cuboids, or subcubes consisting of subsets of cells from various cuboids. Generally, many cells in a cuboid are of no interest to the analyst. When the product of the cardinalities of the dimensions in a cuboid is very large relative to the number of nonzero tuples stored in the cuboid, the cuboid is considered sparse.

III. DATA ACCESS BY CUBICAL KITS



The implementation is carried out by a procedure that identifies the next level of data cubes. Cubes of high dimensionality need massive storage and unrealistic computation time. Iceberg cubes provide a more feasible alternative by computing only a subset of the full cube's cells. The cube produced by the DUCK algorithm is smaller and requires less computation time than its corresponding full cube. The algorithm can compute only a small portion of the database first and then be extended to the rest.
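The iceberg-cube idea, and the role of the min_support threshold in the DUCK procedure, can be illustrated with the sketch below. It swaps in a simple BUC-style (bottom-up) recursion rather than DUCK's cuboid-tree traversal; the input relation and the function names are hypothetical, and COUNT is assumed as the measure. Pruning a partition whose count falls below min_support is safe because COUNT can only stay the same or decrease when a cell is made more specific.

```python
from collections import defaultdict

# Hypothetical input relation: each tuple holds the values of dimensions A, B, C.
# The measure is COUNT(*), so a cell's support is the number of tuples it covers.
ROWS = [
    ("a1", "b1", "c1"),
    ("a1", "b1", "c2"),
    ("a1", "b2", "c1"),
    ("a2", "b1", "c1"),
]


def _buc(rows, min_support, n_dims, dim, cell, out):
    """Record `cell`, then try to specialise it on each remaining dimension >= dim."""
    out[cell] = len(rows)
    for d in range(dim, n_dims):
        # Group the current partition by the value of dimension d (hash-based grouping).
        parts = defaultdict(list)
        for row in rows:
            parts[row[d]].append(row)
        for value, part in parts.items():
            # COUNT is anti-monotone: if this cell is below the threshold, every
            # more specific cell is too, so the whole branch is pruned here.
            if len(part) >= min_support:
                child = cell[:d] + (value,) + cell[d + 1:]
                _buc(part, min_support, n_dims, d + 1, child, out)


def iceberg_cube(rows, min_support):
    """Return {cell: count} for all cells with count >= min_support; '*' means 'all'."""
    out = {}
    if rows and len(rows) >= min_support:
        n_dims = len(rows[0])
        _buc(rows, min_support, n_dims, 0, ("*",) * n_dims, out)
    return out


full = iceberg_cube(ROWS, min_support=1)    # every non-empty cell of the full cube
iceberg = iceberg_cube(ROWS, min_support=2)
print(len(full), len(iceberg))              # 20 7
```

On this four-tuple input the full cube has 20 non-empty cells, while the iceberg cube with min_support = 2 keeps only 7, which is the kind of storage and time saving the section attributes to computing a subset of the full cube.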


Table 1. Cuboid AB

Tuples: 1 | 2 | 3 | 4 | 5 | Output

a | a | b | c | x | 2

b | a | c | f | x | 1

c | b | d | g | x | 1

d | b | d | h | y | 1

e | c | free | h | y | 1

Item availability: 5 | 5 | 4 | 5 | 5 | 24

Occurrences in the database: 1,1,1,1,1 | 2,2,1 | 1,1,2 | 1,1,1,2 | 3,2 | 24

V. CONCLUSION AND FUTURE WORK

The DUCK algorithm is designed to exploit data samples in cube form in a sequential manner. The cube data are represented multidimensionally so that the data can be viewed and patterns can be matched for easy retrieval. This resembles a targeted search, in which users know what they want before they set out to retrieve it. The algorithm can be used for multidimensional analysis, viewing the data as a virtual data cube consisting of measures and dimensions, including the time dimension and a few more dimensions such as location and user category. The paper is not concerned with materializing the data cube, because materialization requires a huge amount of data to be computed and stored. In future, efficient methods should be developed for the systematic analysis of such data. The 3G and 4G technologies used in mobile communication could be introduced to the algorithm to initiate the analysis of data, and the authors intend to extend their work to support 3G and 4G technology.

Table 1 divides the five dimensions into fragments a, b, c, d and e. For each fragment, the full local data cube is computed by intersecting the tuples, using top-down, depth-first search. The cuboid AB is computed by intersecting all pairwise combinations of its values. In general, the combinations required for the data cube are selected and then retrieved. DUCK is implemented mainly through these intersections together with the compatibility of the cuboid cells. The data cells in the database can be viewed in Chart 1.

[Chart 1: data cells in the database (vertical axis 0 to 100, horizontal axis 0 to 6); the plotted values are not recoverable from the text.]
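The fragment-and-intersection step described above can be sketched with inverted indices of tuple identifiers (TIDs): each dimension value maps to the set of tuples that contain it, and a cell of cuboid AB is the intersection of one A-list with one B-list. This mirrors the inverted-index intersection used in shell-fragment cube computation; the relation, the helper names and the use of Python sets are assumptions for illustration, and the sketch does not try to reproduce the values in Table 1.

```python
from itertools import product

# Hypothetical relation: tuple id (TID) -> values of dimensions A, B, C.
RELATION = {
    1: ("a1", "b1", "c1"),
    2: ("a1", "b2", "c1"),
    3: ("a1", "b1", "c2"),
    4: ("a2", "b1", "c1"),
}
DIMS = ("A", "B", "C")


def inverted_index(relation, dim):
    """Map each value of `dim` to the set of TIDs of the tuples that contain it."""
    i = DIMS.index(dim)
    index = {}
    for tid, row in relation.items():
        index.setdefault(row[i], set()).add(tid)
    return index


def intersect_cuboid(relation, dim_x, dim_y):
    """Build the 2-D cuboid dim_x-dim_y by intersecting the TID sets of every value pair."""
    ix, iy = inverted_index(relation, dim_x), inverted_index(relation, dim_y)
    cells = {}
    for (vx, tids_x), (vy, tids_y) in product(ix.items(), iy.items()):
        common = tids_x & tids_y
        if common:  # an empty intersection is a cell that never occurs
            cells[(vx, vy)] = common
    return cells


ab = intersect_cuboid(RELATION, "A", "B")
print({cell: len(tids) for cell, tids in ab.items()})
# {('a1', 'b1'): 2, ('a1', 'b2'): 1, ('a2', 'b1'): 1}
```

Keeping the TID sets, rather than only their sizes, lets a higher-dimensional cell such as (a1, b1, c1) be obtained by one more intersection with the C-list, which is how the pairwise results extend to larger cuboids.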



Mr. M. Laxmaiah is a Research Scholar at JNTU Hyderabad and is currently working as a Professor in the CSE Department at Tirumala Engineering College, Bogaram (V), Keesara (M), Hyderabad, A.P., India. He has 15 years of experience in education and 4 years of experience in research.


K. VenuMadhav has 13 years of teaching experience and is presently working as a Senior Lecturer in the Department of Applied Information Systems at the University of Johannesburg, South Africa.

Prof. P.V. SarathChand, B.E. (CSE), M.Tech. (CS), is pursuing a Ph.D. (CSE). He has 14 years of teaching experience and is working as Professor and Head of the I.T. Department at Indur Institute of Engineering and Technology, Siddipet, Medak Dist.

Rajalakshmi Selvaraj is currently working as a Senior Lecturer at Botho College, Gaborone, Botswana, and is pursuing a Ph.D. from Magadh University, India. She has 8 years of teaching experience, and her subjects of interest are data mining and data warehousing.

D. Parvateeswara Rao holds an MCA and an M.Tech. (CSE) and is currently pursuing a Ph.D. in CSE. He has 13 years of teaching experience, and his areas of interest are text mining and information retrieval. He is working as an Assistant Professor in the Department of I.T. at Indur Institute of Engineering and Technology, Siddipet, A.P., India.
