
DETERMINATION OF ASSOCIATION RULES FROM TRANSACTIONAL DATABASES
July 30, 2011

Objective: Determining data patterns from huge transactional databases has been an active area of research in recent years. This simple tool aims to determine shopping patterns from the transactions in a database. Although a large number of similar tools are available on the internet, none of them is intuitive to use or makes the results easy to analyze. The main aspect of mining data patterns from databases is analyzing the mined data, so this tool provides Microsoft Office Excel integration for easy analysis. It can generate graphs and tables based on the options supplied. The generated reports are clean and easy to analyze, which makes them suitable for business use. The analyzed data can be used in making influential business decisions.

Plan: The key methodology will be agile, not waterfall.

1. Selection of platform and tools to be used.
2. Verifying the correctness of the algorithms used.
3. Design analysis involving code-level details, assemblies, configuration, etc.
4. Coding implementation.
5. Testing.
   1. Correctness testing.
   2. User experience testing.

As the methodology is agile, all the activities will run (virtually) in parallel. Implementation, testing, and bug fixing will happen simultaneously.

Design implementation: There are two main parts in the system. One is the core engine, which uses the core data mining algorithms to compute the association rules. The other is the report generation engine. A very small but notable part of the system is the adapter, which conveys data from the core engine to the report generator. The report generator is kept as a separate component because it is generic and can be reused in other systems. For the same reason, it will be carefully designed not to take any dependency on the core engine. The report generator will be client agnostic, but a separate adapter needs to be written if the report generator is to be used in some other system. All Excel-related options will be read from an application configuration file. The diagram below gives a brief idea of the system:


[Diagram of the system. Components shown: transactional databases, core engine running the Apriori algorithm, adapter, report generation engine, Excel interop.]
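The separation between the core engine, the adapter, and the report generator can be illustrated with a minimal sketch. It is written in Python for brevity, whereas the tool itself is planned in C#, and every name in it (`AssociationRule`, `ReportTable`, `rules_to_table`, `render_report`, the `delimiter` option) is a hypothetical stand-in rather than part of the actual design. The report generator here emits delimited text instead of driving Excel, and a plain dictionary stands in for the application configuration file.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple


@dataclass(frozen=True)
class AssociationRule:
    """Hypothetical shape of what the core engine produces."""
    antecedent: Tuple[str, ...]
    consequent: Tuple[str, ...]
    support: float
    confidence: float


@dataclass(frozen=True)
class ReportTable:
    """Generic input of the report generator; it knows nothing about rules."""
    title: str
    headers: Tuple[str, ...]
    rows: Tuple[Tuple[str, ...], ...]


def rules_to_table(rules: Sequence[AssociationRule], title: str) -> ReportTable:
    """Adapter: the only component that knows both data shapes."""
    rows = tuple(
        (
            ", ".join(rule.antecedent),
            ", ".join(rule.consequent),
            f"{rule.support:.2f}",
            f"{rule.confidence:.2f}",
        )
        for rule in rules
    )
    headers = ("If bought", "Then bought", "Support", "Confidence")
    return ReportTable(title, headers, rows)


def render_report(table: ReportTable, options: Dict[str, str]) -> str:
    """Report generator stand-in: all presentation options come from `options`."""
    sep = options.get("delimiter", "\t")
    lines = [table.title, sep.join(table.headers)]
    lines.extend(sep.join(row) for row in table.rows)
    return "\n".join(lines)


if __name__ == "__main__":
    rules = [AssociationRule(("bread",), ("butter",), 0.4, 0.8)]
    table = rules_to_table(rules, "Shopping patterns")
    print(render_report(table, {"delimiter": ","}))
```

Because `render_report` only accepts a `ReportTable`, reusing it in another system would mean writing a new adapter function and leaving the report generator untouched, which is the property the design asks for.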

Excel interoperation will be achieved through .NET's built-in support for Office. For simplicity, the transactional databases are not real databases but an in-memory data structure that represents a database. The overall efficiency of the tool depends on the performance of the Apriori algorithm and the report generation algorithm. Minor tweaks will be made to the Apriori algorithm for efficiency and performance.

Tools and platform: The tool will be developed on Windows 7 with .NET 4.0 but can run on any operating system that has .NET 4.0. On Linux-based systems, Mono (an open-source implementation of the .NET Framework) should be installed to run the tool. .NET has been chosen because it provides far more extensive interoperability with Office components than other technologies such as Java. C# will be the language used. All development will be done in Visual Studio 2010 Express Edition (free software). Unit tests will be written in C#. Microsoft Office needs to be present on the machine so that reports can be generated.

Software, runtime, and operating system installed for development: Windows 7 Home Basic, Visual Studio 2010 Express Edition (which in turn installs Common Language Runtime 4.0), and Microsoft Office 2010.
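The in-memory transactional database and the Apriori-based core engine described in the design can be illustrated with a minimal sketch. It is written in Python for brevity, whereas the tool itself is planned in C#. It shows the textbook Apriori join and prune steps followed by rule generation, without any of the efficiency tweaks mentioned in the design. The function names `apriori` and `generate_rules`, the `min_support` and `min_confidence` thresholds, and the sample transactions are all assumptions made for illustration.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Itemset = FrozenSet[str]
Rule = Tuple[Itemset, Itemset, float, float]  # antecedent, consequent, support, confidence


def apriori(transactions: List[Set[str]], min_support: float) -> Dict[Itemset, float]:
    """Return every itemset whose support is at least min_support."""
    n = len(transactions)
    if n == 0:
        return {}

    def support(itemset: Itemset) -> float:
        # Fraction of transactions that contain every item of the itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    current: Dict[Itemset, float] = {}
    for item in {i for t in transactions for i in t}:
        single = frozenset([item])
        s = support(single)
        if s >= min_support:
            current[single] = s

    frequent = dict(current)
    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


def generate_rules(frequent: Dict[Itemset, float], min_confidence: float) -> List[Rule]:
    """Derive rules X -> Y from each frequent itemset, split into X and Y."""
    rules: List[Rule] = []
    for itemset, sup in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), size):
                antecedent = frozenset(lhs)
                # Every subset of a frequent itemset is frequent, so the lookup succeeds.
                confidence = sup / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, sup, confidence))
    return rules


if __name__ == "__main__":
    # The "database" is just a list of item sets held in memory.
    database = [
        {"bread", "butter", "milk"},
        {"bread", "butter"},
        {"bread", "milk"},
        {"butter", "milk"},
        {"bread", "butter", "jam"},
    ]
    for lhs, rhs, sup, conf in generate_rules(apriori(database, 0.4), 0.7):
        print(sorted(lhs), "->", sorted(rhs), f"support={sup:.2f} confidence={conf:.2f}")
```

Each pass over `transactions` in `support` is a full scan, so the cost grows with the number of candidates. This is where tweaks to the basic algorithm would most plausibly pay off.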
