
1. Introduction

The Database Management System (DBMS) web application provides a user-friendly interface for
interacting with MySQL databases. It enables users to connect to databases, execute SQL queries,
explore data, visualize data through various charts, and query with uncertain predicates using
approximate string matching. This document outlines the implementation details, functionality,
installation instructions, and usage guidelines for the DBMS web application.

2. Technologies Used

Python: Core programming language for backend logic.

Streamlit: Framework for building interactive web applications.

MySQL: Relational database management system for data storage.

Pandas, Matplotlib, Seaborn: Libraries for data manipulation and visualization.

scikit-learn, Fuzzywuzzy, Levenshtein: Libraries for machine learning and approximate string
matching.

3. Implementation Overview

Components:

1. Main Application File (app.py):

Implements the core logic and user interface using Streamlit.

Divides functionality into separate pages such as Connect, DBMS, Data Exploration, etc.

2. Connect Page:

Establishes a connection to a MySQL database using user-provided credentials.

Validates inputs and displays connection status.

3. DBMS Page:

Allows users to execute SQL queries and displays results in a table format.

Handles error cases and ensures secure query execution.

4. Data Exploration Page:

Provides summary statistics, correlation matrices, and distribution plots for selected data.

Uses Pandas and Matplotlib for data analysis and visualization.

5. Data Visualization Page:

Offers various chart types (bar plots, line plots, pie charts) for visualizing data.

Users can select columns and customize chart parameters.

6. Supporting Uncertain Predicates Page:

Implements approximate string matching for uncertain predicates.

Uses Fuzzywuzzy and Levenshtein libraries to find similar strings in the database.
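
The error handling and secure query execution mentioned for the DBMS page can be illustrated with a minimal sketch. It is not the application's actual code: the function name run_query and the sample table are hypothetical, and Python's built-in sqlite3 module stands in for MySQL so the example runs without a server. Both drivers follow the Python DB-API, but mysql.connector uses %s as its placeholder where sqlite3 uses ?.

```python
import sqlite3


def run_query(connection, sql, params=()):
    """Execute one statement and return (column_names, rows), or (None, error_message).

    Values are passed separately from the SQL text so the driver binds them,
    which avoids building queries by string concatenation.
    """
    cursor = connection.cursor()
    try:
        cursor.execute(sql, params)
        # cursor.description is None for statements that return no rows.
        columns = [d[0] for d in cursor.description] if cursor.description else []
        return columns, cursor.fetchall()
    except sqlite3.Error as exc:  # with MySQL this would be mysql.connector.Error
        return None, str(exc)
    finally:
        cursor.close()


# Hypothetical in-memory table used only for this demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])

columns, rows = run_query(conn, "SELECT id, name FROM customers WHERE name = ?", ("Alice",))
failed, message = run_query(conn, "SELECT * FROM missing_table")
```

Returning the error message instead of raising lets a page show the failure to the user without stopping the application.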

Installation and Setup:

1. Clone/download the repository.

2. Install dependencies using pip install -r requirements.txt.

3. Ensure MySQL server is installed and running.

4. Run the application using streamlit run app.py.

4. Usage Guidelines

Running the Application:

1. Open a terminal and navigate to the project directory.

2. Run the application using streamlit run app.py.

3. The application will open in your default web browser.

Interacting with the Application:

Navigate between pages using the sidebar.


Connect to a database by providing credentials in the Connect page.

Execute SQL queries and view results in the DBMS page.

Explore data and visualize it through different charts in the Data Exploration and Data Visualization
pages.

Utilize approximate string matching for uncertain predicates in the Supporting Uncertain Predicates
page.

Supporting Uncertain Predicates Page Implementation

Functionality Overview:

The Supporting Uncertain Predicates page implements approximate string matching techniques to
handle uncertain predicates efficiently. Users can specify a search term and a threshold, and the
system will retrieve similar strings from the database based on the provided criteria. The
implementation involves utilizing the Fuzzywuzzy library for fuzzy string matching and the
Levenshtein distance algorithm for measuring the similarity between strings.

Implementation Details:

1. User Interface:

Provides input fields for specifying the table name, column name, search term, and threshold.

Users input the required parameters and click a button to initiate the search.

2. Approximate String Matching Function:

Implements a function approximate_string_matching that takes the database connection, table
name, column name, search term, and threshold as input parameters.

Executes a SQL query to fetch data from the specified table.

Iterates through the results and calculates the fuzzy ratio and Levenshtein distance between the
search term and each value in the specified column.

Filters results based on the threshold and returns matched records along with their similarity
scores.

3. Displaying Results:

Displays the matched results in a table format, including the original values, fuzzy ratio,
Levenshtein distance, and uncertain prediction probability.

Enables users to visualize and analyze the matched records conveniently.
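
The filter-and-display steps above can be sketched without a database by separating the matching logic from the fetch. This is an illustrative variation, not the project's code: match_rows and similarity_percent are hypothetical names, the scorer uses the standard library's difflib as a stand-in for Fuzzywuzzy, and sorting the matches best-first is an added assumption about how results might be shown.

```python
from difflib import SequenceMatcher


def similarity_percent(a, b):
    """Stand-in scorer: difflib's 2*M/T ratio scaled to 0-100."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def match_rows(rows, column, search_term, threshold, scorer=similarity_percent):
    """Keep rows whose `column` value scores at least `threshold`, best first."""
    term = search_term.lower()
    matches = []
    for row in rows:
        value = row.get(column)
        if value is None:  # skip NULLs instead of failing on .lower()
            continue
        score = scorer(term, str(value).lower())
        if score >= threshold:
            matches.append({**row, "Fuzzy Ratio": score})
    return sorted(matches, key=lambda r: r["Fuzzy Ratio"], reverse=True)


# Rows shaped like the dictionaries a dictionary cursor returns.
rows = [{"name": "Jonathan"}, {"name": "Jonathon"}, {"name": "Maria"}, {"name": None}]
result = match_rows(rows, "name", "jonathan", 80)
```

Keeping the scorer as a parameter makes it possible to swap in fuzz.ratio where Fuzzywuzzy is installed, and the function can be tested without a MySQL server.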

Code Snippet:

python

import mysql.connector
from fuzzywuzzy import fuzz
import Levenshtein


def approximate_string_matching(connection, table, column, search_term, threshold):
    """Return rows whose `column` value has a fuzzy ratio >= `threshold`."""
    try:
        cursor = connection.cursor(dictionary=True)
        # The table name is interpolated directly, so it must come from a trusted source.
        query = f"SELECT * FROM {table}"
        cursor.execute(query)
        results = cursor.fetchall()
        matched_results = []
        for result in results:
            term, value = search_term.lower(), result[column].lower()
            fuzzy_ratio = fuzz.ratio(term, value)
            levenshtein_distance = Levenshtein.distance(term, value)
            # Normalize the edit distance by the longer string to get a score in [0, 1].
            longest = max(len(search_term), len(result[column]))
            uncertain_probability = 1 - levenshtein_distance / longest if longest != 0 else 0
            if fuzzy_ratio >= threshold:
                result["Fuzzy Ratio"] = fuzzy_ratio
                result["Levenshtein Distance"] = levenshtein_distance
                result["Uncertain Prediction Probability"] = uncertain_probability
                matched_results.append(result)
        return matched_results
    except mysql.connector.Error as e:
        print("Error while querying MySQL:", e)
        return []
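
Because the snippet interpolates the table name into the SQL text, a user-supplied name could alter the query. One possible guard is sketched below. The helper names quote_identifier and build_select are hypothetical, and the allowed character set is an assumption that is stricter than what MySQL itself permits.

```python
import re

# Assumption: identifiers are limited to letters, digits, and underscores,
# which is stricter than what MySQL itself permits.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def quote_identifier(name):
    """Return `name` wrapped in backticks, or raise ValueError if it looks unsafe."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"Invalid SQL identifier: {name!r}")
    return f"`{name}`"


def build_select(table):
    """Build the SELECT statement only after the table name has been validated."""
    return f"SELECT * FROM {quote_identifier(table)}"


query = build_select("customers")
```

Identifiers cannot be bound as query parameters, which is why the check happens before the string is built. Comparing the name against the list of tables that actually exist in the database would be a stricter alternative.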

Usage Example:

python

# Example usage of the approximate_string_matching function
connection = mysql.connector.connect(
    host="localhost",
    user="username",
    passwd="password",
    database="database_name",
)
matched_results = approximate_string_matching(
    connection, "table_name", "column_name", "search_term", 80
)
print(matched_results)
connection.close()

Math:

1. Fuzzy Ratio:

The fuzzy ratio, also known as the fuzzy similarity ratio, measures the similarity between two strings
by comparing their characters and positions. It is calculated using the formula:

\[ \text{Fuzzy Ratio} = \frac{2 \times \text{matches}}{\text{total characters in both strings}} \times 100 \]

Where:

- \(\text{matches}\) is the number of matching characters between the two strings.

- The total characters in both strings is the sum of the lengths of the two strings.

For example, consider two strings: "apple" and "aple". They share 4 matching characters and have
lengths 5 and 4, so the fuzzy ratio would be calculated as follows:

\[ \text{Fuzzy Ratio} = \frac{2 \times 4}{5 + 4} \times 100 \approx 88.89 \]
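
The ratio 2 × matches / total can be checked with a short sketch built on the standard library's difflib.SequenceMatcher, which counts matching characters in the same way. The function name fuzzy_ratio is hypothetical, and the value Fuzzywuzzy reports may differ slightly depending on its version and backend.

```python
from difflib import SequenceMatcher


def fuzzy_ratio(a, b):
    """Return round(100 * 2*M/T), where M is matched characters and T the total length."""
    matcher = SequenceMatcher(None, a, b)
    # Each matching block is a run of characters shared by both strings.
    matches = sum(block.size for block in matcher.get_matching_blocks())
    total = len(a) + len(b)
    if total == 0:
        return 100  # two empty strings are treated as identical
    return round(100 * 2 * matches / total)


score = fuzzy_ratio("apple", "aple")  # 2 * 4 / 9 * 100, rounded to an integer
```

For "apple" and "aple" the matched runs are "a" and "ple", giving 4 matches out of 9 characters in total.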

2. Levenshtein Distance:

The Levenshtein distance, also known as the edit distance, calculates the minimum number of
single-character edits (insertions, deletions, or substitutions) required to change one string into
another. It is computed using dynamic programming and is defined as follows:

\[ \text{Levenshtein Distance}(\text{string1}, \text{string2}) =
\begin{cases}
\text{len}(\text{string2}) & \text{if } \text{len}(\text{string1}) = 0 \\
\text{len}(\text{string1}) & \text{if } \text{len}(\text{string2}) = 0 \\
\text{Levenshtein Distance}(\text{string1}[1:], \text{string2}[1:]) & \text{if } \text{string1}[0] = \text{string2}[0] \\
1 + \min\bigl(\text{Levenshtein Distance}(\text{string1}[1:], \text{string2}),\ \text{Levenshtein Distance}(\text{string1}, \text{string2}[1:]),\ \text{Levenshtein Distance}(\text{string1}[1:], \text{string2}[1:])\bigr) & \text{otherwise}
\end{cases} \]

Where:

- \(\text{string1}\) and \(\text{string2}\) are the input strings.

- \(\text{len}(\text{string})\) returns the length of the string.

- \(\text{string}[i]\) returns the character at position \(i\) in the string.

For example, consider the strings "kitten" and "sitting". The Levenshtein distance between these
strings is 3.
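
The recursive definition is usually evaluated bottom-up with dynamic programming. The sketch below is a plain-Python illustration rather than the Levenshtein library's implementation, and the function names are hypothetical. It also shows the normalization that the code snippet earlier in this document calls the uncertain prediction probability: 1 minus the distance divided by the length of the longer string.

```python
def levenshtein_distance(s, t):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    previous = list(range(len(t) + 1))  # distances from "" to each prefix of t
    for i, cs in enumerate(s, start=1):
        current = [i]  # distance from s[:i] to ""
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            current.append(min(previous[j] + 1,          # delete cs
                               current[j - 1] + 1,       # insert ct
                               previous[j - 1] + cost))  # match or substitute
        previous = current
    return previous[-1]


def normalized_similarity(s, t):
    """1 - distance / max(len); returns 0 when both strings are empty, as in the snippet."""
    longest = max(len(s), len(t))
    return 1 - levenshtein_distance(s, t) / longest if longest else 0


distance = levenshtein_distance("kitten", "sitting")
similarity = normalized_similarity("kitten", "sitting")  # 1 - 3/7
```

Only two rows of the table are kept at a time, so memory use grows with the length of one string rather than with the product of both lengths.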

Development Documentation: Supporting Uncertain Predicates in DBMS Using Approximate String
Matching and Probabilistic Databases

1. Introduction

The development of the Supporting Uncertain Predicates in DBMS project involved implementing
functionalities to handle uncertain predicates efficiently using approximate string matching
techniques and exploring the integration of probabilistic databases. This document provides an
overview of the development process, including key components, implementation details, and the
technologies used.

2. Project Structure

The project is organized into the following main components:

1. **Main Application Logic (`app.py`):**

- Implements the core logic and user interface using the Streamlit framework.

- Divides functionality into separate pages such as Connect, DBMS, Data Exploration, Supporting
Uncertain Predicates, Visualization, and Database Schema.

2. **Database Interaction Functions:**

- Functions to establish connections to MySQL databases and execute SQL queries.

- Error handling mechanisms to ensure secure database operations.

3. **Approximate String Matching Functionality:**

- Implements fuzzy string matching algorithms using the Fuzzywuzzy library and Levenshtein
distance calculation.

- Provides a mechanism to search for similar strings in a database column based on user-provided
search terms and thresholds.

3. Implementation Details

Supporting Uncertain Predicates Page:

- **User Interface:**

- Includes input fields for specifying the table name, column name, search term, and threshold.

- Utilizes Streamlit widgets to capture user inputs and trigger search actions.

- **Approximate String Matching Function (`approximate_string_matching`):**

- Connects to the database and fetches data from the specified table.

- Calculates fuzzy ratios and Levenshtein distances between the search term and column values.

- Filters results based on the specified threshold and returns matched records along with similarity
scores.

- **Mathematical Representations:**

- Provides mathematical explanations of fuzzy ratio and Levenshtein distance algorithms.

- Illustrates the calculation process and significance of these metrics in quantifying string similarity.

4. Technologies Used

- **Python:** Core language for implementing project logic and functionalities.

- **Streamlit:** Web application framework for building interactive user interfaces.

- **MySQL:** Relational database management system for storing and managing data.

- **Pandas, Matplotlib, Seaborn:** Libraries for data manipulation and visualization.

- **scikit-learn, Fuzzywuzzy, Levenshtein:** Libraries for machine learning and approximate string
matching.

5. Installation and Setup

1. Clone/download the project repository.

2. Install dependencies using `pip install -r requirements.txt`.

3. Ensure MySQL server is installed and running.

4. Run the application using `streamlit run app.py`.

5. Access the application in a web browser and navigate to the Supporting Uncertain Predicates page.

6. Conclusion

The Supporting Uncertain Predicates in DBMS project aims to enhance traditional database systems
by providing efficient support for uncertain predicates. By implementing approximate string
matching techniques and exploring the integration of probabilistic databases, the project offers a
robust solution for handling uncertain data scenarios. This development documentation serves as a
guide for understanding the project structure, implementation details, and usage instructions.
