1. Introduction
The Database Management System (DBMS) web application provides a user-friendly interface for
interacting with MySQL databases. Users can connect to a database, execute SQL queries,
explore and chart data, and run searches with uncertain predicates based on
approximate string matching. This document outlines the implementation details, functionalities,
installation instructions, and usage guidelines for the DBMS web application.
2. Technologies Used
scikit-learn, Fuzzywuzzy, Levenshtein: Libraries for machine learning and approximate string
matching.
3. Implementation Overview
Components:
Divides functionality into separate pages such as Connect, DBMS, Data Exploration, etc.
2. Connect Page:
3. DBMS Page:
Allows users to execute SQL queries and displays results in a table format.
Handles error cases and ensures secure query execution.
4. Data Exploration Page:
Provides summary statistics, correlation matrices, and distribution plots for selected data.
Offers various chart types (bar plots, line plots, pie charts) for visualizing data.
Uses Fuzzywuzzy and Levenshtein libraries to find similar strings in the database.
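One common way to achieve the secure query execution and error handling mentioned for the DBMS page is to bind user-supplied values as parameters and to catch database errors instead of letting them crash the page. The sketch below is an illustration only, not the application's code: it uses Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs without a server, and the `run_query` helper and `users` table are hypothetical. With `mysql.connector` the placeholder is `%s` rather than `?`.

```python
import sqlite3

def run_query(connection, sql, params=()):
    """Execute a query with bound parameters; return (rows, error_message)."""
    try:
        cursor = connection.execute(sql, params)
        return cursor.fetchall(), None
    except sqlite3.Error as exc:
        # Report the failure to the caller instead of crashing the page.
        return [], str(exc)

# In-memory demo database with a hypothetical `users` table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

# The hostile input is bound as data, so it cannot alter the statement.
hostile = "alice' OR '1'='1"
rows, error = run_query(conn, "SELECT name FROM users WHERE name = ?", (hostile,))
safe_rows, _ = run_query(conn, "SELECT name FROM users WHERE name = ?", ("alice",))

# A failing statement yields an empty result and an error message.
bad_rows, bad_error = run_query(conn, "SELECT * FROM missing_table")
```

Returning the error message alongside the rows lets the page show a readable message while the application keeps running.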
4. Usage Guidelines
Explore data and visualize it through different charts in the Data Exploration and Data Visualization
pages.
Utilize approximate string matching for uncertain predicates in the Supporting Uncertain Predicates
page.
Functionality Overview:
The Supporting Uncertain Predicates page implements approximate string matching techniques to
handle uncertain predicates efficiently. Users can specify a search term and a threshold, and the
system will retrieve similar strings from the database based on the provided criteria. The
implementation involves utilizing the Fuzzywuzzy library for fuzzy string matching and the
Levenshtein distance algorithm for measuring the similarity between strings.
Implementation Details:
1. User Interface:
Provides input fields for specifying the table name, column name, search term, and threshold.
Users input the required parameters and click a button to initiate the search.
Iterates through the results and calculates the fuzzy ratio and Levenshtein distance between the
search term and each value in the specified column.
Filters results based on the threshold and returns matched records along with their similarity
scores.
3. Displaying Results:
Displays the matched results in a table format, including the original values, fuzzy ratio,
Levenshtein distance, and uncertain prediction probability.
Enables users to visualize and analyze the matched records conveniently.
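Code Snippet:
```python
import mysql.connector
import Levenshtein
from fuzzywuzzy import fuzz

# The function name and signature are assumed; the original definition line is missing.
def search_similar_strings(connection, table_name, column_name, search_term, threshold):
    try:
        cursor = connection.cursor(dictionary=True)
        # Identifiers cannot be bound as parameters; validate them beforehand.
        query = f"SELECT * FROM `{table_name}`"
        cursor.execute(query)
        results = cursor.fetchall()
        matched_results = []
        for result in results:
            value = str(result[column_name])
            result["fuzzy_ratio"] = fuzz.ratio(search_term, value)
            result["levenshtein_distance"] = Levenshtein.distance(search_term, value)
            # Assumes the threshold applies to the fuzzy ratio (0-100).
            if result["fuzzy_ratio"] >= threshold:
                matched_results.append(result)
        return matched_results
    except mysql.connector.Error as e:
        print(f"Database error: {e}")
        return []
```
Usage Example:
```python
# Placeholder table, column, and search term; `connection` is an open MySQL connection.
matched_results = search_similar_strings(connection, "customers", "name", "Jon Smith", 80)
print(matched_results)
```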
Math:
1. Fuzzy Ratio:
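The fuzzy ratio, also known as the fuzzy similarity ratio, measures the similarity between two strings
by comparing their characters and positions. It is calculated using the formula:
\[ \text{Fuzzy Ratio} = \frac{2 \times \text{matching characters}}{\text{total characters in both strings}} \times 100 \]
Where:
- The matching characters are the characters that the two strings share in matching order.
- The total characters in both strings are the sum of the lengths of the two strings.
For example, consider two strings: "apple" and "aple". They share 4 matching characters ("a", "p", "l", "e") and contain 5 + 4 = 9 characters in total, so the fuzzy ratio is:
\[ \frac{2 \times 4}{9} \times 100 \approx 89 \]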
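The "apple" / "aple" score can be checked with a few lines of Python. This is a minimal sketch using the standard library's `difflib.SequenceMatcher`, whose `ratio()` returns twice the number of matching characters divided by the total length of both strings. The `fuzzy_ratio` helper is a hypothetical name; Fuzzywuzzy's `fuzz.ratio` may use a different backend when python-Levenshtein is installed, so scores can differ slightly for some inputs.

```python
from difflib import SequenceMatcher

def fuzzy_ratio(a: str, b: str) -> int:
    """Similarity on a 0-100 scale: 2 * matches / total characters."""
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))

apple_score = fuzzy_ratio("apple", "aple")       # 4 matches out of 9 characters
identical_score = fuzzy_ratio("apple", "apple")  # every character matches
```

Here `apple_score` is 89, because 2 × 4 / 9 ≈ 0.889, and `identical_score` is 100.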
2. Levenshtein Distance:
The Levenshtein distance, also known as the edit distance, calculates the minimum number of single-
character edits (insertions, deletions, or substitutions) required to change one string into another. It
is computed using dynamic programming and is defined as follows:
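\[ \operatorname{lev}_{a,b}(i, j) = \begin{cases}
\max(i, j) & \text{if } \min(i, j) = 0, \\
\min \begin{cases}
\operatorname{lev}_{a,b}(i-1, j) + 1 \\
\operatorname{lev}_{a,b}(i, j-1) + 1 \\
\operatorname{lev}_{a,b}(i-1, j-1) + [a_i \neq b_j]
\end{cases} & \text{otherwise.}
\end{cases} \]
Where:
- \(a\) and \(b\) are the two strings being compared.
- \(\operatorname{lev}_{a,b}(i, j)\) is the distance between the first \(i\) characters of \(a\) and the first \(j\) characters of \(b\).
- \([a_i \neq b_j]\) is 1 if the \(i\)-th character of \(a\) differs from the \(j\)-th character of \(b\), and 0 otherwise.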
For example, consider the strings "kitten" and "sitting". The Levenshtein distance between these
strings is 3.
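The dynamic-programming computation can be sketched in plain Python as follows. This is an illustrative implementation of the standard algorithm that keeps only two rows of the table, not the code of the Levenshtein library; the `levenshtein` function name is chosen for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]

distance = levenshtein("kitten", "sitting")
```

For "kitten" and "sitting" the three edits are: substitute "k" with "s", substitute "e" with "i", and insert "g" at the end.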
1. Introduction
The development of the Supporting Uncertain Predicates in DBMS project involved implementing
functionalities to handle uncertain predicates efficiently using approximate string matching
techniques and exploring the integration of probabilistic databases. This document provides an
overview of the development process, including key components, implementation details, and the
technologies used.
2. Project Structure
- Implements the core logic and user interface using the Streamlit framework.
- Divides functionality into separate pages such as Connect, DBMS, Data Exploration, Supporting
Uncertain Predicates, Visualization, and Database Schema.
- Implements fuzzy string matching algorithms using the Fuzzywuzzy library and Levenshtein
distance calculation.
- Provides a mechanism to search for similar strings in a database column based on user-provided
search terms and thresholds.
3. Implementation Details
- **User Interface:**
- Includes input fields for specifying the table name, column name, search term, and threshold.
- Utilizes Streamlit widgets to capture user inputs and trigger search actions.
- Connects to the database and fetches data from the specified table.
- Calculates fuzzy ratios and Levenshtein distances between the search term and column values.
- Filters results based on the specified threshold and returns matched records along with similarity
scores.
- **Mathematical Representations:**
- Provides mathematical explanations of fuzzy ratio and Levenshtein distance algorithms.
- Illustrates the calculation process and significance of these metrics in quantifying string similarity.
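The fetch, score, and filter steps listed above can be sketched end to end as follows. This is a self-contained illustration under stated assumptions, not the project's code: `sqlite3` stands in for MySQL, `difflib.SequenceMatcher` stands in for Fuzzywuzzy's ratio, the threshold is assumed to apply to the 0-100 similarity score, and the `find_similar` helper and `fruits` table are hypothetical.

```python
import re
import sqlite3
from difflib import SequenceMatcher

def find_similar(connection, table, column, term, threshold):
    """Return (value, score) pairs whose score meets the threshold, best first."""
    # Table and column names cannot be bound as parameters, so accept only
    # plain identifiers before interpolating them into the statement.
    for name in (table, column):
        if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
            raise ValueError(f"invalid identifier: {name!r}")
    rows = connection.execute(f'SELECT "{column}" FROM "{table}"').fetchall()
    matches = []
    for (value,) in rows:
        score = int(round(100 * SequenceMatcher(None, term, str(value)).ratio()))
        if score >= threshold:
            matches.append((value, score))
    return sorted(matches, key=lambda pair: pair[1], reverse=True)

# In-memory demo data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fruits (name TEXT)")
conn.executemany("INSERT INTO fruits VALUES (?)", [("apple",), ("aple",), ("banana",)])
matches = find_similar(conn, "fruits", "name", "apple", 80)
```

With a threshold of 80, "apple" and "aple" are returned and "banana" is filtered out. Validating the identifiers matters because the table and column names come from user input.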
4. Technologies Used
- **MySQL:** Relational database management system for storing and managing data.
- **scikit-learn, Fuzzywuzzy, Levenshtein:** Libraries for machine learning and approximate string
matching.
5. Access the application in a web browser and navigate to the Supporting Uncertain Predicates page.
6. Conclusion
The Supporting Uncertain Predicates in DBMS project aims to enhance traditional database systems
by providing efficient support for uncertain predicates. By implementing approximate string
matching techniques and exploring the integration of probabilistic databases, the project offers a
robust solution for handling uncertain data scenarios. This development documentation serves as a
guide for understanding the project structure, implementation details, and usage instructions.