
Information Retrieval

Assignment 3 Proximal Node

Session: 2020 – 2024

Submitted by:
Saqlain Nawaz 2020-CS-135

Supervised by:
Sir Khaldoon Syed Khurshid

Department of Computer Science
University of Engineering and Technology
Lahore, Pakistan

Overview
This code is designed to create a graph-based model for a collection of text files. It
includes a search function that allows users to query the model based on proximal
nodes.

Libraries Used
The following libraries are used in this code:

● os: Provides functions for interacting with the operating system, used here for
file operations and directory traversal.
● networkx: A Python package used for the creation, manipulation, and study of
the structure, dynamics, and functions of complex networks.

Code Flow
The code is structured as follows:

Import Libraries
The required libraries are imported at the beginning of the code.

gather_documents Function
This function takes a directory path as input and returns a list of all text files
found in that directory and its subdirectories. It uses the os.walk function to
traverse the directory tree.
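
The report does not reproduce the function body, so the following is only a minimal sketch of how such a traversal could look. It assumes that "text files" means files with a .txt extension and that full paths are returned in sorted order; both choices are assumptions rather than details taken from the original code.

```python
import os


def gather_documents(directory):
    """Return the paths of all .txt files under `directory`, including subdirectories."""
    documents = []
    # os.walk yields (current_dir, subdirectory_names, file_names) for every level.
    for root, _dirs, files in os.walk(directory):
        for name in files:
            # Assumption: a "text file" is any file whose name ends in .txt.
            if name.lower().endswith(".txt"):
                documents.append(os.path.join(root, name))
    # Sorting is an assumption that makes the result order predictable.
    return sorted(documents)
```

Calling gather_documents("docs") on a hypothetical folder named docs would return the .txt files at every depth below it and ignore files with other extensions.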

proximal_nodes_model Function

This function takes a graph G and a list of proximal_nodes as input. It returns a
dictionary where each key is a node from proximal_nodes that exists in G, and
each value is a list of the nodes in G that are connected to the key node.
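
One way this behaviour could be written with networkx is sketched below. It assumes that "connected to" means directly adjacent (the node's neighbours) and that G is an undirected nx.Graph. The sample term and document names are hypothetical and serve only as illustration.

```python
import networkx as nx


def proximal_nodes_model(G, proximal_nodes):
    """Map each requested node that exists in G to a list of its neighbours."""
    model = {}
    for node in proximal_nodes:
        # Requested nodes that are not in the graph are skipped.
        if node in G:
            # Assumption: "connected" means sharing an edge with the node.
            model[node] = list(G.neighbors(node))
    return model


# Hypothetical sample data: one term node linked to two document nodes.
G = nx.Graph()
G.add_edges_from([("retrieval", "doc1.txt"), ("retrieval", "doc2.txt")])
result = proximal_nodes_model(G, ["retrieval", "missing"])
```

In this sketch, "missing" is absent from the graph, so it does not appear as a key in result.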

Main Function
In the main function, the program first gathers all text files from a specified directory
and adds them as nodes to a graph. It also adds some additional nodes and edges
to the graph. Then it enters a loop where it allows the user to enter a query. For each
term in the query, it finds and prints all documents (nodes) that are connected to that
term in the graph.
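
The flow described above could be sketched roughly as follows. The report does not say which additional nodes and edges are added, so the term-to-document edges here are hypothetical. The helper names build_graph, search and interactive_loop are likewise assumptions, not names from the original code.

```python
import networkx as nx


def build_graph(documents, term_edges):
    """Create a graph with one node per document plus the given (term, document) edges."""
    G = nx.Graph()
    G.add_nodes_from(documents)
    # add_edges_from also creates any term nodes that do not exist yet.
    G.add_edges_from(term_edges)
    return G


def search(G, query):
    """Return {term: sorted neighbouring nodes} for every query term found in G."""
    results = {}
    for term in query.lower().split():
        if term in G:
            results[term] = sorted(G.neighbors(term))
    return results


def interactive_loop(G):
    """Prompt for queries until a blank line is entered (not invoked in this sketch)."""
    while True:
        query = input("Enter query (blank to quit): ").strip()
        if not query:
            break
        for term, docs in search(G, query).items():
            print(f"{term}: {', '.join(docs)}")


# Hypothetical documents and term edges, used only for illustration.
documents = ["doc1.txt", "doc2.txt", "doc3.txt"]
term_edges = [("graph", "doc1.txt"), ("graph", "doc3.txt"), ("search", "doc2.txt")]
G = build_graph(documents, term_edges)
hits = search(G, "graph search unknown")
```

Keeping the lookup in search separate from the input loop means the retrieval step can be checked without typing queries by hand.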

Execution
The main function is executed when the script is run. The user can interact with the
program by entering queries, and the program will print out documents connected to
the query terms according to the graph.

Please note that this is a basic implementation of a proximal node model. It does not
take into account the proximity of terms within documents or the structure of the
documents. For a more advanced implementation, you might need to use a library
that can parse and query structured documents.
