
DB-101 DATA MANAGEMENT

Fall 2008
Data Structures Project
GENERAL PROJECT INFORMATION

Course Name: DB-101 Data Management
Project Name: Personal Data Store (PDS)
Project Description: PDS is a data store for personal use. PDS supports all the basic
operations needed for managing persistent data.
Project Type: Individual
Due Date: 17-Oct-2008, 10 AM

PROJECT COMPONENTS

Data Files
Data files used by PDS are binary files containing multiple data objects where each data
object has a fixed size. Each data object may contain many fields with some
restrictions as given below. The project uses two data files corresponding to two
different data object types as explained below. The two data object types are related to
each other such that every base_object is related to exactly one related_object. For
example, employee is an example of a base_object and department is an example of a
related_object. In this example, every employee object is related to exactly one
department object.
Data file name is obtained by concatenating .dat extension to the data object name. For
example, if data object name is employee, then the data file name will be
employee.dat.

file_1.dat
This data file contains data corresponding to the first data object type. Each data object in this
file contains any number of data fields with the restriction that the first two fields should
always be key and the related key. The general structure is as shown below:
unsigned int key;
unsigned int related_key;
some_data_type field_3;
some_data_type field_4;
some_data_type field_5;
some_data_type field_6;
In this structure, key represents a unique identifier of the data object and related_key
refers to the unique identifier of another data object with which this data object is
related.
This structure of the file with the data is pictorially shown below (one row per data object):
| Key (unsigned int) | related_key (unsigned int) | field_3 | field_4 | field_5 | field_6 |
| Key (unsigned int) | related_key (unsigned int) | field_3 | field_4 | field_5 | field_6 |
| Key (unsigned int) | related_key (unsigned int) | field_3 | field_4 | field_5 | field_6 |
| Key (unsigned int) | related_key (unsigned int) | field_3 | field_4 | field_5 | field_6 |
| Key (unsigned int) | related_key (unsigned int) | field_3 | field_4 | field_5 | field_6 |

An example structure of a data object is shown below:


unsigned int emp_no;
unsigned int dept_id;
char name[25];
int age;

file_2.dat
This data file contains data corresponding to the second data object type, with which
every object of the first data object type is related. Each data object in this file contains
any number of data fields with the restriction that the first field should always be the key.
The general structure is as shown below:
unsigned int key;
some_data_type field_2;
some_data_type field_3;
some_data_type field_4;
some_data_type field_5;
some_data_type field_6;
In this structure, key represents a unique identifier of the data object. Data objects of the
first type refer to this identifier through their related_key field.

This structure of the file with the data is pictorially shown below (one row per data object):
| Key (unsigned int) | field_2 | field_3 | field_4 | field_5 | field_6 |
| Key (unsigned int) | field_2 | field_3 | field_4 | field_5 | field_6 |
| Key (unsigned int) | field_2 | field_3 | field_4 | field_5 | field_6 |
| Key (unsigned int) | field_2 | field_3 | field_4 | field_5 | field_6 |
| Key (unsigned int) | field_2 | field_3 | field_4 | field_5 | field_6 |

An example structure of a data object is shown below:


unsigned int dept_id;
char dept_name[25];
You are required to define two of your own data object structures similar to the above
based on the application domain given to you.
Note that fixed size data objects should not contain any pointers.
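
Because every data object has the same size, a record can be located by multiplying its position by the record size. The following is a minimal sketch of that idea, not part of the project specification. The struct name employee and the helper names writeRecordAt and readRecordAt are hypothetical, and the sketch assumes the file is opened in binary mode and that the struct is written to disk exactly as it is laid out in memory.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Example base_object from the text: the first two fields are key and related key. */
struct employee {
    unsigned int emp_no;   /* key */
    unsigned int dept_id;  /* related_key */
    char name[25];         /* fixed-size array, not a pointer */
    int age;
};

/* Writes one record into slot `position` (0-based). Returns 0 on success, 1 on failure. */
int writeRecordAt(FILE *fp, long position, const void *rec, size_t rec_size)
{
    assert(fp != NULL && rec != NULL);
    if (fseek(fp, position * (long)rec_size, SEEK_SET) != 0)
        return 1;
    return fwrite(rec, rec_size, 1, fp) == 1 ? 0 : 1;
}

/* Reads the record stored in slot `position`. Returns 0 on success, 1 on failure. */
int readRecordAt(FILE *fp, long position, void *rec, size_t rec_size)
{
    assert(fp != NULL && rec != NULL);
    if (fseek(fp, position * (long)rec_size, SEEK_SET) != 0)
        return 1;
    return fread(rec, rec_size, 1, fp) == 1 ? 0 : 1;
}
```

A pointer field would store only a memory address in the file, which is meaningless once the program exits. That is why the name is held in a fixed-size character array.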

Index files
Index files are binary files that contain a set of data objects where each data object
contains information regarding the position of data objects inside the data file based on
the key value. The index file name is obtained by concatenating .ndx extension to the
data_object_name. For example, if data object name is employee, then the data file
name will be employee.dat and the index file name will be employee.ndx.
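
The naming rule for data and index files can be sketched as a small helper. The function name buildFileName and its parameters are hypothetical and are not defined by the project.

```c
#include <assert.h>
#include <string.h>

/* Builds "<object_name><ext>" in out. Returns 0 on success, 1 if out is too small. */
int buildFileName(const char *object_name, const char *ext, char *out, size_t out_size)
{
    size_t needed;
    assert(object_name != NULL && ext != NULL && out != NULL);
    needed = strlen(object_name) + strlen(ext) + 1;  /* +1 for the terminating '\0' */
    if (needed > out_size)
        return 1;
    strcpy(out, object_name);
    strcat(out, ext);
    return 0;
}
```

Calling buildFileName("employee", ".dat", buf, sizeof buf) fills buf with employee.dat, and passing ".ndx" gives the index file name.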
The structure of the index file is fixed and contains the following information:
/* Array of positions indicating free positions, implemented as a queue */
int free_list[100];
/* Index information */
unsigned int key;     /* primary key of table */
unsigned int offset;  /* offset of data object in a file */
unsigned char flag;   /* to mark deleted data object */
The above structure is pictorially shown below (the free list first, then one row per index entry):
| Free_list (100 integers) |
| Key (unsigned int) | Offset (unsigned int) | flag (unsigned char) |
| Key (unsigned int) | Offset (unsigned int) | flag (unsigned char) |
| Key (unsigned int) | Offset (unsigned int) | flag (unsigned char) |
| Key (unsigned int) | Offset (unsigned int) | flag (unsigned char) |
| Key (unsigned int) | Offset (unsigned int) | flag (unsigned char) |
| Key (unsigned int) | Offset (unsigned int) | flag (unsigned char) |

The PDS project contains two index files corresponding to the two data files.
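
One possible way to write and read this layout is sketched below. The names index_entry, writeIndex and readIndexEntry are hypothetical. The sketch assumes that each entry is stored field by field (key, offset, flag), so that compiler padding inside the struct never reaches the file. The project text does not say whether entries are padded on disk.

```c
#include <assert.h>
#include <stdio.h>

#define FREE_LIST_SIZE 100

/* One index entry as described in the text. */
struct index_entry {
    unsigned int key;     /* primary key */
    unsigned int offset;  /* offset of the data object in the data file */
    unsigned char flag;   /* non-zero marks a deleted data object */
};

/* Writes the free list followed by n entries. Returns 0 on success, 1 on failure. */
int writeIndex(FILE *fp, const int *free_list, const struct index_entry *entries, size_t n)
{
    size_t i;
    assert(fp != NULL && free_list != NULL);
    if (fwrite(free_list, sizeof(int), FREE_LIST_SIZE, fp) != FREE_LIST_SIZE)
        return 1;
    for (i = 0; i < n; i++) {
        if (fwrite(&entries[i].key, sizeof entries[i].key, 1, fp) != 1 ||
            fwrite(&entries[i].offset, sizeof entries[i].offset, 1, fp) != 1 ||
            fwrite(&entries[i].flag, sizeof entries[i].flag, 1, fp) != 1)
            return 1;
    }
    return 0;
}

/* Reads the next entry. Returns 1 if an entry was read, 0 at end of file or on error. */
int readIndexEntry(FILE *fp, struct index_entry *e)
{
    assert(fp != NULL && e != NULL);
    return fread(&e->key, sizeof e->key, 1, fp) == 1 &&
           fread(&e->offset, sizeof e->offset, 1, fp) == 1 &&
           fread(&e->flag, sizeof e->flag, 1, fp) == 1;
}
```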

CREATE OPERATIONS
int loadDataStore ( char *base_object_name, char *related_object_name )
Return 0 for success, 1 for failure
Loads the index files corresponding to the two data object names into memory. The free
list array is loaded into a queue data structure and the actual index information is loaded
into a binary search tree or a hash map.
Algorithm
Repeat the following steps for each data object index file:
Step: 0 Use a global data structure to hold the index structure (BST or Hash Table)
for the two data object index files.
Step: 1 Load the free list array data (the first 400 bytes of the corresponding index
file) into an integer array of size 100 in the global data structure. (Hint: use fread
to read the entire array in one call.)
Step: 2 Load the index entries of the index file, starting just past the free list array, one by
one and add each entry to the Binary Search Tree or Hash Table.
Step: 3 If the operation is not successful return 1, else return 0.
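
The loading steps for one index file can be sketched as follows, using a binary search tree. All names here (bst_node, loaded_index, bstInsert, bstFind, loadIndex) are hypothetical. The sketch assumes the entries are stored field by field after the free list and takes an already opened FILE pointer instead of an object name.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define FREE_LIST_SIZE 100

/* Hypothetical BST node holding one index entry. */
struct bst_node {
    unsigned int key, offset;
    unsigned char flag;
    struct bst_node *left, *right;
};

/* Hypothetical in-memory index for one data object type (Step 0). */
struct loaded_index {
    int free_list[FREE_LIST_SIZE];
    struct bst_node *root;
};

/* Inserts an entry ordered by key. Returns 1 on duplicate key or allocation failure. */
int bstInsert(struct bst_node **link, unsigned int key, unsigned int offset, unsigned char flag)
{
    while (*link != NULL) {
        if (key == (*link)->key)
            return 1;
        link = key < (*link)->key ? &(*link)->left : &(*link)->right;
    }
    *link = malloc(sizeof **link);
    if (*link == NULL)
        return 1;
    (*link)->key = key;
    (*link)->offset = offset;
    (*link)->flag = flag;
    (*link)->left = (*link)->right = NULL;
    return 0;
}

/* Returns the node with the given key, or NULL if absent. */
struct bst_node *bstFind(struct bst_node *n, unsigned int key)
{
    while (n != NULL && n->key != key)
        n = key < n->key ? n->left : n->right;
    return n;
}

/* Steps 1 to 3 for one index file. Returns 0 on success, 1 on failure. */
int loadIndex(FILE *fp, struct loaded_index *idx)
{
    unsigned int key, offset;
    unsigned char flag;
    assert(fp != NULL && idx != NULL);
    idx->root = NULL;
    /* Step 1: read the whole free list array in one fread call. */
    if (fread(idx->free_list, sizeof(int), FREE_LIST_SIZE, fp) != FREE_LIST_SIZE)
        return 1;
    /* Step 2: read entries one by one and add them to the tree. */
    while (fread(&key, sizeof key, 1, fp) == 1) {
        if (fread(&offset, sizeof offset, 1, fp) != 1 || fread(&flag, sizeof flag, 1, fp) != 1)
            return 1;  /* truncated entry */
        if (bstInsert(&idx->root, key, offset, flag) != 0)
            return 1;
    }
    return 0;
}
```

A full loadDataStore would call such a routine twice, once for each index file. The sketch does not free the tree when loading fails partway.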

int addUniqueDataObject(char* data_object_name, void *dataObject)
Return 0 for success, 1 for failure
Inserts a data object into the given data file. The position at which to insert the data object is
determined from the free_list queue by a dequeue operation. If the queue is empty, the data
object is appended to the end of the file.
Constraints
The programmer should enforce not-null and uniqueness for the key.

Algorithm
/* To do */
Step: 1 If data_object_name = child file name
    Then
        go to Step: 2
    Else
        go to Step: 5
Step: 2 Check the referential integrity constraint.
    If valid
    Then
        go to Step: 3
    Else
        report failure and go to Step: 6
Step: 3 If front = rear, which indicates the queue is empty
    Then
        Append the dataObject to the data_object_name file and update the
        corresponding index file.
    Else
        Perform dequeue on free_list_array to get the record offset, insert
        dataObject into the corresponding file at that offset, and update the
        corresponding index file.
Step: 4 Go to Step: 6.
Step: 5 Perform Step 3 and 4 for the given file (parent file).
Step: 6 If the operation is not successful return 1, else return 0.

UPDATE OPERATIONS
Modifies an existing data object accessed using the index file based on the given key.
This method seeks to the record's position in the data file using the offset stored in the
index file, and overwrites the whole record with the passed values.
Prototype

int updateDataObject(char* data_object_name, int key, void *updatedDataObject);

Return 0 for success, 1 for failure

Constraints
Key value cannot be changed.
Validate records that exceed the fixed length.

Algorithm
/* To do */
Step: 1 Find the offset of the key from the BST data structure or Hash Table.
Step: 2 Overwrite the record for that key in the given data_object_name file with
updatedDataObject. (The user must take care of the fixed length of the record.)
Step: 3 If the operation is not successful return 1, else return 0.
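
Step 2 together with the constraint that the key cannot change can be sketched as below. The function overwriteRecord is hypothetical. It assumes the offset has already been found in the index (Step 1) and relies on the rule that the key is the first field of every data object.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Example related_object from the text. */
struct department {
    unsigned int dept_id;  /* key */
    char dept_name[25];
};

/* Overwrites the record at byte offset `offset`. Returns 0 on success, 1 on failure. */
int overwriteRecord(FILE *fp, long offset, unsigned int key, const void *updated, size_t rec_size)
{
    unsigned int new_key;
    assert(fp != NULL && updated != NULL && rec_size >= sizeof new_key);
    memcpy(&new_key, updated, sizeof new_key);  /* the key is always the first field */
    if (new_key != key)
        return 1;                               /* key value cannot be changed */
    if (fseek(fp, offset, SEEK_SET) != 0)
        return 1;
    if (fwrite(updated, rec_size, 1, fp) != 1)
        return 1;
    return fflush(fp) == 0 ? 0 : 1;
}
```

Passing sizeof(struct department) as rec_size keeps the write within one fixed-size slot, so the neighbouring records are not disturbed.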

RETRIEVE
Fetches a set of data objects based on the search criteria given.
Care should be taken if a data object retrieved is marked for deletion in the index table.
If so, the data object should not be accessed from the file.
Prototype

int getByKey(char* dataObjectName, int key, void *result)


Algorithm
/* To do */
Step: 1 Identify the parent / child table name from dataObjectName and load the
corresponding parent/child index file.
Step: 2 For the given key, fetch the data offset from the index file.
Step: 3 Seek to the offset value in the data file and read the record.
Step: 4 Store the fetched record in *result and return the same.
Step: 5 Return 'Null' if the query is invalid or if 'no records found'.
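
A minimal sketch of Steps 2 to 5 follows, including the check for data objects marked as deleted. The function fetchByKey, the linear search over an in-memory array (a stand-in for the BST or hash table lookup), and the three return codes are all assumptions made for illustration. The codes show one possible convention for telling "key not found" apart from an I/O failure, which the project text leaves open.

```c
#include <assert.h>
#include <stdio.h>

struct department {
    unsigned int dept_id;  /* key */
    char dept_name[25];
};

struct index_entry {
    unsigned int key;
    unsigned int offset;
    unsigned char flag;    /* non-zero marks a deleted data object */
};

/* Hypothetical result codes, not defined by the project. */
enum { PDS_OK = 0, PDS_NOT_FOUND = 1, PDS_IO_ERROR = 2 };

/* Looks up key in the index, then reads the record into result. */
int fetchByKey(FILE *data, const struct index_entry *index, size_t n,
               unsigned int key, void *result, size_t rec_size)
{
    size_t i;
    assert(data != NULL && index != NULL && result != NULL);
    for (i = 0; i < n; i++) {
        if (index[i].key != key)
            continue;
        if (index[i].flag != 0)
            return PDS_NOT_FOUND;  /* marked for deletion: do not touch the file */
        if (fseek(data, (long)index[i].offset, SEEK_SET) != 0)
            return PDS_IO_ERROR;
        return fread(result, rec_size, 1, data) == 1 ? PDS_OK : PDS_IO_ERROR;
    }
    return PDS_NOT_FOUND;
}
```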

int getRelatedObject( char *baseObjectName, char *relatedObjectName, int key, void *result) //key should be of child table
Algorithm
/* To do */
Step: 1 Define a structure for the record to be returned, which combines fields from
both the parent and child tables.
Step: 2 Check that the key belongs to the child table, that '*baseObjectName'
corresponds to the parent table, and that '*relatedObjectName' corresponds to the
child table.
Step: 3 For the given key, fetch the data offset from the index file for the child table.
Step: 4 Seek to the offset value in the data file and read the record from the child
table data file.
Step: 5 Take the value of the foreign key (related_key) of this record as the key for the
parent index file and fetch the offset.
Step: 6 Seek to the offset value in the parent table data file and read the record.
Step: 7 Append this to the record fetched in Step 4 and store it in the pointer *result.
Step: 8 Return 'Null' if the query is invalid or if 'no records found'.

int getRelatedObjectMany(char *baseObjectName, char *relatedObjectName, int key, void **result) //key should be of parent table
Algorithm
/* To do */
Step: 1 Define a structure for the record to be returned, which combines fields from both
the parent and child tables and holds an array of child structure variables, since this
function queries a one-many relation.
Step: 2 Check that the key belongs to the parent table, that '*baseObjectName'
corresponds to the parent table, and that '*relatedObjectName' corresponds to the
child table.
Step: 3 For the given key, fetch the data offset from the index file for the parent table.
Step: 4 Seek to the offset value in the data file and read the record from the parent
table data file.
Step: 5 Take the value of the primary key of this record as the foreign key value to match
in the child table.
Step: 6 Search the child table data file and read all the records whose foreign key
matches.
Step: 7 Append these records to the record fetched in Step 4 and store the result in the
pointer **result.
Step: 8 Return 'Null' if the query is invalid or if 'no records found'.
Constraints
Retrieve all keys
Conditional retrieve is available only on primary key or foreign key.
If a data object is found, then the function should return a pointer to a linked list structure
(struct *)
If an invalid query is passed, return NULL //if no match found return NULL.
Common naming conventions should be followed for the field names and table
names based on the domain specified
/* How to differentiate between a system-related error (e.g. file cannot be
opened) and a normal operation failure (e.g. key not found)? */
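
The one-many retrieval and the linked list result can be sketched as a sequential scan of the child data file. The names employee_node, findEmployeesByDept and freeEmployeeList are hypothetical. For brevity the sketch ignores deletion flags, which a complete version would check in the child index before accepting a record.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

struct employee {
    unsigned int emp_no;   /* key */
    unsigned int dept_id;  /* related_key (foreign key) */
    char name[25];
    int age;
};

/* Hypothetical linked list node for the result set. */
struct employee_node {
    struct employee rec;
    struct employee_node *next;
};

void freeEmployeeList(struct employee_node *head)
{
    while (head != NULL) {
        struct employee_node *next = head->next;
        free(head);
        head = next;
    }
}

/* Returns all child records whose foreign key equals dept_id, or NULL if none. */
struct employee_node *findEmployeesByDept(FILE *child, unsigned int dept_id)
{
    struct employee rec;
    struct employee_node *head = NULL, **tail = &head;
    assert(child != NULL);
    rewind(child);
    while (fread(&rec, sizeof rec, 1, child) == 1) {
        struct employee_node *node;
        if (rec.dept_id != dept_id)
            continue;
        node = malloc(sizeof *node);
        if (node == NULL) {            /* out of memory: discard the partial list */
            freeEmployeeList(head);
            return NULL;
        }
        node->rec = rec;
        node->next = NULL;
        *tail = node;                  /* append so that file order is preserved */
        tail = &node->next;
    }
    return head;
}
```

The caller owns the returned list and releases it with freeEmployeeList.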

DELETE
Marks a data object for deletion in the index file. The data object need not be removed from
the data file immediately.
Prototype

int deleteByKey(char* data_object_name, int key)

Return 0 for success, 1 for failure
Step: 1 Find the offset of the key from the BST or Hash Table data structure built on the
corresponding index file of data_object_name.
Step: 2 If data_object_name = child file
    Then
        If key found
        Then
            Set flag = 1 in the corresponding index file, update the BST or Hash
            Table, and enqueue that particular key in the free list array.
        Else
            go to Step: 3
    Else
        Call deleteCascadeByKey (char *baseObjectName, char
        *relatedObjectName, int baseKey)
Step: 3 If the operation is not successful return 1, else return 0.

int deleteCascadeByKey(char *baseObjectName, char *relatedObjectName, int baseKey)
Return 0 for success, 1 for failure
// This function will be called by the deleteByKey function if one tries to delete a data object
// from the parent file
Constraints
On delete using foreign key, identify if primary key exists or not.
On delete using primary key, Cascade delete.
Algorithm
/* To do */
Step: 1 Find the offset of baseKey from the BST or Hash Table data structure built on the
corresponding index file of baseObjectName.
Step: 2 If baseKey found
    Then
        Set flag = 1 in the corresponding index file of baseObjectName, update the
        BST or Hash Table, and enqueue that particular baseKey in the free list array.
        Go to Step: 3.
    Else
        go to Step: 5
Step: 3 Search the child data file and collect the primary keys whose foreign key is
baseKey.
Step: 4 Set flag = 1 for this set of keys in the index file of the child data file of the
baseObjectName file. Also update the BST or Hash Table and enqueue
those keys in the free list array of that index file.
Step: 5 If the operation is not successful return 1, else return 0.
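
The cascade logic can be sketched on in-memory data as follows. The names child_ref, markDeleted and cascadeDelete are hypothetical. The array of child_ref pairs stands in for what Step 3 would collect by scanning the child data file, and the free list and on-disk index updates are left out to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>

struct index_entry {
    unsigned int key;
    unsigned int offset;
    unsigned char flag;   /* non-zero marks a deleted data object */
};

/* Hypothetical (child key, foreign key) pair read from the child data file. */
struct child_ref {
    unsigned int key;
    unsigned int related_key;
};

/* Sets flag = 1 for a live entry with this key. Returns 1 if no such entry exists. */
int markDeleted(struct index_entry *index, size_t n, unsigned int key)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (index[i].key == key && index[i].flag == 0) {
            index[i].flag = 1;
            return 0;
        }
    }
    return 1;
}

/* Marks the parent entry, then every child whose foreign key equals base_key. */
int cascadeDelete(struct index_entry *parent_index, size_t parent_count,
                  struct index_entry *child_index, size_t child_count,
                  const struct child_ref *refs, size_t ref_count,
                  unsigned int base_key)
{
    size_t i;
    assert(parent_index != NULL && child_index != NULL && refs != NULL);
    if (markDeleted(parent_index, parent_count, base_key) != 0)
        return 1;   /* Step 2: baseKey not found */
    for (i = 0; i < ref_count; i++) {
        if (refs[i].related_key == base_key)
            markDeleted(child_index, child_count, refs[i].key);  /* Steps 3 and 4 */
    }
    return 0;
}
```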

HashTable* allKeysHashTable(char * indexFileName)


Given a file retrieve all keys of the index file. A hash table of all keys should be formed.
Algorithm
/* To do */
Step: 1 Read the index file by appending .ndx extension to the passed indexFileName
argument
Step: 2 Fetch the primary keys from the file. Primary keys are the first field in the index
file records
Step: 3 Create a hash table with key as primary keys and values as the offset specified
in the second field

Step: 4 Only keys of records which are valid must be fetched while creating the hash
map
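
Steps 3 and 4 can be sketched with a small hash table that uses separate chaining. The type and function names (hash_table, hashCreate, hashPut, hashGet, hashFree, buildKeyTable) and the bucket count are assumptions. The sketch starts from index entries already read into an array instead of reading the .ndx file itself.

```c
#include <assert.h>
#include <stdlib.h>

#define BUCKET_COUNT 64   /* arbitrary size chosen for this sketch */

struct index_entry { unsigned int key; unsigned int offset; unsigned char flag; };
struct hash_node { unsigned int key, offset; struct hash_node *next; };
struct hash_table { struct hash_node *buckets[BUCKET_COUNT]; };

struct hash_table *hashCreate(void)
{
    struct hash_table *t = malloc(sizeof *t);
    size_t i;
    if (t == NULL)
        return NULL;
    for (i = 0; i < BUCKET_COUNT; i++)
        t->buckets[i] = NULL;
    return t;
}

void hashFree(struct hash_table *t)
{
    size_t i;
    if (t == NULL)
        return;
    for (i = 0; i < BUCKET_COUNT; i++) {
        while (t->buckets[i] != NULL) {
            struct hash_node *next = t->buckets[i]->next;
            free(t->buckets[i]);
            t->buckets[i] = next;
        }
    }
    free(t);
}

/* Stores key -> offset at the head of its bucket chain. Returns 0 on success. */
int hashPut(struct hash_table *t, unsigned int key, unsigned int offset)
{
    struct hash_node *n = malloc(sizeof *n);
    if (n == NULL)
        return 1;
    n->key = key;
    n->offset = offset;
    n->next = t->buckets[key % BUCKET_COUNT];
    t->buckets[key % BUCKET_COUNT] = n;
    return 0;
}

/* Looks up key. Returns 0 and sets *offset if found, 1 otherwise. */
int hashGet(const struct hash_table *t, unsigned int key, unsigned int *offset)
{
    const struct hash_node *n;
    for (n = t->buckets[key % BUCKET_COUNT]; n != NULL; n = n->next) {
        if (n->key == key) {
            *offset = n->offset;
            return 0;
        }
    }
    return 1;
}

/* Builds a table from valid (flag == 0) entries only. Returns NULL on failure. */
struct hash_table *buildKeyTable(const struct index_entry *entries, size_t count)
{
    struct hash_table *t = hashCreate();
    size_t i;
    assert(entries != NULL || count == 0);
    for (i = 0; t != NULL && i < count; i++) {
        if (entries[i].flag == 0 && hashPut(t, entries[i].key, entries[i].offset) != 0) {
            hashFree(t);
            t = NULL;
        }
    }
    return t;
}
```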

BST* allKeysHashTable(char * indexFileName)


Given a file retrieve all keys of the index file. A binary search tree of all keys should be
formed.
Algorithm
/* To do */
Step: 1 Read the index file by appending .ndx extension to the passed indexFileName
argument
Step: 2 Fetch the primary keys from the file. Primary keys are the first field in the index
file records
Step: 3 Create a binary search tree with the fetched keys.
Step: 4 Only keys of records which are valid must be fetched while creating the tree

int unloadDataStore( char *dataObjectName )


Return 0 for success, 1 for failure
Compacts the data object file based on deletion information found in the index data and
saves the compacted data file back to the data store. Updated index data is saved back
to the corresponding index file.
Algorithm
/* To do */
Step: 1 Read the index file of the data object to find whether a record is valid or not

Step: 2 Create a new file copying valid records into it.


Step: 3 Delete the old .dat file
Step: 4 Rename the newly created file as dataObjectName.dat
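
Steps 1 and 2 can be sketched as a routine that copies only the valid records and rewrites the index offsets to match the new file. The function compactData and its parameters are hypothetical, and the index is assumed to be held in a plain array. Steps 3 and 4 map to the standard C library functions remove and rename, called on the file names after both files are closed.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

struct department {
    unsigned int dept_id;  /* key */
    char dept_name[25];
};

struct index_entry {
    unsigned int key;
    unsigned int offset;
    unsigned char flag;    /* non-zero marks a deleted data object */
};

/* Copies valid records from old_data to new_data and compacts the index in place.
   On success returns 0 and stores the number of surviving entries in *kept. */
int compactData(FILE *old_data, FILE *new_data, struct index_entry *index,
                size_t n, size_t rec_size, size_t *kept)
{
    char *buf;
    size_t i, out = 0;
    assert(old_data != NULL && new_data != NULL && index != NULL && kept != NULL);
    buf = malloc(rec_size);
    if (buf == NULL)
        return 1;
    for (i = 0; i < n; i++) {
        if (index[i].flag != 0)
            continue;                               /* skip deleted records */
        if (fseek(old_data, (long)index[i].offset, SEEK_SET) != 0 ||
            fread(buf, rec_size, 1, old_data) != 1 ||
            fwrite(buf, rec_size, 1, new_data) != 1) {
            free(buf);
            return 1;
        }
        index[out] = index[i];                      /* out <= i, so nothing unread is overwritten */
        index[out].offset = (unsigned int)(out * rec_size);
        out++;
    }
    free(buf);
    *kept = out;
    return 0;
}
```

After compaction no free positions remain, so the free list written back to the index file would be empty.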
