
1. What are the advantages of DBMS Approach in managing data?

Solution:

The advantages of the DBMS approach are best understood as solutions to the following problems of managing data in a file-based system:

a) Data Redundancy and Inconsistency: Since data resides in different private data files, there are chances of redundancy and resulting inconsistency. For example, in a banking application, the same customer can have a savings account as well as a mortgage loan. The customer details may then be duplicated, since the programs for the two functions store their data in two different data files. This gives rise to redundancy in the customer's data. Since the same data is stored in two files, inconsistency arises if a change made to the data in one file is not reflected in the other.

b) Unanticipated Queries: In a file-based system, handling sudden or ad-hoc queries can be difficult, since it requires changes to the existing programs.

c) Data Isolation: Though data used by different programs in the application may be related, it resides in isolated data files.

d) Concurrent Access Anomalies: In large multi-user systems, the same file or record may need to be accessed by multiple users simultaneously. Handling this in a file-based system is difficult.

e) Security Problems: In data-intensive applications, security of data is a major concern. Users should be given access only to the data they require and not to the whole database. In a file-based system, this can be handled only by additional programming in each application.

f) Integrity Problems: In any application, there are certain data integrity rules which need to be maintained. These could take the form of conditions or constraints on the elements of the data records. In the savings bank application, one such integrity rule could be that Customer ID, which is the unique identifier for a customer record, must be non-empty. There can be several such integrity rules. In a file-based system, all these rules need to be explicitly programmed in the application program.
This is not to say that handling issues such as concurrent access, security, and integrity is impossible in a file-based system. The real issue is that, although these concerns are common to every data-intensive application, each application has to handle all of them on its own. The application programmer has to worry not only about implementing the application's business rules but also about handling these common issues.

2. Explain the following concepts with respect to storage structures in databases: Clustering, Indexing

Solution:

When a requested record lies in a page that is already in memory, no retrieval from the disk is necessary, so the whole operation takes less time. Thus, if records which are frequently used together are placed physically together, more of the requested records will be in the same page. Fewer pages then have to be retrieved, which reduces the number of disk accesses and in turn gives better performance. This method of storing logically related records physically together is called clustering.

E.g.: Consider a CUSTOMER table with the fields Cust_ID (for example 10001, 10002, 10003, 10004), Cust_Name (for example Raj), and Cust_City (for example Delhi).

If queries retrieving customers with consecutive Cust_IDs occur frequently in the application, clustering based on Cust_ID will help improve the performance of these queries. This can be explained as follows. Assume that the Customer record size is 128 bytes and the typical size of a page retrieved by the File Manager is 1 KB (1024 bytes). If there is no clustering, it can be assumed that the Customer records are stored at random physical locations. In the worst-case scenario, each record may be placed in a different page. Hence a query to retrieve 100 records with consecutive Cust_IDs (say, 10001 to 10100) will require 100 pages to be accessed, which in turn translates to 100 disk accesses. But if the records are clustered, a page can contain 1024/128 = 8 records. Hence the number of pages to be accessed for retrieving the 100 consecutive records will be ceil(100/8) = 13, i.e., only 13 disk accesses will be required to obtain the query results. Thus, in the given example, clustering improves the speed by a factor of about 7.7 (100/13).
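The arithmetic in this example can be checked with a few lines of Python. This is only a sketch of the worst-case comparison described above; the constant and variable names are chosen here for illustration.

```python
import math

RECORD_SIZE = 128   # bytes per Customer record (from the example)
PAGE_SIZE = 1024    # bytes per page (from the example)
NUM_RECORDS = 100   # consecutive Cust_IDs requested, 10001 to 10100

# With clustering, consecutive records are packed into the same pages.
records_per_page = PAGE_SIZE // RECORD_SIZE

# Without clustering, the worst case is one page access per record.
unclustered_accesses = NUM_RECORDS
clustered_accesses = math.ceil(NUM_RECORDS / records_per_page)

speedup = unclustered_accesses / clustered_accesses

print(records_per_page, unclustered_accesses, clustered_accesses, round(speedup, 1))
```

Changing RECORD_SIZE or PAGE_SIZE shows how the benefit depends on how many records fit in a page.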

Intra-file Clustering: Clustered records belong to the same file (table), as in the above example.

Inter-file Clustering: Clustered records belong to different files (tables). This type of clustering may be required to speed up queries that retrieve related records from more than one table. Here, records of the different files are interleaved.

Indexing is another common method for making retrievals faster. Consider the example of the CUSTOMER table used above. The following query is based on the customer's city: "Retrieve the records of all customers who reside in Delhi". Here a sequential search on the CUSTOMER table has to be carried out, and all records with the value Delhi in the Cust_City field have to be retrieved. The time taken for this operation depends on the number of pages to be accessed. If the records are stored randomly, the number of page accesses depends on the volume of data. If the records are stored physically together, the number of pages also depends on the size of each record. If such queries based on the Cust_City field are very frequent in the application, steps can be taken to improve their performance. Creating an index on Cust_City is one such method. It works as follows.

A new index file is created. The number of records in the index file is the same as that in the data file. Each record in the index file has two fields: one contains the value of the Cust_City field, and the other contains a pointer to the actual data record in the CUSTOMER table. Whenever a query based on the Cust_City field occurs, a search is carried out on the index file. This search will be much faster than a sequential search of the CUSTOMER table, provided the index records are stored physically together. This is because an index record is much smaller than a data record, so each page can hold more records. When the records with the value Delhi in the Cust_City field of the index file are located, the pointer in the second field of each record can be followed to retrieve the corresponding CUSTOMER record directly. Thus the access involves a sequential access on the index file and a direct access on the actual data file.
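The two-step access described above can be sketched in Python. This is an illustration only: list positions stand in for physical record addresses, and apart from Raj and Delhi the sample names and cities are made up (pairing Raj and Delhi with Cust_ID 10001 is also an assumption).

```python
# Data file: each list position stands in for a record's physical address.
customer_file = [
    (10001, "Raj", "Delhi"),
    (10002, "Asha", "Mumbai"),     # made-up row
    (10003, "Vikram", "Delhi"),    # made-up row
    (10004, "Meena", "Chennai"),   # made-up row
]

# Index file: one (Cust_City value, pointer) entry per data record.
city_index = [(city, pos) for pos, (_, _, city) in enumerate(customer_file)]


def find_by_city(city):
    """Look up customers by city through the index."""
    # Step 1: sequential access on the (small) index file.
    pointers = [pos for key, pos in city_index if key == city]
    # Step 2: direct access on the data file through each pointer.
    return [customer_file[pos] for pos in pointers]


delhi_customers = find_by_city("Delhi")
print(delhi_customers)
```

A real DBMS would usually keep the index sorted or in a tree structure so that even the index search is not fully sequential; the flat list here simply mirrors the description in the text.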

Retrieval Speed vs. Update Speed: Though indexes help make retrievals faster, they slow down updates on the table, since every update on the base table requires a corresponding update to each affected index. It is possible to create an index on multiple fields, i.e., an index on a field combination. Multiple indexes can also be created on the same table, though there may be a limit on the maximum number of indexes that can be created on a table.
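As a small illustration of single-field and multi-field indexes, the sketch below uses Python's built-in sqlite3 module. The table layout and the index names idx_city and idx_city_name are assumptions made for this example, and other DBMSs may use slightly different syntax or limits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    "cust_id INTEGER PRIMARY KEY, cust_name TEXT, cust_city TEXT)"
)

# An index on a single field.
conn.execute("CREATE INDEX idx_city ON customer (cust_city)")

# A second index on the same table, this time on a field combination.
conn.execute("CREATE INDEX idx_city_name ON customer (cust_city, cust_name)")

# List the indexes SQLite now maintains for the table.
index_names = sorted(row[1] for row in conn.execute("PRAGMA index_list(customer)"))

# List the fields covered by the multi-field index, in order.
composite_columns = [row[2] for row in conn.execute("PRAGMA index_info(idx_city_name)")]

print(index_names, composite_columns)
```

Every INSERT, UPDATE, or DELETE on customer now has to maintain both indexes as well, which is the update cost mentioned above.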

3. What are the properties of a Relation? Discuss in brief


Solution:

Properties of Relations

a) No Duplicate Tuples: A relation cannot contain two or more tuples which have the same values for all the attributes, i.e., in any relation, every row is unique.

b) Tuples are Unordered: The order of rows in a relation is immaterial.

c) Attributes are Unordered: The order of columns in a relation is immaterial.

d) Attribute Values are Atomic: Each tuple contains exactly one value for each attribute.

It may be noted that many of the properties of relations follow from the fact that the body of a relation is a mathematical set.
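The set-based view can be made concrete with a short Python sketch. Here a relation body is modeled as a set of tuples, and each tuple as a frozenset of (attribute, value) pairs; the helper name make_tuple and the sample rows are assumptions for illustration.

```python
def make_tuple(**attrs):
    """Model a tuple as an unordered set of (attribute, value) pairs."""
    return frozenset(attrs.items())


relation = {
    make_tuple(cust_id=10001, cust_name="Raj", cust_city="Delhi"),
    # Same tuple with the attributes written in a different order:
    make_tuple(cust_city="Delhi", cust_name="Raj", cust_id=10001),
    make_tuple(cust_id=10002, cust_name="Asha", cust_city="Mumbai"),
}

# The same two rows, listed in the opposite order.
same_rows_other_order = {
    make_tuple(cust_id=10002, cust_name="Asha", cust_city="Mumbai"),
    make_tuple(cust_id=10001, cust_name="Raj", cust_city="Delhi"),
}

print(len(relation), relation == same_rows_other_order)
```

The duplicate collapses into one element (no duplicate tuples), and the two sets compare equal regardless of row or attribute order.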

4. Describe the following: Advantages of Database Processing, Functions of a DBMS


Solution:

Advantages of Database Processing are:

a) Economy of Scale: Since several users share the database, any improvement in the database benefits several users. The term economy of scale refers to the fact that the collective cost of several combined operations may be less than the sum of the costs of the individual operations. This type of combination is possible using database processing.

b) Efficient Extraction of Relevant Information: The primary goal of a computer system is to turn data (recorded facts) into information (knowledge gained by processing these facts). A database makes this possible.

c) Sharing of Data: Authorized users can share the data. Several users can have access to the same piece of data (for example, a faculty member's address) and use it for several purposes. When a faculty member's address is changed, the change is available to all users.

d) Balancing Conflicting Requirements: For the database approach to function properly, there must be a person or group in charge of the database. This group is called Database Administration (DBA). By keeping the overall requirements of the organization in mind, the DBA can structure the database to the benefit of the entire organization rather than of a single user group.

e) Enforcement of Standards: With centralized control, the DBA can ensure that standards for data names etc. are followed uniformly throughout the organization.

f) Controlled Redundancy: Since data which was kept in several files is now integrated into a single database, we no longer have multiple copies of the same data. There may be occasions when duplication of data is necessary. A DBMS helps us to control redundancy rather than eliminate it.

g) Consistency: Consistency follows from the control or elimination of redundancy. If the address of a faculty member appears in only one place, it is not possible for that faculty member to have one address in one place and another address in another place.

h) Integrity: An integrity constraint is a rule which data in the database should follow. One integrity constraint may be that the department number for a faculty member must be that of a department which actually exists. A database has integrity if the data in it satisfies all the integrity constraints which have been established.

i) Security: Security is the prevention of access to the database by unauthorized users. Since the DBA has complete control over the data, it can define authorization procedures to ensure that only legitimate users have access to the data. The DBA can also allow different users to have different types of access to the same data. For example, the payroll department should be able to view and change the salary of a faculty member, the insurance department should be able to view the salary but not change it, and the person in charge of the academic activities of faculty may not even be able to see the salary. One method by which the DBA achieves this security is through user views. If a data item is not included in the user view for a user, then that user will not be able to access that data.

j) Flexibility and Responsiveness: Since the data which was previously kept in several files is now in the same database, responding to requests from different areas is possible in a much easier and more flexible manner. Suppose we want to find all faculty members who are in department 2, who are covered by insurance plan 3, and who have a salary below RM 35000. We can write:

    SELECT FacultyNumber, Name
    FROM FACULTY
    WHERE DeptNo = 2 AND PlanNo = 3 AND Salary < 35000

k) Data Independence: Data independence occurs when the structure of the database can change without requiring the programs that access the database to change. Data independence is achieved through the use of external views. Each program accesses data through an external view. The underlying structure of the database can change without requiring a change in the external view. Of course, the change to the database structure must not remove a field that a program requires.
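The query from point j) and the user views from points i) and k) can be tried out with Python's built-in sqlite3 module. This is a sketch under stated assumptions: the column types, the sample rows, and the view name ACADEMIC_VIEW are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE FACULTY ("
    "FacultyNumber INTEGER PRIMARY KEY, Name TEXT, "
    "DeptNo INTEGER, PlanNo INTEGER, Salary REAL)"
)
# Made-up sample rows.
conn.executemany(
    "INSERT INTO FACULTY VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Aminah", 2, 3, 32000),
        (2, "Bala", 2, 3, 41000),
        (3, "Chong", 1, 3, 30000),
        (4, "Devi", 2, 1, 28000),
    ],
)

# The ad-hoc request from point j), answered by a single query.
rows = conn.execute(
    "SELECT FacultyNumber, Name FROM FACULTY "
    "WHERE DeptNo = 2 AND PlanNo = 3 AND Salary < 35000"
).fetchall()

# A user view that leaves out Salary, for users who must not see it.
conn.execute(
    "CREATE VIEW ACADEMIC_VIEW AS SELECT FacultyNumber, Name, DeptNo FROM FACULTY"
)
cursor = conn.execute("SELECT * FROM ACADEMIC_VIEW")
view_columns = [col[0] for col in cursor.description]

print(rows, view_columns)
```

A program that reads only ACADEMIC_VIEW is unaffected if, say, a new column is later added to FACULTY, which is the sense of data independence described above.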

1. Explain the discretionary approach to database security.


Solution:

The typical method of enforcing discretionary access control in a database system is based on the granting and revoking of privileges. Let us consider privileges in the context of a relational DBMS. In particular, we will discuss a system of privileges somewhat similar to the one originally developed for the SQL language. Many current relational DBMSs use some variation of this technique. The main idea is to include statements in the query language that allow the DBA and selected users to grant and revoke privileges.

Types of Discretionary Privileges

The concept of an authorization identifier is used to refer to a user account (or group of user accounts). For simplicity, we will use the words user or account interchangeably in place of authorization identifier. The DBMS must provide selective access to each relation in the database based on specific accounts. Operations may also be controlled; thus, having an account does not necessarily entitle the account holder to all the functionality provided by the DBMS. Informally, there are two levels for assigning privileges to use the database system:

The account level: At this level, the DBA specifies the particular privileges that each account holds independently of the relations in the database.

The relation (or table) level: At this level, the DBA can control the privilege to access each individual relation or view in the database.

The privileges at the account level apply to the capabilities provided to the account itself and can include the CREATE SCHEMA or CREATE TABLE privilege, to create a schema or base relation; the CREATE VIEW privilege; the ALTER privilege, to apply schema changes such as adding or removing attributes from relations; the DROP privilege, to delete relations or views; the MODIFY privilege, to insert, delete, or update tuples; and the SELECT privilege, to retrieve information from the database by using a SELECT query. Notice that these account privileges apply to the account in general. If a certain account does not have the CREATE TABLE privilege, no relations can be created from that account.

The second level of privileges applies to the relation level, whether the relations are base relations or virtual (view) relations. The term relation may refer either to a base relation or to a view, unless we explicitly specify one or the other. Privileges at the relation level specify for each user the individual relations on which each type of command can be applied. Some privileges also refer to individual columns (attributes) of relations. SQL commands provide privileges at the relation and attribute level only. Although this is quite general, it makes it difficult to create accounts with limited privileges.

The granting and revoking of privileges generally follow an authorization model for discretionary privileges known as the access matrix model, where the rows of a matrix M represent subjects (users, accounts, programs) and the columns represent objects (relations, records, columns, views, operations). Each position M(i, j) in the matrix represents the types of privileges (read, write, update) that subject i holds on object j.

To control the granting and revoking of relation privileges, each relation R in a database is assigned an owner account, which is typically the account that was used when the relation was created in the first place. The owner of a relation is given all privileges on that relation. In SQL, the DBA can assign an owner to a whole schema by creating the schema and associating the appropriate authorization identifier with that schema, using the CREATE SCHEMA command. The owner account holder can pass privileges on any of the owned relations to other users by granting privileges to their accounts. In SQL the following types of privileges can be granted on each individual relation R:

SELECT (retrieval or read) privilege on R: Gives the account retrieval privilege. In SQL this gives the account the privilege to use the SELECT statement to retrieve tuples from R.

MODIFY privileges on R: This gives the account the capability to modify tuples of R. In SQL this privilege is further divided into UPDATE, DELETE, and INSERT privileges to apply the corresponding SQL command to R. In addition, both the INSERT and UPDATE privileges can specify that only certain attributes of R can be updated by the account.

REFERENCES privilege on R: This gives the account the capability to reference relation R when specifying integrity constraints. This privilege can also be restricted to specific attributes of R.

Notice that to create a view, the account must have SELECT privilege on all relations involved in the view definition.

Specifying Privileges Using Views

The mechanism of views is an important discretionary authorization mechanism in its own right. For example, if the owner A of a relation R wants another account B to be able to retrieve only some fields of R, then A can create a view V of R that includes only those attributes and then grant SELECT on V to B. The same applies to limiting B to retrieving only certain tuples of R; a view V can be created by defining the view by means of a query that selects only those tuples from R that A wants to allow B to access.

Revoking Privileges

In some cases it is desirable to grant a privilege to a user temporarily. For example, the owner of a relation may want to grant the SELECT privilege to a user for a specific task and then revoke that privilege once the task is completed. Hence, a mechanism for revoking privileges is needed. In SQL a REVOKE command is included for the purpose of canceling privileges.

Propagation of Privileges Using the GRANT OPTION

Whenever the owner A of a relation R grants a privilege on R to another account B, the privilege can be given to B with or without the GRANT OPTION. If the GRANT OPTION is given, this means that B can also grant that privilege on R to other accounts. Suppose that B is given the GRANT OPTION by A and that B then grants the privilege on R to a third account C, also with GRANT OPTION. In this way, privileges on R can propagate to other accounts without the knowledge of the owner of R. If the owner account A now revokes the privilege granted to B, all the privileges that B propagated based on that privilege should automatically be revoked by the system.
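The propagation and cascading revocation described above can be sketched as a toy model in Python. This is not how any particular DBMS implements GRANT and REVOKE; the class PrivilegeTable and its methods are hypothetical names, and the model tracks a single privilege on a single relation R.

```python
class PrivilegeTable:
    """Toy model of one privilege (e.g. SELECT) on one relation R."""

    def __init__(self, owner):
        self.owner = owner
        # grants[grantee] = set of (grantor, with_grant_option) pairs
        self.grants = {}

    def holds(self, account):
        """True if the account currently has the privilege."""
        return account == self.owner or bool(self.grants.get(account))

    def can_grant(self, account):
        """True for the owner and for accounts holding the GRANT OPTION."""
        return account == self.owner or any(
            option for _, option in self.grants.get(account, ())
        )

    def grant(self, grantor, grantee, with_grant_option=False):
        if not self.can_grant(grantor):
            raise PermissionError(f"{grantor} may not grant this privilege")
        self.grants.setdefault(grantee, set()).add((grantor, with_grant_option))

    def revoke(self, grantor, grantee):
        # Remove every grant that this grantor gave to the grantee.
        remaining = {g for g in self.grants.get(grantee, ()) if g[0] != grantor}
        self.grants[grantee] = remaining
        # Cascade: if the grantee can no longer grant, undo what it passed on.
        if not self.can_grant(grantee):
            for other in list(self.grants):
                if any(g[0] == grantee for g in self.grants[other]):
                    self.revoke(grantee, other)


table = PrivilegeTable(owner="A")
table.grant("A", "B", with_grant_option=True)   # A -> B with GRANT OPTION
table.grant("B", "C", with_grant_option=True)   # B -> C with GRANT OPTION
before = (table.holds("B"), table.holds("C"))
table.revoke("A", "B")                          # cascades to C
after = (table.holds("B"), table.holds("C"))
print(before, after)
```

If C had also received the privilege from another account that still holds it, C would keep it after the revoke, since only the grants derived from B are removed.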

2. List and briefly describe various threats to the Database Security.


Solution:

Threats to databases result in the loss or degradation of some or all of the following security goals: integrity, availability, and confidentiality.

Loss of integrity: Database integrity refers to the requirement that information be protected from improper modification. Modification of data includes creation, insertion, modification, changing the status of data, and deletion. Integrity is lost if unauthorized changes are made to the data by either intentional or accidental acts. If the loss of system or data integrity is not corrected, continued use of the contaminated system or corrupted data could result in inaccuracy, fraud, or erroneous decisions.

Loss of availability: Database availability refers to making objects available to a human user or a program that has a legitimate right to them.

Loss of confidentiality: Database confidentiality refers to the protection of data from unauthorized disclosure. The impact of unauthorized disclosure of confidential information can range from violation of the Data Privacy Act to the jeopardization of national security. Unauthorized, unanticipated, or unintentional disclosure could result in loss of public confidence, embarrassment, or legal action against the organization.

To protect databases against these types of threats, four kinds of countermeasures can be implemented: access control, inference control, flow control, and encryption.

In a multiuser database system, the DBMS must provide techniques to enable certain users or user groups to access selected portions of a database without gaining access to the rest of the database. This is particularly important when a large integrated database is to be used by many different users within the same organization. For example, sensitive information such as employee salaries or performance reviews should be kept confidential from most of the database system's users. A DBMS typically includes a database security and authorization subsystem that is responsible for ensuring the security of portions of a database against unauthorized access. It is now customary to refer to two types of database security mechanisms:

Discretionary security mechanisms: These are used to grant privileges to users, including the capability to access specific data files, records, or fields in a specified mode (such as read, insert, delete, or update).

Mandatory security mechanisms: These are used to enforce multilevel security by classifying the data and users into various security classes (or levels) and then implementing the appropriate security policy of the organization.
For example, a typical security policy is to permit users at a certain classification level to see only the data items classified at the user's own (or lower) classification level. An extension of this is role-based security, which enforces policies and privileges based on the concept of roles.

A second security problem common to all computer systems is that of preventing unauthorized persons from accessing the system itself, either to obtain information or to make malicious changes in a portion of the database. The security mechanism of a DBMS must include provisions for restricting access to the database system as a whole. This function is called access control and is handled by creating user accounts and passwords to control the login process by the DBMS.

A third security problem associated with databases is that of controlling the access to a statistical database, which is used to provide statistical information or summaries of values based on various criteria. For example, a database for population statistics may provide statistics based on age groups, income levels, size of household, education levels, and other criteria. Statistical database users such as government statisticians or market research firms are allowed to access the database to retrieve statistical information about a population but not to access the detailed confidential information on specific individuals. Security for statistical databases must ensure that information on individuals cannot be accessed. It is sometimes possible to deduce or infer certain facts concerning individuals from queries that involve only summary statistics on groups; consequently, this must not be permitted either. This problem is called statistical database security. The corresponding countermeasures are called inference control measures.

Another security issue is that of flow control, which prevents information from flowing in such a way that it reaches unauthorized users. Channels that are pathways for information to flow implicitly in ways that violate the security policy of an organization are called covert channels.

A final security issue is data encryption, which is used to protect sensitive data (such as credit card numbers) that is being transmitted via some type of communications network. Encryption can be used to provide additional protection for sensitive portions of a database as well. The data is encoded using some coding algorithm. An unauthorized user who accesses encoded data will have difficulty deciphering it, but authorized users are given decoding or decrypting algorithms (or keys) to decipher the data. Encrypting techniques that are very difficult to decode without a key have been developed for military applications. Popular techniques such as public key encryption, which is heavily used to support Web-based transactions against databases, and digital signatures, which are used in personal communications, have been developed.

Database Security and the DBA

The database administrator (DBA) is the central authority for managing a database system. The DBA's responsibilities include granting privileges to users who need to use the system and classifying users and data in accordance with the policy of the organization.
The DBA has a DBA account in the DBMS, sometimes called a system or superuser account, which provides powerful capabilities that are not made available to regular database accounts and users. DBA-privileged commands include commands for granting and revoking privileges to individual accounts, users, or user groups and for performing the following types of actions:

1. Account creation: This action creates a new account and password for a user or a group of users to enable access to the DBMS.
2. Privilege granting: This action permits the DBA to grant certain privileges to certain accounts.
3. Privilege revocation: This action permits the DBA to revoke (cancel) certain privileges that were previously given to certain accounts.
4. Security level assignment: This action consists of assigning user accounts to the appropriate security classification level.

The DBA is responsible for the overall security of the database system. Action 1 in the preceding list is used to control access to the DBMS as a whole, whereas actions 2 and 3 are used to control discretionary database authorization, and action 4 is used to control mandatory authorization.
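The statistical database problem discussed in this answer can be illustrated with a small Python sketch. It shows how two permitted summary queries can reveal one individual's value, which is what inference control measures are meant to prevent. The salary figures, names, and the function sum_salary are all made up for illustration.

```python
# Hypothetical confidential data: individual salaries must not be disclosed.
salaries = {"Ali": 52000, "Bea": 61000, "Chen": 58000, "Dina": 75000}


def sum_salary(names):
    """A statistical query: returns only a total over a group of people."""
    if len(names) < 2:
        # A naive safeguard: refuse queries about a single individual.
        raise ValueError("query set too small")
    return sum(salaries[name] for name in names)


# Both queries are allowed, since each covers a group of several people.
everyone = sum_salary(["Ali", "Bea", "Chen", "Dina"])
all_but_dina = sum_salary(["Ali", "Bea", "Chen"])

# The difference of the two summaries exposes one individual's salary.
inferred_dina = everyone - all_but_dina
print(inferred_dina)
```

A minimum query-set size alone is therefore not sufficient; the system also has to consider how the results of different queries can be combined.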

3. Explain the concept of Data Mining. How is it related to a data warehouse?

Solution:

The goal of a data warehouse is to support decision making with data. Data mining can be used in conjunction with a data warehouse to help with certain types of decisions. Data mining can be applied to operational databases with individual transactions. To make data mining more efficient, the data warehouse should have an aggregated or summarized collection of data. Data mining helps in extracting meaningful new patterns that cannot necessarily be found by merely querying or processing data or metadata in the data warehouse. Data mining applications should therefore be strongly considered early, during the design of a data warehouse. Also, data mining tools should be designed to facilitate their use in conjunction with data warehouses. In fact, for very large databases running into terabytes of data, successful use of data mining applications will depend first on the construction of a data warehouse.

Data Mining as a Part of the Knowledge Discovery Process

Knowledge Discovery in Databases, frequently abbreviated as KDD, typically encompasses more than data mining. The knowledge discovery process comprises six phases: data selection, data cleansing, enrichment, data transformation or encoding, data mining, and the reporting and display of the discovered information.

As an example, consider a transaction database maintained by a specialty consumer goods retailer. Suppose the client data includes a customer name, zip code, phone number, date of purchase, item code, price, quantity, and total amount. A variety of new knowledge can be discovered by KDD processing on this client database. During data selection, data about specific items or categories of items, or from stores in a specific region or area of the country, may be selected. The data cleansing process then may correct invalid zip codes or eliminate records with incorrect phone prefixes. Enrichment typically enhances the data with additional sources of information. For example, given the client names and phone numbers, the store may purchase other data about age, income, and credit rating and append them to each record. Data transformation and encoding may be done to reduce the amount of data. For instance, item codes may be grouped in terms of product categories into audio, video, supplies, electronic gadgets, camera, accessories, and so on. Zip codes may be aggregated into geographic regions, incomes may be divided into ranges, and so on. If data mining is based on an existing warehouse for this retail store chain, we would expect that the cleaning has already been applied. It is only after such preprocessing that data mining techniques are used to mine different rules and patterns.

The result of mining may be to discover the following types of new information:

a. Association rules: for example, whenever a customer buys video equipment, he or she also buys another electronic gadget.

b. Sequential patterns: for example, suppose a customer buys a camera, and within three months he or she buys photographic supplies; then within six months he or she is likely to buy an accessory item. This defines a sequential pattern of transactions. A customer who buys more than twice in the lean periods may be likely to buy at least once during the Christmas period.

c. Classification trees: for example, customers may be classified by frequency of visits, by types of financing used, by amount of purchase, or by affinity for types of items, and some revealing statistics may be generated for such classes.

We can see that many possibilities exist for discovering new knowledge about buying patterns, relating factors such as age, income group, place of residence, to what and how much the customers purchase. This information can then be utilized to plan additional store locations based on demographics, to run store promotions, to combine items in advertisements, or to plan seasonal marketing strategies. As this retail store example shows, data mining must be preceded by significant data preparation before it can yield useful information that can directly influence business decisions. The results of data mining may be reported in a variety of formats, such as listings, graphic outputs, summary tables, or visualizations.
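To make the idea of an association rule more concrete, the following Python sketch measures how well a rule such as "customers who buy video equipment also buy an electronic gadget" holds in a set of transactions. Support and confidence are the standard measures for such rules; the baskets below are made up, and the function names are chosen here for illustration.

```python
# Hypothetical purchase baskets, using product categories like those in the text.
transactions = [
    {"video", "gadget"},
    {"video", "gadget", "supplies"},
    {"video", "audio"},
    {"camera", "supplies"},
    {"video", "gadget", "accessories"},
]


def count(itemset):
    """Number of transactions that contain every item in itemset."""
    return sum(itemset <= basket for basket in transactions)


def support(itemset):
    """Fraction of all transactions that contain the itemset."""
    return count(itemset) / len(transactions)


def confidence(antecedent, consequent):
    """Of the transactions with the antecedent, the fraction that also have the consequent."""
    return count(antecedent | consequent) / count(antecedent)


rule_support = support({"video", "gadget"})           # 3 of 5 baskets
rule_confidence = confidence({"video"}, {"gadget"})   # 3 of the 4 video baskets
print(rule_support, rule_confidence)
```

A mining tool would search over many candidate item combinations and report only the rules whose support and confidence exceed chosen thresholds.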

4. What do you mean by the Outer Join? Explain the process of Outer Join.

Solution:

The outer join is used when a join query is united with the rows not included in the join, and is especially useful if constant text flags are included. First, look at the query:

    SELECT OWNERID, 'is in both Orders & Antiques'
    FROM ORDERS, ANTIQUES
    WHERE OWNERID = BUYERID
    UNION
    SELECT BUYERID, 'is in Antiques only'
    FROM ANTIQUES
    WHERE BUYERID NOT IN (SELECT OWNERID FROM ORDERS);

The first query does a join to list any owners who are in both tables, and puts a tag line after the ID, repeating the quoted text. The UNION merges this list with the next list. The second list is generated by first listing the IDs that appear in the Orders table (the subquery). Then each row in the Antiques table is scanned, and if its BuyerID is not in that list, it is listed with its quoted tag. There might be an easier way to make this list, but it is difficult to generate the informational quoted strings of text otherwise.

This concept is useful in situations where a primary key is related to a foreign key, but some primary key values have no matching foreign key value. For example, in one table the primary key is a salesperson, and in another table are customers, with their salesperson listed in the same row. If a salesperson has no customers, that person's name won't appear in the customer table. The outer join is used if a listing of all salespersons is to be printed together with their customers, whether or not the salesperson has a customer; that is, no customer is printed (a logical NULL value) if the salesperson has no customers but is in the salespersons table. Otherwise, the salesperson will be listed with each customer.
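The salesperson example can be run with Python's built-in sqlite3 module. This is a minimal sketch: the table names, column names, and sample rows are assumptions made for illustration, and it uses the LEFT OUTER JOIN syntax rather than the UNION formulation shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE salesperson (sp_id INTEGER PRIMARY KEY, sp_name TEXT);
    CREATE TABLE customer (
        cust_id INTEGER PRIMARY KEY,
        cust_name TEXT,
        sp_id INTEGER REFERENCES salesperson(sp_id)
    );
    INSERT INTO salesperson VALUES (1, 'Lee'), (2, 'Maria'), (3, 'Omar');
    INSERT INTO customer VALUES (10, 'Acme', 1), (11, 'Bolt', 1), (12, 'Cato', 2);
    """
)

# Every salesperson is listed; one without customers gets NULL (None in Python).
rows = conn.execute(
    """
    SELECT s.sp_name, c.cust_name
    FROM salesperson AS s
    LEFT OUTER JOIN customer AS c ON c.sp_id = s.sp_id
    ORDER BY s.sp_name, c.cust_name
    """
).fetchall()

for row in rows:
    print(row)
```

Omar has no customers, so an ordinary (inner) join would drop him from the result, while the outer join keeps him with an empty customer field.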
