
DHANALAKSHMI COLLEGE OF ENGINEERING,

CHENNAI

Department of Information Technology

IT6701 - INFORMATION MANAGEMENT


Anna University 2 & 16 Mark Questions & Answers

Year / Semester: IV / VII


Regulation: 2013
Academic year: 2017 - 2018
UNIT I
PART-A
1. What is a data model? List the types of data model used.

A database model is the theoretical foundation of a database and fundamentally determines in which manner data can be stored, organized, and manipulated in a database system.
It thereby defines the infrastructure offered by a particular database system. The most popular example of a database model is the relational model.
Types of data model used

 Hierarchical model
 Network model
 Relational model
 Entity-relationship
 Object-relational model
 Object model
2. Define database management system. List some applications of DBMS.

A Database Management System (DBMS) is a collection of interrelated data and a set of programs to access those data. Applications of DBMS include:

 Banking
 Airlines
 Universities
 Credit card transactions
 Tele communication
 Finance
 Sales
 Manufacturing
 Human resources
3. Give the levels of data abstraction?

 Physical level
 Logical level
 View level
4. Define data model.
A data model is a collection of conceptual tools for describing data, data relationships, data semantics and consistency constraints.
5. What is an entity relationship model?

The entity relationship model is a collection of basic objects called entities and relationships among those objects. An entity is a thing or object in the real world that is distinguishable from other objects.
6. What are attributes and relationship? Give examples.

 An entity is represented by a set of attributes.


 Attributes are descriptive properties possessed by each member of an entity set.
 Example: possible attributes of a customer entity are customer name, customer id, customer street, customer city.
 A relationship is an association among several entities.
 Example: A depositor relationship associates a customer with each account that he/she
has.
7. Define single valued and multivalued attributes.

 Single valued attributes: Attributes with a single value for a particular entity are called single valued attributes.
 Multivalued attributes: Attributes with a set of values for a particular entity are called multivalued attributes.

8. What is meant by normalization of data?

It is a process of analyzing the given relation schemas based on their Functional Dependencies (FDs) and primary key to achieve the properties of

 Minimizing redundancy
 Minimizing insertion, deletion and updating anomalies
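As a small runnable sketch of the idea (the employee/department tables and the dependency dept_id → dept_name are invented for this example), the decomposition below removes the redundancy that normalization targets:

```python
import sqlite3

# Unnormalized: dept_name depends only on dept_id, so it is repeated
# for every employee in the same department (redundancy + anomalies).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emp_flat (emp_id INTEGER PRIMARY KEY,
                       emp_name TEXT, dept_id INTEGER, dept_name TEXT);
INSERT INTO emp_flat VALUES (1, 'Asha', 10, 'IT'),
                            (2, 'Ravi', 10, 'IT'),
                            (3, 'Mala', 20, 'HR');

-- Normalized: the FD dept_id -> dept_name is moved to its own table,
-- so each department name is stored exactly once.
CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
CREATE TABLE emp  (emp_id INTEGER PRIMARY KEY, emp_name TEXT,
                   dept_id INTEGER REFERENCES dept(dept_id));
INSERT INTO dept SELECT DISTINCT dept_id, dept_name FROM emp_flat;
INSERT INTO emp  SELECT emp_id, emp_name, dept_id FROM emp_flat;
""")

# 'IT' appears twice in the flat table but only once after normalization.
flat = conn.execute("SELECT COUNT(*) FROM emp_flat WHERE dept_name='IT'").fetchone()[0]
norm = conn.execute("SELECT COUNT(*) FROM dept WHERE dept_name='IT'").fetchone()[0]
print(flat, norm)  # 2 1
```

Updating a department name now touches a single row in dept, which is exactly the update anomaly normalization eliminates.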
9. Define - Entity set and Relationship set.

 Entity set: The set of all entities of the same type is termed as an entity set.
 Relationship set: The set of all relationships of the same type is termed as a relationship
set.
10. What are stored, derived, composite attributes?

 Stored attributes: The attributes stored in a data base are called stored attributes.
 Derived attributes: The attributes that are derived from the stored attributes are called derived attributes.
 For example: the Age attribute is derived from the DOB attribute.
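The Age-from-DOB derivation can be sketched in a few lines (the function name is illustrative, not from any library):

```python
from datetime import date

def age_from_dob(dob: date, today: date) -> int:
    """Derive the Age attribute from the stored DOB attribute."""
    # Subtract one if this year's birthday has not occurred yet.
    return today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))

print(age_from_dob(date(2000, 6, 15), date(2018, 6, 14)))  # 17
print(age_from_dob(date(2000, 6, 15), date(2018, 6, 15)))  # 18
```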
11. Define - null values.

In some cases a particular entity may not have an applicable value for an attribute, or we do not know the value of an attribute for a particular entity. In these cases a null value is used.
12. What is meant by the degree of relationship set?
The degree of relationship type is the number of participating entity types.

13. Define - Weak and Strong Entity Sets

 Weak entity set: Entity sets that do not have a key attribute of their own are called weak entity sets.
 Strong entity set: An entity set that has a primary key is termed a strong entity set.
14. What does the cardinality ratio specify?

 Mapping cardinalities or cardinality ratios express the number of entities to which another entity can be associated.
 Mapping cardinalities must be one of the following:
• One to one
• One to many
• Many to one
• Many to many
15. What are the two types of participation constraint?

 Total: The participation of an entity set E in a relationship set R is said to be total if every entity in E participates in at least one relationship in R.
 Partial: If only some entities in E participate in relationships in R, the participation of entity set E in relationship set R is said to be partial.
16. What is a candidate key and primary key?

 Minimal superkeys are called candidate keys.
 A primary key is chosen by the database designer as the principal means of identifying an entity in the entity set.
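A small brute-force sketch can make the "minimal superkey" definition concrete (the student relation and its attribute names are invented for this example):

```python
from itertools import combinations

def candidate_keys(rows, attrs):
    """Find candidate keys of a sample relation: attribute sets whose values
    are unique across rows (superkeys) and minimal (no proper subset of the
    set is itself a superkey)."""
    def is_superkey(subset):
        # A superkey's value combination must be distinct in every row.
        seen = {tuple(r[a] for a in subset) for r in rows}
        return len(seen) == len(rows)

    superkeys = [set(c) for n in range(1, len(attrs) + 1)
                 for c in combinations(attrs, n) if is_superkey(c)]
    # Keep only the minimal superkeys, i.e. the candidate keys.
    return [k for k in superkeys if not any(s < k for s in superkeys)]

rows = [{"roll_no": 1, "email": "a@x.in", "dept": "IT"},
        {"roll_no": 2, "email": "b@x.in", "dept": "IT"},
        {"roll_no": 3, "email": "c@x.in", "dept": "CSE"}]
keys = candidate_keys(rows, ["roll_no", "email", "dept"])
print(keys)  # [{'roll_no'}, {'email'}]
```

Here both roll_no and email are candidate keys; the designer would pick one of them (say roll_no) as the primary key.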
17. Define - Business Rules.

 Business rules are an excellent tool to document the various aspects of a business domain.
 For example: A student is evaluated for a course through a combination of theory and practical examinations.
18. What is JDBC? List of JDBC drivers.

Java Database Connectivity (JDBC) is an application programming interface (API) for the programming language Java, which defines how a client may access a database. It is part of the Java Standard Edition platform, from Oracle Corporation.

 Type 1 - JDBC-ODBC Bridge Driver
 Type 2 - Java Native Driver
 Type 3 - Java Network Protocol Driver
 Type 4 - Pure Java Driver
19. What are the steps involved to access the database using JDBC?

 Register the JDBC driver
 Create the database connection
 Execute queries
 Process the results
 Close the database connection
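These steps map closely onto Python's DB-API, which is used below only as a runnable stand-in for illustration; in Java itself they correspond to DriverManager.getConnection, Statement/PreparedStatement, ResultSet, and Connection.close:

```python
import sqlite3

# Steps 1-2. Register the driver / create the connection
# (JDBC: DriverManager.getConnection(url); sqlite3 registers itself).
conn = sqlite3.connect(":memory:")

# Step 3. Execute queries (the parameterized "?" form is the analogue
# of a JDBC PreparedStatement).
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO student (id, name) VALUES (?, ?)", (1, "Asha"))

# Step 4. Process the results (JDBC: iterate over a ResultSet).
rows = conn.execute("SELECT id, name FROM student").fetchall()
print(rows)  # [(1, 'Asha')]

# Step 5. Close the database connection.
conn.close()
```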
20. What are the three classes of statements used to execute queries in Java?

 Statement
 PreparedStatement
 CallableStatement
21. What is a stored procedure?

 In a database management system (DBMS), a stored procedure is a set of Structured Query Language (SQL) statements with an assigned name that is stored in the database in compiled form so that it can be shared by a number of programs.
 The use of stored procedures can be helpful in controlling access to data, preserving data integrity and improving productivity.

22. What do the four V’s of Big Data denote?

IBM has a nice, simple explanation for the four critical features of big data:

 Volume – Scale of data
 Velocity – Analysis of streaming data
 Variety – Different forms of data
 Veracity – Uncertainty of data
23. List out the companies that use Hadoop.

 Yahoo (one of the biggest users & more than 80% code contributor to Hadoop)
 Facebook
 Netflix
 Amazon
 Adobe
 eBay
 Twitter
24. Distinguish between Structured and Unstructured data.

 Data which can be stored in traditional database systems in the form of rows and columns, for example online purchase transactions, can be referred to as Structured Data.
 Data which can be stored only partially in traditional database systems, for example data in XML records, can be referred to as Semi-Structured Data.
 Unorganized and raw data that cannot be categorized as semi-structured or structured data is referred to as Unstructured Data.
 Facebook updates, Tweets on Twitter, reviews, web logs, etc. are all examples of unstructured data.
25. On what concept does the Hadoop framework work?
The Hadoop framework works on the following two core components:

 HDFS – Hadoop Distributed File System: It is the Java-based file system for scalable and reliable storage of large datasets. Data in HDFS is stored in the form of blocks and it operates on the Master-Slave architecture.
 Hadoop MapReduce: This is a Java-based programming paradigm of the Hadoop framework that provides scalability across various Hadoop clusters.
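The block-based storage idea can be sketched as follows (the tiny 8-byte block size is only for illustration; real HDFS defaults to 128 MB blocks and replicates each block across DataNodes):

```python
def split_into_blocks(data: bytes, block_size: int):
    """Split a file's bytes into fixed-size blocks, HDFS-style."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

blocks = split_into_blocks(b"hello hadoop distributed fs", 8)
print(blocks)
# [b'hello ha', b'doop dis', b'tributed', b' fs']
```

In real HDFS the NameNode records which DataNodes hold each block, while the blocks themselves live on the DataNodes.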
26. What are the main components of a Hadoop application?

Hadoop applications use a wide range of technologies that provide great advantage in solving complex business problems.
Core components of a Hadoop application are:

 Hadoop Common
 HDFS
 Hadoop MapReduce
 YARN
 Data Access Components: Pig and Hive
 Data Storage Component: HBase
 Data Integration Components: Apache Flume and Sqoop
 Data Management and Monitoring Components: Ambari, Oozie and ZooKeeper
 Data Serialization Components: Thrift and Avro
 Data Intelligence Components: Apache Mahout and Drill
27. Whenever a client submits a Hadoop job, who receives it?

 NameNode receives the Hadoop job; it then looks for the data requested by the client and provides the block information.
 JobTracker takes care of resource allocation of the Hadoop job to ensure timely completion.
28. What are the partitioning, shuffle and sort phases?

Shuffle Phase: Once the first map tasks are completed, the nodes continue to perform several other map tasks and also exchange the intermediate outputs with the reducers as required. This process of moving the intermediate outputs of map tasks to the reducers is referred to as shuffling.

Sort Phase: Hadoop MapReduce automatically sorts the set of intermediate keys on a single node before they are given as input to the reducer.

Partitioning Phase: The process that determines which intermediate keys and values will be received by each reducer instance is referred to as partitioning. The destination partition is the same for any key irrespective of the mapper instance that generated it.
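The three phases can be simulated in a single process (a toy word count; the partitioner below is a stable stand-in for Hadoop's default hash partitioner, and all names are illustrative):

```python
from collections import defaultdict

NUM_REDUCERS = 2

def mapper(text):
    # Map phase: emit intermediate (word, 1) pairs.
    return [(word, 1) for word in text.split()]

def partition(key):
    # Partitioning: the destination reducer depends only on the key,
    # never on which mapper produced it.
    return sum(key.encode()) % NUM_REDUCERS

# Shuffle: route each intermediate pair from the mappers to its reducer.
inputs = ["big data big ideas", "data lake data"]
shuffled = defaultdict(list)
for text in inputs:
    for key, value in mapper(text):
        shuffled[partition(key)].append((key, value))

# Sort + Reduce: each reducer sees its pairs sorted by key, then sums per key.
counts = {}
for reducer_id, pairs in shuffled.items():
    for key, value in sorted(pairs):
        counts[key] = counts.get(key, 0) + value

print(sorted(counts.items()))
# [('big', 2), ('data', 3), ('ideas', 1), ('lake', 1)]
```

Note how both occurrences of "data" reach the same reducer even though they came from different mapper inputs; that is exactly what the partitioning guarantee provides.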
29. Distinguish between HBase and Hive.

 HBase and Hive are two completely different Hadoop-based technologies.
 Hive is a data warehouse infrastructure on top of Hadoop, whereas HBase is a NoSQL key-value store that runs on top of Hadoop.
 Hive helps SQL-savvy people to run MapReduce jobs, whereas HBase supports four primary operations: put, get, scan and delete.
 HBase is ideal for real-time querying of big data, whereas Hive is an ideal choice for analytical querying of data collected over a period of time.
30. Distinguish between Hadoop 1.x and Hadoop 2.x

 In Hadoop 1.x, MapReduce is responsible for both processing and cluster management, whereas in Hadoop 2.x processing is taken care of by other processing models and YARN is responsible for cluster management.
 Hadoop 2.x scales better when compared to Hadoop 1.x, with close to 10000 nodes per cluster.
 Hadoop 1.x has a single point of failure problem: whenever the NameNode fails it has to be recovered manually. In Hadoop 2.x, however, the Standby NameNode overcomes this problem, and whenever the NameNode fails it is configured for automatic recovery.

PART B
1. Describe Database Design and Database Modelling.
2. Explain in detail about normalization with suitable examples.
3. Explain JDBC drivers and how to access a database using them.
4. Explain the Hadoop ecosystem.
5. Write short notes on the following:

 YARN
 NoSQL
 Hive
