IM Part A & B
MELVISHARAM VELLORE
Superimpose a logical structure upon the data on the basis of these relationships.
3. Define ER diagram.
An ER (Entity Relationship) diagram is a diagram that helps to design databases in an efficient way.
4. What are the various attributes in ER diagram?
Attributes are the properties of entities. Attributes are represented by means of ellipses. Every ellipse
represents one attribute and is directly connected to its entity (rectangle).
5. What is meant by normalization?
Normalization is a systematic way of ensuring that a database structure is suitable for general-purpose querying and free of certain undesirable characteristics (insertion, update, and deletion anomalies) that
could lead to loss of data integrity.
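A minimal sketch of how a normalized design avoids an update anomaly, using Python's built-in sqlite3 module. The table names, column names and sample rows are assumptions made for illustration only. Because the department name is stored once, a single UPDATE is seen by every employee row that refers to it.

```python
import sqlite3

def build_normalized_db():
    """Create an in-memory database with the department name stored only once."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE department (
            dept_id   INTEGER PRIMARY KEY,
            dept_name TEXT NOT NULL
        );
        CREATE TABLE employee (
            emp_id   INTEGER PRIMARY KEY,
            emp_name TEXT NOT NULL,
            dept_id  INTEGER NOT NULL REFERENCES department(dept_id)
        );
        INSERT INTO department VALUES (1, 'Sales');
        INSERT INTO employee VALUES (10, 'Asha', 1);
        INSERT INTO employee VALUES (11, 'Ravi', 1);
    """)
    return conn

def rename_department(conn, dept_id, new_name):
    """One row changes; no employee row has to be touched."""
    conn.execute(
        "UPDATE department SET dept_name = ? WHERE dept_id = ?",
        (new_name, dept_id),
    )

def employee_departments(conn):
    """Join the two tables to list each employee with the department name."""
    rows = conn.execute(
        "SELECT e.emp_name, d.dept_name "
        "FROM employee e JOIN department d ON e.dept_id = d.dept_id "
        "ORDER BY e.emp_id"
    )
    return rows.fetchall()
```

In an unnormalized table that repeats the department name in every employee row, the same rename would have to update every copy, and missing one would leave the data inconsistent.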
6. What are the various forms present in normalization?
Normalization consists of the normal forms 1NF, 2NF, 3NF, Boyce-Codd NF (BCNF), 4NF and 5NF.
7. Define data modeling.
Data modeling in software engineering is the process of creating a data model for a system by
applying formal data modeling techniques.
8. List out the business rules.
Explicit expression
Coherent representation
Evolutionary extension
Declarative nature
9. What is meant by JDBC?
Java Database Connectivity (JDBC) is an application programming interface (API) for the
programming language Java, which defines how a client may access a database. It is part of the Java
Standard Edition platform, from Oracle Corporation.
10. What is Flume?
Flume is a distributed, reliable, available service for efficiently moving large amounts of data as it is
produced.
11. Define big data.
Big Data is data whose scale, diversity, and complexity require new architecture, techniques,
algorithms, and analytics to manage it and extract value and hidden knowledge from it
12. What are the advantages of big data?
Scale (Volume):
Data Volume
New way
d. a data and analytics platform
e. does all the data processing and analytics in one layer, without moving data back and forth
f. scalable (scale out) commodity hardware
17. Define HBase.
HBase is an open-source, distributed, column-oriented database built on top of HDFS and modeled on
Google's Bigtable. It is a distributed data store that can scale horizontally to thousands of commodity
servers and petabytes of indexed storage.
18. Give the purpose of HBase.
HBase is designed to operate on top of the Hadoop Distributed File System (HDFS) for scalability,
fault tolerance, and high availability.
19. Define thrift client.
The Hive Thrift Client makes it easy to run Hive commands from a wide range of programming
languages. Thrift bindings for Hive are available for C++, Java, PHP, and Python.
20. What is MapReduce?
MapReduce is a processing technique and a programming model for distributed computing based on Java.
The MapReduce algorithm contains two important tasks, namely Map and Reduce.
Map takes a set of data and converts it into another set of data, where individual elements are broken
down into tuples (key/value pairs).
Reduce takes the output from a map as an input and combines those data tuples into a smaller set of
tuples.
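The Map and Reduce tasks described above can be sketched in a few lines of plain Python. This is a single-process illustration of the idea, not Hadoop's Java API; the function names and the word-count task are assumptions chosen for the example.

```python
from collections import defaultdict

def map_phase(line):
    """Map: break one input line into (key, value) tuples, here (word, 1)."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group all values by key, as the framework does between Map and Reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single smaller tuple."""
    return (key, sum(values))

def word_count(lines):
    """Run map, shuffle and reduce in sequence over a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

In a real cluster the map calls run in parallel on different blocks of the input, and the grouping step moves data between machines before the reducers run.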
Part B
1.
2.
3.
4.
5.
6.
7.
Unit II
DATA SECURITY AND PRIVACY
1. Define data security.
Data security means protecting data, such as a database, from destructive forces and from the
unwanted actions of unauthorized users.
Data security is the practice of keeping data protected from corruption and unauthorized
access. The focus behind data security is to ensure privacy while protecting personal or
corporate data.
2.
3. What is a flaw?
A flaw is a problem with a program. A security flaw is a problem that affects security in some way:
confidentiality, integrity, or availability. Flaws come in two types: faults and failures.
4. What is meant by fault?
When a human makes a mistake, called an error, in performing some software activity, the error may
lead to a fault, or an incorrect step, command, process, or data definition in a computer program.
A fault is a mistake behind the scenes
o An error in the code, data, specification, process, etc.
o A fault is a potential problem
5. What is meant by failure?
A failure occurs when something actually goes wrong, that is, a deviation from the desired behaviour (not
necessarily from the specified behaviour).
6. Mention the types of flaws in program security.
Domain error
application security threat that cannot be efficiently controlled by conventional antivirus software
alone.
8. Define virus.
A virus is a program or programming code that replicates by being copied or initiating its
copying to another program, computer boot sector or document.
A virus is a piece of code that inserts itself into a host [program], including operating
systems, to propagate. It cannot run independently. It requires that its host program be run to
activate it.
9. What is a worm?
A computer worm is a standalone malware computer program that replicates itself in order to
spread to other computers. Often, it uses a computer network to spread itself, relying on
security failures on the target computer to access it. Unlike a computer virus, it does not need
to attach itself to an existing program.
A worm is a program that can run independently, will consume the resources of its host
[machine] from within in order to maintain itself and can propagate a complete working
version of itself on to other machines.
10. Describe the term firewall.
A firewall is a network security system that monitors and controls the incoming and outgoing
network traffic based on predetermined security rules.
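A minimal sketch of traffic being checked against predetermined rules, in the style of a simple packet filter. The Rule fields, the first-match policy and the default of "deny" are assumptions for illustration; real firewalls match on many more attributes.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional

@dataclass(frozen=True)
class Rule:
    action: str           # "allow" or "deny"
    source: str           # source network in CIDR notation
    port: Optional[int]   # destination port, or None to match any port

def decide(rules, src_ip, dst_port, default="deny"):
    """Return the action of the first matching rule, else the default."""
    for rule in rules:
        in_network = ip_address(src_ip) in ip_network(rule.source)
        port_matches = rule.port is None or rule.port == dst_port
        if in_network and port_matches:
            return rule.action
    return default

# Hypothetical rule set: block one private range, allow HTTPS from anywhere.
EXAMPLE_RULES = [
    Rule("deny", "10.0.0.0/8", None),
    Rule("allow", "0.0.0.0/0", 443),
]
```

Rule order matters here: the deny rule is listed first, so traffic from 10.0.0.0/8 is rejected even on port 443.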
11. Who is an intruder?
An Intruder is a person who attempts to gain unauthorized access to a system, to damage that
system, or to disturb data on that system. This person attempts to violate Security by interfering with
system Availability, data Integrity or data Confidentiality.
12. What is an intrusion detection system?
An intrusion detection system (IDS) is a device or software application that monitors network
or system activities for malicious activities or policy violations and produces electronic reports to a
management station.
13. Give some data privacy principles.
Personal data shall be obtained only for one or more specified and lawful purposes, and shall
not be further processed in any manner incompatible with that purpose or those purposes.
Personal data shall be adequate, relevant and not excessive in relation to the purpose or
purposes for which they are processed.
14. List out few data privacy security laws.
Electronic Communications Privacy Act (ECPA);
Fair Credit Reporting Act (FCRA);
Fair and Accurate Credit Transaction Act (FACTA);
Children Online Privacy Protection Act (COPPA);
15. Give the limitations of IDS
Noise can severely limit an intrusion detection system's effectiveness. Bad packets generated from
software bugs, corrupt DNS data, and local packets that escaped can create a significantly high false-alarm rate.
16. What are the functions of IDS?
The functions performed by an IDS are listed below.
Monitors users and system activities.
Scrutinizes the system configuration assessing its mis-configurations and vulnerabilities.
Checks the integrity of the critical system and data files.
Discovers attack patterns in a system activity.
Corrects system configuration errors.
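One of the functions listed above, checking the integrity of critical files, can be sketched with cryptographic hashes. This is an illustrative outline only: file contents are passed in as bytes in a dictionary, and the function names are assumptions, not part of any particular IDS product.

```python
import hashlib

def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(data).hexdigest()

def build_baseline(files: dict) -> dict:
    """Record a trusted digest for every monitored file name."""
    return {name: digest(content) for name, content in files.items()}

def changed_files(baseline: dict, current: dict) -> list:
    """List monitored files that are missing or whose contents have changed."""
    return sorted(
        name
        for name, expected in baseline.items()
        if name not in current or digest(current[name]) != expected
    )
```

The baseline is taken while the system is known to be clean; a later scan that reports any name is a signal for the administrator to investigate.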
17. What are the various ways of virus attachments?
The various ways of virus attachment are as follows.
Appended virus
Virus that surrounds a program.
Integrated virus.
18. What is compliance?
Compliance is merely a snapshot of how your security program meets a specific set of security
requirements at a given moment in time.
19. List out various types of firewalls.
Packet filtering gateway
Stateful inspection firewall
Application proxies
Guards
20. What is a non-malicious program error?
Non-malicious program errors are mostly due to human mistakes that go unnoticed while coding and
do not cause severe damage to the system.
A few types of non-malicious program errors are,
Buffer overflows
Incomplete mediation
Time-of-check to time-of-use errors
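As an illustration of incomplete mediation from the list above, the sketch below contrasts a function that trusts a client-supplied value with one that checks every input on the server side. The price table, the quantity limit and the function names are assumptions made for the example.

```python
# Hypothetical server-side price list; the client never supplies prices.
PRICES = {"pen": 2, "book": 10}
MAX_QUANTITY = 100

def total_unchecked(item, quantity, client_price):
    """Incomplete mediation: the price sent by the client is used as-is."""
    return quantity * client_price

def total_mediated(item, quantity):
    """Complete mediation: every input is validated before it is used."""
    if item not in PRICES:
        raise ValueError("unknown item")
    if not isinstance(quantity, int) or not 1 <= quantity <= MAX_QUANTITY:
        raise ValueError("quantity out of range")
    return quantity * PRICES[item]
```

With the unchecked version a user who edits the request can order two books at a price of 0; the mediated version ignores any client price and rejects quantities such as -1.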
Part B
1.
2.
3.
4.
5.
6.
Unit III
INFORMATION GOVERNANCE
1. What is Master Data Management (MDM)?
Master data management (MDM) is a comprehensive method of enabling an enterprise to link all of
its critical data to one file, called a master file that provides a common point of reference. When
properly done, MDM streamlines data sharing among personnel and departments.
2. Define data consolidation.
Data consolidation is the process of capturing master data from multiple sources and integrating into
a single hub (operational data store) for replication to other destination systems.
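A minimal sketch of consolidation, assuming each source system delivers its master records as a list of dictionaries that share an "id" key. The field names and the "first non-empty value wins" rule are assumptions for illustration; real MDM hubs apply configurable survivorship rules.

```python
def consolidate(sources):
    """Merge records from several source systems into one hub keyed by id."""
    hub = {}
    for source in sources:
        for record in source:
            merged = hub.setdefault(record["id"], {})
            for field, value in record.items():
                # Skip empty values; keep the first non-empty value seen.
                if value not in (None, ""):
                    merged.setdefault(field, value)
    return hub

# Hypothetical extracts from two source systems.
crm = [{"id": 1, "name": "Asha", "email": ""}]
billing = [
    {"id": 1, "name": "A. Kumar", "email": "asha@example.com"},
    {"id": 2, "name": "Ravi", "email": None},
]
```

Calling consolidate([crm, billing]) yields one record per customer; the order of the sources decides which system's value is kept when both supply one.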
3. What is data propagation?
Data propagation is the process of copying master data from one system to another, typically
through point-to-point interfaces in legacy systems.
4. Why is MDM needed?
Regulatory compliance
Data governance (DG) refers to the overall management of the availability, usability, integrity, and
security of the data employed in an enterprise. A sound data governance program includes a
governing body or council, a defined set of procedures, and a plan to execute those procedures.
7. Differentiate location privacy and database privacy.
Database privacy:
The goal is to keep the privacy of the stored data. Eg medical data
Queries are explicit. Eg SQL queries for patient record
Application for the current snapshot of data
Privacy requirements are set for the whole set of data
Location privacy:
The goal is to keep the privacy of data that is not stored. Eg received location data
Queries need to be private. Eg location based queries
Should tolerate the high frequency of location updates.
Privacy requirements are personalized
8. List out the few goals of data governance.
Increasing consistency and confidence in decision making
Decreasing the risk of regulatory fines
Improving data security, also defining and verifying the requirements for data distribution
policies
Maximizing the income generation potential of data
Designating accountability for information quality
Enable better planning by supervisory staff
Minimizing or eliminating re-work
Optimize staff effectiveness
Establish process performance baselines to enable improvement efforts
Acknowledge and hold all gain
9. Explain in short different data quality management tools.
The different data quality management tools are as follows.
Data cleansing tool
Data parsing tools
Data profiling tools
Data matching tools
Data standardization
Data extract, transform and load (ETL) tools
10. Give the different stages of MDM implementation.
Identify sources of master data
Identify the producers and consumers of the master data
Collect and analyze metadata about your master data
Appoint data stewards
Implement a data governance program and data governance council.
Develop the master data model.
Unit IV
INFORMATION ARCHITECTURE
1. What are the various principles of information architect?
The individual who organizes the patterns inherent in data, making the complex clear.
A person who creates the structure or map of information which allows others to find their
personal paths to knowledge.
The emerging 21st century professional occupation addressing the needs of the age focused
upon clarity, human understanding and the science of the organization of information.
2.
Unless organizations are completely altruistic, they usually want to know the return on their investment for
information architecture design. Buying information architecture services is not like investing in a
mutual fund. You can't calculate hard and fast numbers to show the exact benefit of your investment
over time.
6. Define heterogeneity.
Heterogeneity refers to an object or collection of objects composed of unrelated or unlike parts.
7. Explain the dimensions of information ecology.
Information ecology has three dimensions, namely content, context and users, which together address the complex
dependencies that exist in the system.
The labeling system is used for representing thoughts and concepts on a website. It is associated with
chunks of information linked to a label on the web site.
14. What does a navigation system do?
The navigation system involves components or group of components on a website that enable access
to web pages within a site. The navigation on a website allows users to migrate from one page to
another.
15. What do the navigation system tools do?
The navigation system tools provide context and flexibility to the users that help them to understand
where they are and where they can go. The navigation system can be designed to support
associative learning by providing resources related to the content that is currently being displayed.
16. Draw the components of information architecture.
Labeling system
Searching system
Part B
1. Define information architecture and explain its different components in detail.
2. What is the significance of organization and navigation systems in information architecture?
3. Explain the responsibilities of an information architect, graphics designers, web designer, and
programmer with respect to information architecture.
4. Explain in detail the classification of an organization system.
5. Explain the different types of labels.
6. Explain navigation systems briefly.
7. Explain the different organization structures.
8. Explain different organization schemes used in an organization system.
9. Explain the different phases of information architecture development.
10. Explain the dimensions of information ecology.
Unit V
Personal information.
Business information.
Classified information
Public
Internal
Sensitive