Running Head: Database Development



Database Development





1. Three tasks for improving the quality of datasets using the Software Development Life Cycle (SDLC) methodology, with a description of the activity at each stage.

It is cheaper to correct database-related issues when they are discovered at an early stage than when they surface in the final phase of development. Data quality checks should therefore be performed in each phase of the SDLC in order to deliver an error-free product, as discussed below.

Development Phase

This stage covers the coding and engineering of the system to meet the stated system requirements. Bassil (2012) explained that, for the sake of quality, data assessment should be an iterative process so that the end product is as free of defects as possible. Software and hardware specifications are reviewed together with the system architecture. Both newly created and existing data should be monitored closely. In addition, error-detection processes should be established using dedicated tools such as CA Veracode Greenlight and CA Service Virtualization.

Testing Phase

This is where the code produced during the development phase is tested. To refine the quality of datasets, process control needs to be considered together with process improvement. Dynamic, static, and manual analysis should all be carried out in this phase. A comprehensive array of unit, functional, integration, and performance tests is run, taking the language and the system into consideration.

Maintenance Phase

Here, system, application, and administrative changes are made to the application being developed. Appropriate continuous monitoring metrics are needed to check the quality of data and to provide the means for taking speedy action when needed. In this way, error correction is easily achieved hence quality data is


2. Actions for optimizing record selection and improving overall database performance, based on quantitative data quality assessment.

Automated controls can be applied in the design stage of the SDLC, where input, processing, and output controls are used to protect the security, reliability, and integrity of both the system and its datasets (Chikkerur et al., 2011). For instance, input controls such as duplication checks and completeness checks prevent duplicate information and blank fields. Automated process controls, on the other hand, monitor whether the system processes and records information correctly. Error detection, process design, and process control are some of the quality management techniques that can be used to improve quality assessment.
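
The input controls described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the field names (`customer_id`, `email`), the sample records, and the function names are hypothetical and do not come from any particular system.

```python
# Hypothetical schema: the fields every record must carry (an assumption).
REQUIRED_FIELDS = ("customer_id", "email")


def completeness_check(record):
    """Return the required fields that are missing or blank in a record."""
    return [
        field
        for field in REQUIRED_FIELDS
        if record.get(field) is None or str(record[field]).strip() == ""
    ]


def load_with_input_controls(records):
    """Accept only complete, non-duplicate records and report the rest."""
    accepted, rejected = [], []
    seen_ids = set()
    for record in records:
        missing = completeness_check(record)
        if missing:
            # Completeness check: blank or absent required fields.
            rejected.append((record, "blank fields: " + ", ".join(missing)))
        elif record["customer_id"] in seen_ids:
            # Duplication check: the key has already been accepted.
            rejected.append((record, "duplicate customer_id"))
        else:
            seen_ids.add(record["customer_id"])
            accepted.append(record)
    return accepted, rejected


sample = [
    {"customer_id": 1, "email": "a@example.com"},
    {"customer_id": 1, "email": "a@example.com"},  # duplicate key
    {"customer_id": 2, "email": "   "},            # blank field
    {"customer_id": 3, "email": "c@example.com"},
]
accepted, rejected = load_with_input_controls(sample)
rejection_rate = len(rejected) / len(sample)  # a simple quantitative measure
```

The share of rejected records gives one simple quantitative indicator of input quality that can be tracked over time.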

3. Three maintenance plans, together with three activities for improving data quality.

Three maintenance plans. The corrective plan, the preventive plan, and the maintenance plan are vital for improving data quality. The corrective plan is carried out after a defect has been observed, whereas the preventive plan is a precaution put in place to avoid errors that may emerge. The maintenance plan, on the other hand, involves the daily servicing of the entire


Three activities to be performed to improve data quality.

Error detection and correction. Several checks can be performed here to improve data quality: missing values are identified, the available data is compared with a correct baseline, and the time stamp associated with the current data is examined. The complexity of the data, including its inputs, processing stages, and outputs, is considered when implementing corrective policies.
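
The three checks named above (missing values, comparison with a baseline, and time stamp examination) can be sketched in Python as follows. The row layout (`id`, `amount`, `updated_at`), the baseline values, and the one-day freshness threshold are assumptions chosen purely for illustration.

```python
from datetime import datetime, timedelta


def detect_errors(row, baseline, now, max_age=timedelta(days=1)):
    """Return the list of issues found in one row (hypothetical rules)."""
    issues = []

    # 1. Missing values: any field whose value is None.
    for field, value in row.items():
        if value is None:
            issues.append(f"missing value: {field}")

    # 2. Baseline comparison: the amount must match the trusted value, if any.
    expected = baseline.get(row.get("id"))
    if expected is not None and row.get("amount") != expected:
        issues.append("amount differs from baseline")

    # 3. Time stamp check: data older than max_age is flagged as stale.
    timestamp = row.get("updated_at")
    if timestamp is not None and now - timestamp > max_age:
        issues.append("stale timestamp")

    return issues


now = datetime(2024, 1, 10, 12, 0)
baseline = {1: 100.0, 2: 250.0}  # trusted reference values (assumed)
rows = [
    {"id": 1, "amount": 100.0, "updated_at": datetime(2024, 1, 10, 9, 0)},
    {"id": 2, "amount": 200.0, "updated_at": datetime(2024, 1, 5, 0, 0)},
    {"id": 3, "amount": None, "updated_at": datetime(2024, 1, 10, 11, 0)},
]
report = {row["id"]: detect_errors(row, baseline, now) for row in rows}
```

Rows with an empty issue list pass; the others can be routed to whatever corrective policy the organization has defined.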

Process control and improvement. Data quality requirements are defined through Total Data Quality Management (TDQM), a methodology for analyzing and improving data. The techniques that support TDQM are quality dimension visualization, systematic demonstration of data, and quality improvement optimization.

Process design. Here, new data processes are built and existing ones are redesigned in order to eliminate or reduce data errors. Data quality improves because the root causes of defects are removed.

4. (i) The most efficient method for planning proactive concurrency control together with lock granularities, and how it minimizes security risks on a database in a multiuser


When multiple users or systems execute tasks simultaneously, conflicts arise and result in inconsistencies. Granular locking schemes lock rows, pages, cells, or even whole tables. High-granularity and low-granularity locking are two approaches that keep distributed databases consistent. High (fine-grained) granularity attains maximum concurrency but requires additional overhead, whereas low (coarse-grained) granularity reduces concurrency while requiring minimal overhead. Proactive concurrency control is attained by accepting the extra overhead of applying lock granularity at different levels of the object-oriented hierarchy (Cowling and Liskov, 2012).
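
The trade-off between the two granularities can be made concrete with a small Python sketch. It is a toy lock manager written for this discussion, not the mechanism of any specific database product; the class name `LockManager`, the table name `accounts`, and the two granularity labels are assumptions.

```python
import threading
from collections import defaultdict


class LockManager:
    """Toy lock manager supporting table-level or row-level locks."""

    def __init__(self, granularity):
        if granularity not in ("table", "row"):
            raise ValueError("granularity must be 'table' or 'row'")
        self.granularity = granularity
        self._locks = defaultdict(threading.Lock)

    def _key(self, table, row_id):
        # Coarse locking ignores the row; fine locking keys on (table, row).
        return table if self.granularity == "table" else (table, row_id)

    def try_lock(self, table, row_id):
        """Try to lock without waiting; True if the lock was granted."""
        return self._locks[self._key(table, row_id)].acquire(blocking=False)

    def unlock(self, table, row_id):
        self._locks[self._key(table, row_id)].release()

    def locks_tracked(self):
        """Number of lock objects kept in memory (the overhead)."""
        return len(self._locks)


# Coarse (low granularity): a second transaction on another row is blocked.
coarse = LockManager("table")
coarse_first = coarse.try_lock("accounts", 1)    # True
coarse_second = coarse.try_lock("accounts", 2)   # False, table already locked

# Fine (high granularity): different rows can be locked at the same time.
fine = LockManager("row")
fine_first = fine.try_lock("accounts", 1)        # True
fine_second = fine.try_lock("accounts", 2)       # True
fine_conflict = fine.try_lock("accounts", 1)     # False, same row
```

The row-level manager admits more concurrent work but has to track one lock per touched row, which is the additional overhead mentioned above.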

(ii) How to avoid record-level locking of a database that is in use by current transactions, while using the verify method to plan the system more effectively.

Consistency and concurrency have to be controlled carefully in a multiuser database that executes transactions simultaneously so that consistent results are obtained. The serializability model, a transaction isolation model, is used to make it appear as though the transactions execute one at a time. With the help of a multi-version consistency model, each user is given a separate view of the real-time data, which prevents record-level locking from interfering with the database.
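
The idea of giving each user a separate view without locking records can be sketched in Python. This is a deliberately simplified illustration of multi-version reads: it omits write-conflict detection and cleanup of old versions, and the names `VersionedStore` and `Snapshot` are hypothetical.

```python
class VersionedStore:
    """Keeps every committed version of a value instead of overwriting it."""

    def __init__(self):
        self._versions = {}  # key -> list of (commit_ts, value), ascending
        self._clock = 0      # logical commit counter

    def write(self, key, value):
        """Commit a new version and return its commit timestamp."""
        self._clock += 1
        self._versions.setdefault(key, []).append((self._clock, value))
        return self._clock

    def snapshot(self):
        """Open a read view fixed at the current commit timestamp."""
        return Snapshot(self, self._clock)


class Snapshot:
    """A reader's private, consistent view of the store."""

    def __init__(self, store, ts):
        self._store = store
        self._ts = ts

    def read(self, key):
        # Return the newest version committed at or before the snapshot time.
        for commit_ts, value in reversed(self._store._versions.get(key, [])):
            if commit_ts <= self._ts:
                return value
        return None


store = VersionedStore()
store.write("balance", 100)
reader = store.snapshot()      # the reader's view is fixed here
store.write("balance", 80)     # a later writer is not blocked by the reader

old_view = reader.read("balance")            # 100
new_view = store.snapshot().read("balance")  # 80
```

The reader never takes a lock on the record, and the writer never waits for the reader; each simply sees the version that matches its own point in time.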

Discussion 1

Centralized Versus Decentralized Database Management System

Challenges that come with big data. Big data generally refers to a volume of structured or unstructured data so large that it is difficult to process with traditional databases and software techniques. Currently available processing capacity struggles to cope with its volume and its speed. Big data brings challenges that were not an issue for traditionally designed databases such as relational ones (Özsu and Valduriez, 2011). The first is that big data is stored across clusters of servers, each of which holds a slice of the data, and applications communicate with multiple nodes in these clusters. This makes big data hard to protect, since one needs to secure the whole data center and not a single


The other challenge with big data is that it lacks a standard cluster. Tuple stores, wide columnar stores, and graph databases are just a few of the more than one hundred and fifty variants available in big data, each with its own specialization. Components can be swapped between many of these variants: the resource manager, data model, data access layer, and orchestration tools, among others, are interchangeable. These platforms were built with performance and scalability, not security, as their foundations. As a result, their capabilities are limited compared with those of commercial distributions.

In terms of compatibility, a number of traditional tools do not fit or work well with big data technologies. The capabilities of traditional products are outpaced by the velocity, multi-node design, sheer scale, and variety of big data. Some forms of security, such as masking, row-level encryption, and packet analysis, are difficult to scale, while others, such as query monitoring and content filtering, generally do not work.

How NoSQL addresses these challenges. The term NoSQL distinguishes these platforms from relational databases and simply means “Not Only SQL”. The best-known way of approaching big data security in NoSQL is the “walled garden” security model. In this approach, the entire cluster is placed on a separate network, and logical access to it is controlled through firewalls and access controls. In other words, there is no security within the NoSQL platform itself, only in the protective shell of applications and network around the database (Hecht and Jablonski, 2011). It is a simple and cost-effective approach, but it suits only organizations that are not greatly concerned about security.

A second approach relies on third-party products or on security tools built into the NoSQL cluster. These tools include Kerberos for node authentication, SSL or TLS for securing communication, and transparent encryption for data-at-rest security, among others. Although this is the most comprehensive and effective approach as far as NoSQL is concerned, its main drawback is that it does not control rogue administrators.

Data-centric security is another NoSQL security model, which protects data before it is moved into a larger data repository. This is done with basic tools such as masking, tokenization, and data element encryption. The data-centric model is used when the system that processes the data cannot be fully trusted, which reflects the fact that many enterprises do not trust big data clusters to keep their information. The controls are defined on the data before any attempt is made to move it.
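
Masking and tokenization, as applied to a record before it leaves the trusted environment, can be sketched in Python as follows. The `TokenVault` class, the `tok_` prefix, and the sample record are hypothetical; a production system would keep the vault in a separately secured service, which this sketch does not attempt to show.

```python
import secrets


class TokenVault:
    """Maps sensitive values to random tokens; stays in the trusted zone."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value):
        """Return a stable random token that stands in for the value."""
        if value not in self._value_to_token:
            token = "tok_" + secrets.token_hex(8)
            self._value_to_token[value] = token
            self._token_to_value[token] = value
        return self._value_to_token[value]

    def detokenize(self, token):
        """Recover the original value; only possible with vault access."""
        return self._token_to_value[token]


def mask(value, visible=4):
    """Replace all but the last `visible` characters with asterisks."""
    keep = min(visible, len(value))
    return "*" * (len(value) - keep) + value[len(value) - keep:]


vault = TokenVault()
record = {"name": "Ada", "card": "4111111111111111", "national_id": "AB123456"}

# Apply the controls before the record is sent to the big data cluster.
protected = {
    "name": record["name"],
    "card": mask(record["card"]),                        # irreversible
    "national_id": vault.tokenize(record["national_id"]),  # reversible
}
```

Masking discards information permanently, whereas tokenization allows an authorized party with access to the vault to recover the original value.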

NoSQL data models. Denormalization is one of the NoSQL data models. It entails copying the same data into multiple tables or documents in order to simplify querying or to make a user's records fit a particular data model. Its advantage is that the data needed to process a query is grouped in one place, which simplifies query processing. In traditional databases, modeling-time normalization shifts work to query time and adds complexity for the query processor, whereas denormalization stores data in query-friendly structures and so keeps query processing simple.
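
The contrast between the two models can be shown with a short Python sketch using in-memory data. The customers, orders, and field names are invented for illustration and do not represent any particular NoSQL product.

```python
# Normalized model: customer details are stored once and referenced by key.
customers = {1: {"name": "Ada"}, 2: {"name": "Lin"}}
orders = [
    {"order_id": 10, "customer_id": 1, "total": 25.0},
    {"order_id": 11, "customer_id": 2, "total": 40.0},
    {"order_id": 12, "customer_id": 1, "total": 15.0},
]


def orders_for_normalized(name):
    """Query-time lookup: resolve each order's customer before filtering."""
    return [
        order["order_id"]
        for order in orders
        if customers[order["customer_id"]]["name"] == name
    ]


def denormalize():
    """Write-time copy: embed the customer name in every order document."""
    return [
        dict(order, customer_name=customers[order["customer_id"]]["name"])
        for order in orders
    ]


order_docs = denormalize()


def orders_for_denormalized(name):
    """The query reads a single structure and needs no lookup."""
    return [doc["order_id"] for doc in order_docs if doc["customer_name"] == name]


ada_normalized = orders_for_normalized("Ada")      # [10, 12]
ada_denormalized = orders_for_denormalized("Ada")  # [10, 12]
```

Both queries return the same answer, but the denormalized one touches only the order documents. The cost is redundancy: if a customer's name changes, every copy embedded in the order documents has to be updated.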

Benefits of NoSQL. The main purpose of denormalization is to tune a database to fit a particular application. Online Analytical Processing (OLAP) applications, such as financial reporting, business reporting, sales, and budgeting, benefit most from denormalization because they extract data that has been kept over a long period. Here, denormalization helps by avoiding joins, reducing the number of tables and foreign keys, and allowing a star alteration method.

Discussion 2

Business Intelligence Tools

Business intelligence tools are application software tasked with retrieving, transforming, reporting, and analyzing data from internal or external systems (Turban et al., 2010). Several such tools can be used to report business performance, including the ones discussed below.

Actuate Business Intelligence and Reporting Tools (BIRT). BIRT has the advantage of being open source. It is written purely in Java and can publish reports across multiple data sources, from XML and business relational databases to in-memory Java objects. It also includes a Java runtime component. Its features include a single view of all data, user-friendliness, best-practice analytical techniques, enterprise reporting, performance indicators, and fast performance, among others.

Systems, Applications, and Products in Data Processing (SAP) business intelligence. SAP is widely known as an enterprise-level application, typically for server systems and open clients. Organizations currently rank it highly for its portability and the quality of the services it provides. Its features include a simple data warehouse architecture, flexibility, applications that are compatible with any system, ease of use thanks to its modular concept, support for both cloud and on-premise deployment, and, best of all, easy integration with both SAP and non-SAP applications. It has add-ins that play a vital role in business performance reporting, such as Excel add-ins, and it works with other BI platforms such as arcplan, Cognos, and QlikView, among others.

Cost estimation. An individual or a company has important factors to weigh before purchasing any business intelligence product. For instance, a company that intends to buy one of the tools discussed above has to consider functionality, integration capabilities, and the benefits that the product will bring to the company. BIRT is estimated to be one of the most expensive business intelligence tools at present, costing around 20,000 dollars a year for a company to obtain its full services. SAP, on the other hand, is much cheaper than BIRT, at an estimated 3,213 dollars a year for a professional license.

Given its price, BIRT is functionally more complex, and the cost of training users is incorporated into the price of the software. Certain configurations have to be made to fit the buyer's business before the product is released to the buyer. Integration is also complex: BIRT runs only on the Java platform, which makes it expensive for the buyer. SAP, on the other hand, integrates easily into multiple environments and works in any browser, which lowers its cost. SAP's functionality is commendable because of its ease of use and portability, which is why the vendor did not include the cost of training and integration in the product. Most enterprises are therefore choosing SAP because of the benefits it offers compared with BIRT. An organization does not need special servers or technicians to maintain and integrate SAP, unlike BIRT.



References

Bassil, Y. (2012). A simulation model for the waterfall software development life cycle. arXiv preprint arXiv:1205.6904.

Chikkerur, S., Sundaram, V., Reisslein, M., & Karam, L. J. (2011). Objective video quality assessment methods: A classification, review, and performance comparison. IEEE Transactions on Broadcasting, 57(2), 165.

Cowling, J. A., & Liskov, B. (2012, June). Granola: Low-overhead distributed transaction coordination. In USENIX Annual Technical Conference (Vol. 12).

Hecht, R., & Jablonski, S. (2011, December). NoSQL evaluation: A use case oriented survey. In Cloud and Service Computing (CSC), 2011 International Conference on (pp. 336-341). IEEE.

Özsu, M. T., & Valduriez, P. (2011). Principles of distributed database systems. Springer Science & Business Media.

Turban, E., Sharda, R., & Delen, D. (2010). Decision support and business intelligence systems.
