IT1715

Data Organization

What is Data?
Data is distinct pieces of information, usually formatted in a special way. All software is divided into two general
categories: data and programs. Programs are collections of instructions for manipulating data.
Data can exist in a variety of forms: as numbers or text on pieces of paper, as bits and bytes stored in electronic
memory, or as facts stored in a person's mind. Since the mid-1900s, people have used the word data to mean
computer information that is transmitted or stored. Data is the plural of datum, a single piece of information.

Machine Readable Information


Computer data is information processed or stored by a computer. This information may be in the form of text
documents, images, audio clips, software programs, or other types of data. Computer data may be processed by
the computer's CPU and is stored in files and folders on the computer's hard disk.
At its most rudimentary level, computer data is a series of ones and zeros, known as binary data. Because all
computer data is in binary format, it can be created, processed, saved, and stored digitally. This allows data to be
transferred from one computer to another using a network connection or various media devices. Digital data also
does not deteriorate over time or lose quality after being used multiple times.
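
To make the idea of binary data concrete, the short Python sketch below shows the ones and zeros behind a piece of text. The sample word and the choice of UTF-8 encoding are assumptions made for illustration only.

```python
# Encode a short piece of text into bytes, then show each byte as eight bits.
text = "Data"                      # sample value chosen for illustration
raw = text.encode("utf-8")         # the bytes the computer actually stores
bits = " ".join(f"{byte:08b}" for byte in raw)
print(bits)                        # 01000100 01100001 01110100 01100001

# Decoding the same bits reproduces the text exactly: nothing is lost.
restored = bytes(int(chunk, 2) for chunk in bits.split()).decode("utf-8")
print(restored == text)            # True
```

Because the copy is rebuilt from exactly the same bits, it is identical to the original, which is why digital data does not lose quality when it is copied or reused.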

Data Phrases in Technology


As technology advances and changes, numerous phrases have been used over the years to describe data and how
we use and analyze it, including structured and unstructured data. Massive volumes of data are now called Big
Data, while older phrases, like data integrity or data mining, are still widely used today.

The following 10 data-related definitions will help you to better understand data and its role in information
technology.

Big Data: A massive volume of both structured and unstructured data that is so large it is difficult to process using
traditional database and software techniques.
Big Data Analytics: The process of collecting, organizing and analyzing large sets of data to discover patterns and
other useful information.
Data Center: Physical or virtual infrastructure used by enterprises to house computer, server and networking
systems and components for the company's information technology (IT) needs.
Data Integrity: Refers to the validity of data. Data integrity can be compromised in a number of ways, such as
human data entry errors or errors that occur during data transmission.
Data Miner: A software application that monitors and/or analyzes the activities of a computer, and subsequently
its user, for the purpose of collecting information.
Data Mining: A class of database applications that look for hidden patterns in a group of data that can be used to
predict future behavior.
Database: A database is basically a collection of information organized in such a way that a computer program can
quickly select desired pieces of data.
Raw Data: Information that has been collected but not formatted or analyzed.
Structured Data: Structured data refers to any data that resides in a fixed field within a record or file. This includes
data contained in relational databases and spreadsheets.
Unstructured Data: Information that doesn't reside in a traditional row-column database. As you might expect, it's
the opposite of structured data.
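
The Data Integrity entry above mentions errors that occur during data transmission. One common way to detect such errors is to compare a checksum computed before sending with one computed after receiving. The sketch below uses SHA-256 from Python's standard library; the messages are made-up sample values, and this is only one of several possible techniques.

```python
import hashlib

def checksum(data: bytes) -> str:
    """Return a fixed-length fingerprint of the data."""
    return hashlib.sha256(data).hexdigest()

sent = b"balance=1500"            # made-up message
expected = checksum(sent)         # computed by the sender

received_ok = b"balance=1500"     # arrived unchanged
received_bad = b"balance=1900"    # one character altered in transit

intact = checksum(received_ok) == expected
corrupted = checksum(received_bad) != expected
print(intact, corrupted)          # True True
```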

What is Data Organization?


Data organization, in broad terms, refers to the method of classifying and organizing data sets to make them more
useful. Some IT experts apply this primarily to physical records, although some types of data organization can also
be applied to digital records.

07 Handouts 1 *Property of STI



There are many ways that IT professionals work on the principle of data organization. Many of these are classified
under the more general heading of "data management." For example, re-ordering or analyzing the arrangement of
data items in a physical record is part of data organization.

Another main component of enterprise data organization is the analysis of relatively structured and unstructured
data. Structured data consists of data in tables that can be easily integrated into a database and, from there,
fed into analytics software or other applications. Unstructured data is data that is raw and unformatted,
the kind of data that you find in a simple text document, where names, dates and other pieces of information are
scattered throughout the paragraphs. Experts have developed tools and resources to handle relatively
unstructured data and integrate it into a holistic data environment.
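
The difference between the two kinds of data can be sketched in a few lines of Python. The names, dates, and the date pattern below are made-up assumptions; real tools for unstructured data are far more elaborate.

```python
import re

# Unstructured: the facts are scattered through free text (made-up sample).
note = "Ana Cruz joined on 2018-03-12. Later, on 2018-07-01, Ben Lim transferred in."

# Structured: the same facts held in fixed fields, one record per row.
records = [
    {"name": "Ana Cruz", "date": "2018-03-12"},
    {"name": "Ben Lim", "date": "2018-07-01"},
]

# With structured data, a field is read directly.
dates_structured = [row["date"] for row in records]

# With unstructured text, a tool has to search for a pattern first.
dates_unstructured = re.findall(r"\d{4}-\d{2}-\d{2}", note)

print(dates_structured == dates_unstructured)   # True
```

Both approaches recover the same dates here, but the search over free text works only because every date happens to follow one pattern.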

Businesses adopt data organization strategies in order to make better use of the data assets that they have in a
world where data sets represent some of the most valuable assets held by enterprises across many different
industries. Executives and other professionals may focus on data organization as a component of a comprehensive
strategy to streamline business processes, get better business intelligence and generally improve a business model.

How should I organize my files?


Whether you are working on a stand-alone computer or on a networked drive, establishing a system that lets you
access your files, avoid duplication, and ensure that your data can be backed up takes a little planning.
A good place to start is to develop a logical folder structure. The following tips should help you develop
such a system:

• Use folders - group files within folders so information on a particular topic is located in one place
• Adhere to existing procedures - check for established approaches in your team or department which you
can adopt
• Name folders appropriately - name folders after the areas of work to which they relate and not after
individual researchers or students. This avoids confusion in shared workspaces if a member of staff leaves,
and makes the file system easier to navigate for new people joining the workspace
• Be consistent - when developing a naming scheme for your folders it is important that once you
have decided on a method, you stick to it. If you can, try to agree on a naming scheme from the outset of
your research project
• Structure folders hierarchically - start with a limited number of folders for the broader topics, and then
create more specific folders within these
• Separate ongoing and completed work - as you start to create lots of folders and files, it is a good idea to
start thinking about separating your older documents from those you are currently working on
• Try to keep your ‘My Documents’ folder for files you are actively working on, and every month or so, move
the files you are no longer working on to a different folder or location, such as a folder on your desktop, a
special archive folder or an external hard drive
• Back up - ensure that your files, whether they are on your local drive or on a network drive, are backed up
• Review records - assess materials regularly or at the end of a project to ensure files are not kept needlessly.
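
The tips above can be sketched as a small Python script. The folder names are hypothetical and the demonstration runs in a temporary directory, so treat it as an illustration of a hierarchical layout with a separate archive rather than a recommended structure.

```python
import tempfile
from pathlib import Path

# Hypothetical layout: broad topics first, more specific folders inside them.
LAYOUT = [
    "thesis-project/data/raw",
    "thesis-project/data/processed",
    "thesis-project/reports/drafts",
    "thesis-project/archive",        # completed work is moved here
]

def build_layout(root: Path) -> list[Path]:
    """Create every folder in LAYOUT under root and return the created paths."""
    created = []
    for relative in LAYOUT:
        folder = root / relative
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created

def archive_file(path: Path, archive_dir: Path) -> Path:
    """Move a finished file into the archive folder and return its new path."""
    target = archive_dir / path.name
    path.rename(target)
    return target

# Demonstrate in a temporary directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    created = build_layout(root)
    draft = root / "thesis-project/reports/drafts/chapter1.txt"
    draft.write_text("finished chapter")
    moved = archive_file(draft, root / "thesis-project/archive")
    archived_ok = moved.exists() and not draft.exists()

print(archived_ok)   # True
```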

What do I need to consider when creating a file name?


Decide on a file naming convention at the start of your project.
Useful file names are:
• consistent
• meaningful to you and your colleagues
• easy to find
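
One way to keep file names consistent is to generate them from the same few parts every time. The convention below (project, description, date, version) and the function name `make_file_name` are assumptions for illustration, not a standard.

```python
import re
from datetime import date

def make_file_name(project: str, description: str, when: date, version: int, ext: str) -> str:
    """Build a name like 'project_description_YYYYMMDD_v02.ext' (hypothetical convention)."""
    def slug(text: str) -> str:
        # Lowercase, and replace anything that is not a letter or digit with a hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return f"{slug(project)}_{slug(description)}_{when:%Y%m%d}_v{version:02d}.{ext}"

name = make_file_name("IT1715", "Data Organization notes", date(2018, 6, 4), 2, "txt")
print(name)   # it1715_data-organization-notes_20180604_v02.txt
```

Putting the date in year-month-day order means that an alphabetical listing of the files is also a chronological one.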


Database Management Structure

• DBMS (Database Management System) acts as an interface between the user and the database. The user asks the
DBMS to perform operations such as insert, delete, update and retrieval on the database.
• The components of the DBMS perform these requested operations on the database and provide the necessary data
to the users.
• Applications: An application can be thought of as a user-friendly web page where the user enters requests. The
user simply enters the details needed and presses buttons to get the data.
• End User: End users are the real users of the database. They can be developers, designers, administrators or the
actual users of the database.
• DDL: Data Definition Language (DDL) statements create the database, schemas, tables, mappings, etc. These are
the commands used to create objects such as tables and indexes in the database for the first time. In other
words, they create the structure of the database.
• DDL Compiler: This part of the DBMS is responsible for processing the DDL commands. It breaks each command
down into machine-understandable code. It is also responsible for storing metadata such as the table name,
the space used by the table, the number of columns in it, mapping information, etc.
• DML Compiler: When users insert, delete, update or retrieve records, they send the request in a form they
understand, for example by pressing some buttons. For the database to understand the request, it must be
broken down into object code, and this compiler does that. One can compare this to how a question put to a
person is broken down into signals that reach the brain.
• Query Optimizer: Users who send a request are usually not concerned with how it will be executed on the
database, and may know nothing about the database or how it performs. Whatever the request, it should fetch,
insert, update or delete the data efficiently. The query optimizer decides the best way to execute the user
request that is received from the DML compiler. It is similar to selecting the best nerve to carry the signals
to the brain.
• Stored Data Manager: This is also known as the Database Control System. It is one of the main central systems
of the database. It is responsible for various tasks:
o It converts the requests received from the query optimizer to machine-understandable form and makes the
actual request inside the database. It is like fetching the exact part of the brain needed to answer.
o It helps to maintain consistency and integrity by applying constraints. For example, it does not allow a
record to be deleted or changed while child entries still depend on it. Similarly, it does not allow
duplicate values to be entered where a table requires uniqueness.


o It controls concurrent access. If multiple users access the database at the same time, it makes sure that all
of them see correct data. It guarantees that no data loss or data mismatch happens between the transactions
of multiple users.
o It helps to back up the database and recover data whenever required. In a huge database, reverting the
changes after a transaction fails unexpectedly is not easy, so it maintains a backup of all data so that it
can be recovered.
• Data Files: These hold the real data. They can be stored on magnetic tapes, magnetic disks or optical disks.
• Compiled DML: Some of the processed DML statements (insert, update, delete) are stored here so that they can
be re-used when similar requests arrive.
• Data Dictionary: It contains all the information about the database. As the name suggests, it is the dictionary
of all the data items. It contains descriptions of all the tables, views, materialized views, constraints, indexes,
triggers, etc.
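
Several of the components listed above can be observed from the outside with a small experiment. The sketch below uses SQLite through Python's standard `sqlite3` module purely as a convenient stand-in; the handout does not name a particular DBMS, and the table names and values are made up. It shows DDL and DML statements, constraint enforcement, the plan chosen by the query optimizer, and the catalog that plays the role of a data dictionary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")          # throwaway in-memory database
conn.execute("PRAGMA foreign_keys = ON")    # SQLite enforces foreign keys only when asked

# DDL: statements that create the structure of the database.
conn.execute("CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL)")
conn.execute("""
    CREATE TABLE employee (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        department_id INTEGER NOT NULL REFERENCES department(id)
    )
""")

# DML: statements that insert, update, delete, or retrieve records.
conn.execute("INSERT INTO department (id, name) VALUES (1, 'Registrar')")
conn.execute("INSERT INTO employee (name, department_id) VALUES ('Ana Cruz', 1)")

def rejected(sql: str) -> bool:
    """Return True when the DBMS refuses the statement because of a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# Integrity: a duplicate value and a delete that would orphan a child row are both refused.
duplicate_rejected = rejected("INSERT INTO department (id, name) VALUES (2, 'Registrar')")
parent_delete_rejected = rejected("DELETE FROM department WHERE id = 1")

# Query optimizer: SQLite reports the plan it chose for a request.
plan = conn.execute("EXPLAIN QUERY PLAN SELECT * FROM employee WHERE id = 1").fetchall()

# Data dictionary: SQLite keeps descriptions of its objects in sqlite_master.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]

print(duplicate_rejected, parent_delete_rejected, tables)
```

Other database systems expose the same ideas through different commands, so the exact statements here should not be read as universal.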

Types of Databases
Alternatively referred to as a databank or a datastore, and sometimes abbreviated as a DB, a database is a large
quantity of indexed digital information. It can be searched, referenced, compared, changed, or otherwise
manipulated with optimal speed and minimal processing expense.

A database is built and maintained by using a database programming language. The most common database
language is SQL, but there are multiple "flavors" of SQL, depending on the type of database being used. Each flavor
of SQL differs in its syntax and is designed to be used with a specific type of database.

Database components

A database is made up of several main components:

• Schema - A database contains one or more schemas, each of which is a collection of one or more tables of
data.
• Table - Each table contains one or more columns, which are similar to columns in a spreadsheet. A table can
have as few as one column or as many as one hundred or more, depending on the type of data being stored
in the table.
• Column - Each column contains one of several types of data or values, like dates, numeric or integer values,
and alphanumeric values (also known as varchar).
• Row - Data in a table is listed in rows, which are like rows of data in a spreadsheet. Often there are hundreds
or thousands of rows of data in a table.
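
As a rough illustration of tables, columns and rows, the sketch below builds one small table and then asks the database to describe it. SQLite via Python's `sqlite3` module is an assumed stand-in, and the `student` table and its values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One table with three columns, each declared to hold one type of data.
conn.execute("CREATE TABLE student (id INTEGER, full_name VARCHAR(60), enrolled DATE)")
# Each row is one record, like a row in a spreadsheet.
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Ana Cruz", "2018-06-04"), (2, "Ben Lim", "2018-06-05")],
)

# PRAGMA table_info lists each column's position, name, and declared type.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(student)")]
row_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]

print(columns)     # [('id', 'INTEGER'), ('full_name', 'VARCHAR(60)'), ('enrolled', 'DATE')]
print(row_count)   # 2
```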

Different Types of Databases

Document Oriented Database – This database is free from any strict schema. It does not store data in tables
but as documents (text records). This type of database is suitable for storing dynamic data. It is useful
for applications that are document-based. Documents are encoded using standard formats, such as JSON
or XML.
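
The schema-free idea can be sketched with plain Python and JSON. This is not the interface of any real document database: the collection, the field names, and the `find` helper are all hypothetical.

```python
import json

# Two documents in the same collection; the second has fields the first lacks.
collection = [
    {"_id": 1, "title": "Data Organization", "pages": 8},
    {"_id": 2, "title": "File Naming", "tags": ["files", "naming"], "author": {"name": "A. Cruz"}},
]

# Documents are encoded in a standard format; JSON is used here as one common choice.
encoded = [json.dumps(doc) for doc in collection]

def find(field: str) -> list[int]:
    """Return the ids of documents that contain the given field."""
    return [doc["_id"] for doc in map(json.loads, encoded) if field in doc]

print(find("tags"))    # [2]
print(find("title"))   # [1, 2]
```

No table definition had to change to store the second document, which is what makes this model suitable for dynamic data.
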
Embedded Database – An embedded database runs within an application, and
therefore it does not run as a separate application. Unlike general purpose
databases, this database is embedded as inline code or as a linked library. This
avoids the time otherwise spent on installation and maintenance. These types of
databases are generally found in set-top boxes, mobile phones, etc. RDM
Server and RDM Embedded are examples of these types of databases.


Graph Database – It is based on the relationships between resources, and no
particular resource is inherently more important than another. This type of database
stores data in a dynamic schema. In a graph database, each vertex works as a
mini index for its adjacent elements. InfoGrid is one graph database suited to cases
that call for model flexibility.
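
The remark that each vertex works as a mini index for its adjacent elements can be sketched with an adjacency structure in Python. The people and relationships are made up, and a real graph database adds storage, querying and much more on top of this idea.

```python
# Each vertex stores its own list of neighbors, so following a relationship
# is a direct lookup on the vertex rather than a search through a table.
graph = {
    "Ana":  ["Ben", "Cara"],
    "Ben":  ["Ana", "Dan"],
    "Cara": ["Ana"],
    "Dan":  ["Ben"],
}

def friends_of_friends(person: str) -> set[str]:
    """People two steps away from person, excluding the person and direct friends."""
    direct = set(graph[person])
    two_steps = {other for friend in direct for other in graph[friend]}
    return two_steps - direct - {person}

print(friends_of_friends("Ana"))   # {'Dan'}
```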

Hypertext Database – These databases are used for organizing a large amount of
dissimilar information. The information is not intended for numerical
analysis. Any object can be linked with any other object in a hypertext
database. This kind of database system was invented by Ted Nelson. Hypertext
databases are preferred for maintaining online encyclopedias. Unlike a traditional
database, a hypertext database has no regular structure, and therefore the user can
reach the desired information in different ways.

Operational Database – It contains data related to the operations going on in an
organization or enterprise, such as employee information and data describing
transactions. This type of database is updated regularly. It follows the same
approach as OLTP (online transaction processing). The focus of this database is to
record current data. It is often contrasted with the data warehouse.

Distributed Database – It consists of a set of databases that are located on
different computers but work logically as one database. The data can therefore be
accessed and modified simultaneously over a network. Each database is controlled
by a local DBMS. It is important to maintain consistency when dealing with this
type of arrangement.

Flat-File Database – These are data files in which records hold no structured
relationship. Additional information is often required for understanding or
interpreting these files. In simple terms, a database with only one table is
referred to as a flat-file database. It is useful for storing a small number of
records. A spreadsheet application like Excel works as a flat-file database.
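
A comma-separated file is a familiar example of a flat file. The sketch below reads one with Python's standard `csv` module; the records are made up and held in memory rather than in a real file.

```python
import csv
import io

# A flat file: one table of records with no links to any other table.
flat_file = io.StringIO(
    "id,name,course\n"
    "1,Ana Cruz,IT1715\n"
    "2,Ben Lim,IT1715\n"
)

records = list(csv.DictReader(flat_file))
names = [row["name"] for row in records]
print(names)   # ['Ana Cruz', 'Ben Lim']
```

The header row is the additional information needed to interpret the file: without it, nothing in the data says which value is the name and which is the course.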

Database Uses and Issues

You might not realize it, but databases are everywhere. Whether or not you know very much about them, or even
care to, their effect on our daily lives is extensive. From weather applications to the movies you watch online,
databases are responsible for many of the services we utilize daily. See examples of how databases enhance your
day-to-day life below.


1. Online Television Streaming


Any online streaming service, such as Hulu or Netflix, uses databases to generate a list of TV shows and
movies to watch, track an individual’s show preferences, and provide a list of recommended viewing.
Analyzing such an enormous amount of data relies on highly specialized database management technology,
such as Cassandra. In fact, Hulu has recently been relying heavily on Cassandra.

2. Social Gaming

Gaming done across social networks is extremely data intensive. Gathering individual player information
from around the globe and serving it to players on demand requires high-availability database software.
One example is the popular Game of Thrones Ascent, a free role-playing game launched by Disruptor Beam
and based on the hit HBO series, Game of Thrones. Their Percona Server-based database solution helps
eliminate data bottlenecks during high-usage periods.

3. Personal Cloud Storage


If you save photos or documents to your smartphone or tablet, it’s likely your data is stored in “the cloud,”
a large, central storage environment with a small portion dedicated just to you. Syncing this data across
your devices requires powerful databases able to call up your data at a moment’s notice, wherever you are.

4. Sports
Fan participation in national sports doesn’t just utilize the power of the database, it depends upon it.
From fantasy football leagues to March Madness brackets, they all depend on huge databases full of player
statistics, game performances, injury reports, and more, all calculating the odds of a win on a weekly basis.

5. Finances
From the stock market to your local bank, databases are abundant across the financial world. Tracking the
vast amount of information behind the world’s daily transactions, not to mention the financial models that
analyze that data to predict future activity, requires extremely powerful databases.

6. Government Organizations
Government organizations around the world are constantly collecting data for research, defense, legislation,
and humanitarian purposes, to name a few. This data is collected, stored and analyzed using powerful
and far-reaching database services.

7. Social Media
Every social media platform stores reams of user information in databases used to recommend friends,
businesses, products, and topics to the end user. This cross-referencing of data is immensely complex and
uses highly reliable and capable database software including, for example, MySQL which is used in Facebook
data centers.

8. E-commerce
Any online organization that sells products or services uses databases to organize its products, pricing
information and user purchase history, and then to recommend other potential products to customers. This data
is stored in highly secure databases, protected by the standards required for PCI compliance.


9. Healthcare
Doctors’ offices and healthcare organizations, among others, store extensive amounts of patient data for
easy accessibility. The databases behind this collection of information are not only large and complex but
are also secure and protected by compliance standards.

10. Weather
Predicting the weather across the globe is incredibly complex and depends on a myriad of factors, all
gathered, stored and analyzed within databases, ready to deliver today’s weather to your local TV station
or smartphone app. The Weather Company, for example, takes in over 20 terabytes of data per day. The
company has used a number of databases to support this data, including MySQL, Microsoft SQL Server,
Cassandra, and more.
Databases are as unavoidable in daily life as they are necessary, a fact made clear by the list above. Of
course, there are many types of software that run and support the numerous databases in our lives.

Challenges of Database Management


In the last few years, data volumes have grown and the way we use data has changed. Here are five database
management challenges companies face.

1. Growing complexity in landscape


As the database market evolves, many companies are finding it difficult to evaluate and choose a solution. There
are relational databases, columnar databases, and object-oriented databases.

2. Limits on scalability
The fact is, all software has scalability and resource usage limitations, including database servers. Forward-
thinking companies concerned about transaction processing capacity know that cataloging components, database
architecture, and even operating systems and hardware configuration all affect scalability.

3. Increasing data volumes


As the amount of data generated and collected explodes, companies are struggling to keep up. Research shows
that we’ve created more data in the past two years than in the entire previous history of the human race. Yet, a
10% increase in data accessibility could generate more than $65 million in additional net income.

4. Data security
Databases are the hidden workhorses of many companies’ IT systems, storing critical public and private data.
Lately, there has been an understandable and high-profile focus on data security. A data breach typically costs a
company $4 million, not to mention the loss of reputation and goodwill.

5. Decentralized data management


Although there are benefits to decentralized data management, it presents challenges as well. How will the data
be distributed? What’s the best decentralization method? What’s the proper degree of decentralization? A major
challenge in designing and managing a distributed database results from the inherent lack of centralized
knowledge of the entire database.


How to Choose the Right Database Management Solution for Your Business?
So, in the face of numerous challenges, how can companies select the best management solution for their
business? Here are a few recommendations.

Establish decision criteria.


The first step is to create an objective standard by which to evaluate your options. Of course, each company will
have different criteria. Some important considerations include the cost of ownership, ease of use, functionality,
ease of database administration, and scalability. Perhaps most important for businesses with long-range projects
is whether the solution will still be around in 10 years.
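
One simple way to turn such criteria into an objective standard is a weighted scoring matrix. The sketch below is only an illustration: the weights, the scores, and the two candidate options are all made-up assumptions, and each company would substitute its own.

```python
# Weights express how much each criterion matters; they sum to 1.0 (made-up values).
weights = {"cost": 0.30, "ease_of_use": 0.20, "functionality": 0.20,
           "administration": 0.10, "scalability": 0.20}

# Scores from 1 (poor) to 5 (excellent) for two hypothetical candidates.
candidates = {
    "Option A": {"cost": 4, "ease_of_use": 3, "functionality": 5, "administration": 3, "scalability": 4},
    "Option B": {"cost": 5, "ease_of_use": 4, "functionality": 3, "administration": 4, "scalability": 2},
}

def weighted_score(scores: dict[str, int]) -> float:
    """Multiply each score by the weight of its criterion and add the results."""
    return sum(weights[criterion] * scores[criterion] for criterion in weights)

totals = {name: round(weighted_score(scores), 2) for name, scores in candidates.items()}
best = max(totals, key=totals.get)
print(totals, best)   # {'Option A': 3.9, 'Option B': 3.7} Option A
```

Agreeing on the weights before scoring any product helps keep the comparison objective, because the criteria cannot then be adjusted to favor a preferred option.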

Match the solution to your business goals.


Your choice of database technology should take into account your business goals. How much data are you
collecting? How fast do you collect it? How will you access and analyze it? Each business is different, thus there is
no one-size-fits-all answer here.

Does it work with your existing technology?


Of course, you want to avoid ending up with sprawling systems and disparate platforms. So an important
consideration is whether your solution will “play nice” with existing software and hardware components.

Workload on hardware resources.


Whatever DBMS you select will be judged on database performance, or how fast it supplies information to users.
It is important to remember that workload can fluctuate dramatically by the day, by the hour, or even by the minute.

Take, for example, the database of a retailer during a holiday shopping event. Under those conditions, the
processing demands placed on the system may tax the hardware and software tools at the disposal of the system.
The goal should be to enable the largest possible workload to be processed without resource upgrades.

