7086CEM-Aparna V Kashyap

EEC 7086CEM

Data Management System

Part A: Entity relationship modelling

1. Create an ER diagram for the above scenario and indicate the cardinality of the
relationships and their nature (mandatory or optional). You should allocate adequate
attributes to the entities of interest, and specify the identifiers.
2. Generate, with appropriate justification, relational tables from the ER diagram, in their
schema form. Indicate clearly the names of the tables, the attributes, the primary keys and
the foreign keys. Explain briefly how the tables were generated and which rules of
transformation were used.

Employee(Emp Id, Emp Name, Address, Gender, DOB, Salary, Joining Date, Designation, Dept Id*, Role Id*)
Department(Dept Id, Dept Name, Location, No. of Emp)
Role(Role Id, Role Name, Role Description)
The first attribute of each table is its primary key; foreign keys are marked with *.
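
The schema above could be tried out as follows. This is only a minimal sketch using Python's built-in sqlite3 module rather than Oracle; the snake_case table and column names, the data types, the NOT NULL (mandatory) foreign keys and the sample rows are all assumptions made for illustration, since the ER diagram itself is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Assumed types and names; one table per entity, foreign keys on the "many" side.
conn.executescript("""
CREATE TABLE department (
    dept_id    INTEGER PRIMARY KEY,
    dept_name  TEXT NOT NULL,
    location   TEXT,
    no_of_emp  INTEGER
);
CREATE TABLE role (
    role_id          INTEGER PRIMARY KEY,
    role_name        TEXT NOT NULL,
    role_description TEXT
);
CREATE TABLE employee (
    emp_id       INTEGER PRIMARY KEY,
    emp_name     TEXT NOT NULL,
    address      TEXT,
    gender       TEXT,
    dob          TEXT,
    salary       REAL,
    joining_date TEXT,
    designation  TEXT,
    dept_id      INTEGER NOT NULL REFERENCES department(dept_id),
    role_id      INTEGER NOT NULL REFERENCES role(role_id)
);
""")

# Made-up sample rows.
conn.execute("INSERT INTO department VALUES (1, 'Engineering', 'HQ', 0)")
conn.execute("INSERT INTO role VALUES (1, 'Developer', 'Writes software')")
conn.execute(
    "INSERT INTO employee (emp_id, emp_name, dept_id, role_id) "
    "VALUES (1, 'A. Example', 1, 1)"
)

# An employee that points at a non-existent department is rejected.
try:
    conn.execute(
        "INSERT INTO employee (emp_id, emp_name, dept_id, role_id) "
        "VALUES (2, 'B. Example', 99, 1)"
    )
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

employee_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```

Placing Dept Id and Role Id in Employee reflects the usual rule for a one-to-many relationship: the key of the "one" side is posted as a foreign key into the "many" side.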

Part B. SQL programming

1. Use appropriate data types and write the SQL statements to create the tables defined in
the schema above.

CREATE TABLE "SYSTEM"."USER1"
( "USERID" VARCHAR2(50 BYTE) NOT NULL ENABLE,
"NAME" VARCHAR2(50 BYTE),
"EMAILADDR" VARCHAR2(50 BYTE),
CONSTRAINT "PRIM_USERID" PRIMARY KEY ("USERID")
USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
TABLESPACE "SYSTEM" ENABLE
) PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
NOCOMPRESS LOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
TABLESPACE "SYSTEM" ;

-- CATEGORY is created before MUSIC because MUSIC has a foreign key that references it.
-- The columns are named CATEGORYCODE and TITLE so that they match the queries in question 2.
CREATE TABLE "SYSTEM"."CATEGORY"
( "CATEGORYCODE" VARCHAR2(50 BYTE) NOT NULL ENABLE,
"TITLE" VARCHAR2(50 BYTE),
CONSTRAINT "PRIM_CATEGORYID" PRIMARY KEY ("CATEGORYCODE")
USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
TABLESPACE "SYSTEM" ENABLE
) PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
NOCOMPRESS LOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
TABLESPACE "SYSTEM" ;

CREATE TABLE "SYSTEM"."MUSIC"
( "MUSICID" VARCHAR2(50 BYTE) NOT NULL ENABLE,
"TITLE" VARCHAR2(50 BYTE),
"CATEGORYCODE" VARCHAR2(50 BYTE),
-- NUMBER(8,2) keeps two decimal places; FLOAT(2) has only 2 bits of binary precision.
"COSTPERDOWNLOAD" NUMBER(8,2),
CONSTRAINT "PRIM_MUSICID" PRIMARY KEY ("MUSICID")
USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
TABLESPACE "SYSTEM" ENABLE,
CONSTRAINT "FORE_CATEGORY" FOREIGN KEY ("CATEGORYCODE")
REFERENCES "SYSTEM"."CATEGORY" ("CATEGORYCODE") ENABLE
) PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
NOCOMPRESS LOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
TABLESPACE "SYSTEM" ;

CREATE TABLE "SYSTEM"."MUSICDOWNLOAD"
( "USERID" VARCHAR2(50 BYTE) NOT NULL ENABLE,
"MUSICID" VARCHAR2(50 BYTE) NOT NULL ENABLE,
-- DATE rather than VARCHAR2 so that the value can be validated, compared and sorted as a date.
"DOWNLOADDATE" DATE NOT NULL ENABLE,
CONSTRAINT "PRIM_DOWNLOAD" PRIMARY KEY ("USERID", "MUSICID")
USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
TABLESPACE "SYSTEM" ENABLE,
CONSTRAINT "FORE_USERID" FOREIGN KEY ("USERID")
REFERENCES "SYSTEM"."USER1" ("USERID") ENABLE,
CONSTRAINT "FORE_MUSICID" FOREIGN KEY ("MUSICID")
REFERENCES "SYSTEM"."MUSIC" ("MUSICID") ENABLE
) PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
NOCOMPRESS LOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
TABLESPACE "SYSTEM" ;

2. Write SQL Statements to return the following data from the online music database:

a) The music id, the title and the categoryCode of all the music in the database, ordered by
title.

select musicid, title, categorycode from music order by title asc

b) The number of users who downloaded ‘Pop-Rock’ category of music.


select count(distinct userid) as User_Count from musicdownload m1 where m1.musicid in (
select m.musicid from music m where m.categorycode in (
select c.categorycode from category c where c.title='Pop-Rock'))

c) The number of music downloads for each of the categories. The result listing should
include the titles of the categories and the number of music downloads for each
category title.

select c.title as "Title", count(m1.musicid) as "Count" from music m
join musicdownload m1 on m.musicid=m1.musicid
join category c on m.categorycode=c.categorycode
group by c.title order by c.title

d) The titles of the categories for which music was downloaded more than once.

select c.title as "Title" from music m
join musicdownload m1 on m.musicid=m1.musicid
join category c on m.categorycode=c.categorycode
group by c.title having count(m1.musicid)>1 order by c.title
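
The queries for b) to d) can be checked against a few rows. The following is a minimal sketch that uses Python's built-in sqlite3 module in place of Oracle. The column names categorycode and title follow the queries above; the simplified TEXT types and every sample row are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (categorycode TEXT PRIMARY KEY, title TEXT);
CREATE TABLE music (musicid TEXT PRIMARY KEY, title TEXT, categorycode TEXT);
CREATE TABLE musicdownload (userid TEXT, musicid TEXT, downloaddate TEXT,
                            PRIMARY KEY (userid, musicid));
-- Made-up sample data: user U1 downloads two Pop-Rock tracks.
INSERT INTO category VALUES ('PR', 'Pop-Rock'), ('JZ', 'Jazz');
INSERT INTO music VALUES ('M1', 'Song A', 'PR'), ('M2', 'Song B', 'PR'),
                         ('M3', 'Song C', 'JZ');
INSERT INTO musicdownload VALUES
    ('U1', 'M1', '2021-01-05'),
    ('U1', 'M2', '2021-01-06'),
    ('U2', 'M1', '2021-01-07'),
    ('U2', 'M3', '2021-01-08');
""")

# b) count(userid) counts downloads, count(distinct userid) counts users.
downloads, users = conn.execute("""
    SELECT COUNT(userid), COUNT(DISTINCT userid)
    FROM musicdownload
    WHERE musicid IN (
        SELECT m.musicid FROM music m
        JOIN category c ON m.categorycode = c.categorycode
        WHERE c.title = 'Pop-Rock')
""").fetchone()

# c) number of downloads per category title.
per_category = conn.execute("""
    SELECT c.title, COUNT(d.musicid)
    FROM music m
    JOIN musicdownload d ON m.musicid = d.musicid
    JOIN category c ON m.categorycode = c.categorycode
    GROUP BY c.title ORDER BY c.title
""").fetchall()

# d) category titles with more than one download.
more_than_once = conn.execute("""
    SELECT c.title
    FROM music m
    JOIN musicdownload d ON m.musicid = d.musicid
    JOIN category c ON m.categorycode = c.categorycode
    GROUP BY c.title HAVING COUNT(d.musicid) > 1 ORDER BY c.title
""").fetchall()
```

With this data the Pop-Rock category has three downloads but only two distinct users, which is why the wording "number of users" in b) calls for a distinct count.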

Part C. Sequential and distributed processing

1. Assuming that the data is stored in a relational database produce, with justification,
the SQL code to determine, for each product, the number of products which were
sold in each month of each year.

select productno, yearid, monthid, sum(quantity) as "Total" from orderdetails
group by productno, yearid, monthid order by productno, yearid, monthid
Explanation: The select clause returns the ProductNo, the YearId, the MonthId and the sum of
the Quantity of that product sold in that month of that year. The group by clause combines
the rows that share the same ProductNo, YearId and MonthId, so that sum(quantity) is computed
once per product, per month, per year. The order by clause sorts the result by ProductNo,
then by YearId and MonthId, in ascending order by default.
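
As a quick check of this grouping, the sketch below runs an equivalent query with Python's built-in sqlite3 module. The three rows are the sample order lines quoted in the MapReduce discussion of this part; the column types are assumptions, and SQLite stands in for the relational database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed column types; names follow the query and the sample records in this part.
conn.execute("""
    CREATE TABLE orderdetails (
        orderno TEXT, productno TEXT, price REAL, quantity INTEGER,
        sales REAL, qtrid INTEGER, monthid INTEGER, yearid INTEGER)
""")
rows = [
    ("10107", "2", 95.70, 30, 2871.00, 1, 2, 2003),
    ("10107", "5", 99.91, 39, 3896.49, 1, 2, 2003),
    ("10121", "5", 81.35, 34, 2765.90, 1, 2, 2003),
]
conn.executemany("INSERT INTO orderdetails VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)

# One output row per (product, year, month) combination.
totals = conn.execute("""
    SELECT productno, yearid, monthid, SUM(quantity) AS total
    FROM orderdetails
    GROUP BY productno, yearid, monthid
    ORDER BY productno, yearid, monthid
""").fetchall()

for productno, yearid, monthid, total in totals:
    print(productno, yearid, monthid, total)
```

Selecting yearid and monthid alongside productno is what makes each output row identifiable; without them, several rows for the same product would be indistinguishable.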

2. Assuming that the data is too large to be processed in a centralised manner in a relational
database, and that it is stored in an ordinary file, produce a decentralised solution which
applies MapReduce to the data processing. Justify your decisions and all the steps of your
solution. Use diagrams if required.

The following illustrates how MapReduce can determine, for each product, the number of
products that were sold in each month of each year.
The data is stored as (key,value) pairs, where one field of a record acts as the initial key
and the remaining fields form the value.

Example: the first entries in the list,

Key = Year(2003)
Value = [{OrderNo: "10107", ProductNo: "2", Price: "95.7", Quantity: "30", Sales: "2871",
QtrId: "1", MonthId: "2"},
{OrderNo: "10107", ProductNo: "5", Price: "99.91", Quantity: "39", Sales: "3896.49",
QtrId: "1", MonthId: "2"},
{OrderNo: "10121", ProductNo: "5", Price: "81.35", Quantity: "34", Sales: "2765.9",
QtrId: "1", MonthId: "2"}]

The list above is the split for the year 2003.
The map function parses each record in order to find the products sold in each month of
each year. Because the data has already been split by year, the next step focuses on the
month within that year. The output of this map step is a new list of (key,value) pairs in
which the key is the MonthId and the remaining fields form the value.

Example:
Key = Month(2)
Value = OrderNo: "10107", ProductNo: "2", Price: "95.7", Quantity: "30", Sales: "2871",
QtrId: "1"

Since the required result is the total quantity of each product sold in each month of each
year, the records within a given year and month are parsed once more, and the map function
emits a new list of (key,value) pairs in which the key is the ProductNo and the value is
the Quantity.

Example: (key,value)=(2,30)

Next, the shuffle function groups together all the (key,value) pairs that have the same
key, for example the two pairs for product 5:
(key,value) = (5,39)
(key,value) = (5,34)

Finally, the reduce function applies an aggregate function to each group of (key,value)
pairs. In this example the values associated with each key are summed to produce the final
result.

Key 5 is associated with 73, the sum of the values 39 and 34.

The final result is (key,value) = (5,73), that is, 73 units of product 5 were sold in
month 2 of 2003.
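
The map, shuffle and reduce steps described above can be sketched in a few lines of Python. This is an illustrative single-process simulation, not a Hadoop job. It also makes one simplifying assumption that differs from the staged splitting described above: the map step emits a composite key (ProductNo, YearId, MonthId) directly, so that one pass yields a total per product, per month, per year. The function names and the YearId field name are hypothetical.

```python
from collections import defaultdict

# The three sample records from the example above, reduced to the fields that are used.
records = [
    {"OrderNo": "10107", "ProductNo": "2", "Quantity": 30, "MonthId": 2, "YearId": 2003},
    {"OrderNo": "10107", "ProductNo": "5", "Quantity": 39, "MonthId": 2, "YearId": 2003},
    {"OrderNo": "10121", "ProductNo": "5", "Quantity": 34, "MonthId": 2, "YearId": 2003},
]


def map_record(record):
    """Map: emit one ((ProductNo, YearId, MonthId), Quantity) pair per record."""
    key = (record["ProductNo"], record["YearId"], record["MonthId"])
    return key, record["Quantity"]


def shuffle(pairs):
    """Shuffle: collect the values of all pairs that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_group(key, values):
    """Reduce: sum the quantities collected for one key."""
    return key, sum(values)


mapped = [map_record(record) for record in records]
grouped = shuffle(mapped)
result = dict(reduce_group(key, values) for key, values in grouped.items())

for (product, year, month), total in sorted(result.items()):
    print(product, year, month, total)
```

In a real cluster each mapper would process its own split of the file and the framework would perform the shuffle between nodes; the composite key ensures that all quantities for the same product, month and year reach the same reducer.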

Part D. Research report

Craigslist moved from MySQL to MongoDB because of the non-relational nature of its data set.
When working with enormous data sets, a relational database becomes increasingly slow
because the schema of the database has to be maintained. NoSQL document stores use a
document structure, such as JSON-style records, whereas SQL uses a table structure. If a
field has to be added in SQL, the table structure must be changed, which affects the schema
of the database and results in NULL values in the records that have no value for the new
field. With NoSQL, adding an extra field does not affect the schema, and no NULL values are
introduced. There is another major reason why NoSQL is chosen over an SQL database. When
records are inserted into an SQL database from an Excel spreadsheet that contains a large
amount of data, say 10,000 or more rows, execution slows down, and as more rows are added
the computer may not cope and Excel may stop working as expected. Processing large amounts
of data with SQL therefore requires a powerful computer, which affects scalability and cost.
NoSQL, by contrast, relies on distributed systems, so the database can be spread across a
large set of computers. SQL is vertically scalable and NoSQL is horizontally scalable, which
makes NoSQL cost-effective: capacity is added with further low-cost machines, much as one
would put up another low-cost building.
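
The point about NULL values can be illustrated with a small sketch. It uses Python's built-in sqlite3 module for the relational side and plain dictionaries as a stand-in for JSON-style documents; no MongoDB driver is involved, and the posting table, its fields and the sample values are made up for illustration.

```python
import sqlite3

# Relational side: adding a field means altering the table for every row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posting (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO posting VALUES (1, 'Bike for sale')")
conn.execute("ALTER TABLE posting ADD COLUMN price REAL")
conn.execute("INSERT INTO posting VALUES (2, 'Sofa', 40.0)")

# The row inserted before the schema change now carries NULL (None) for price.
sql_rows = conn.execute("SELECT id, title, price FROM posting ORDER BY id").fetchall()

# Document side: each record carries only the fields it actually has.
documents = [
    {"_id": 1, "title": "Bike for sale"},
    {"_id": 2, "title": "Sofa", "price": 40.0},
]
docs_with_price = [doc for doc in documents if "price" in doc]

print(sql_rows)
print(docs_with_price)
```

The older row in the table receives a NULL for the new column, whereas the older document is simply left without the field.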

With an SQL database, a structure has to be decided before the database is created; in
database terminology this is called the schema. The schema describes the tables and the
columns of each table, so we should know how many columns are required before creating it.
An SQL database is efficient when the schema is fixed, but it is not flexible when the
schema has to change after creation. NoSQL is well suited to large data sets: it supports a
dynamic schema, which means fields can be added even after creation without affecting the
existing structure or data. NoSQL is therefore more suitable when the database changes
frequently. In a large database with complex relationships, an SQL database is slow when the
relationships are not properly indexed, whereas a non-relational NoSQL database can often
answer such queries faster and more efficiently. In addition, MongoDB supports cloud
services, so data can be stored and managed in the cloud. Cloud services are used by
numerous software companies. In combination with MongoDB Atlas, the power of the cloud
together with a document database puts the developer in charge. For larger data sets, the
cloud is the better choice.

NoSQL databases emerged in the era of the web and distributed computing, which made it
easier to implement a scale-out architecture. NoSQL databases offer several advantages in
terms of consistency, availability and partition tolerance, although a distributed system
cannot fully guarantee all three at once. They also make it easy to store graph data, which
is harder to model in SQL databases. NoSQL is a good choice for organisations experiencing
rapid growth without clear schema definitions. It offers considerably more adaptability than
a relational database and is a strong choice for organisations that need to analyse huge
amounts of information or whose data structures are variable.

In conclusion, SQL databases are relational and NoSQL databases are non-relational. SQL
databases are table-based, while NoSQL databases are typically document-based. SQL databases
are vertically scalable, and NoSQL databases are horizontally scalable. SQL databases are
better for multi-row transactions, and NoSQL databases are used for unstructured data such
as JSON documents.
