
Part 1 (Step 1)

Details of the data files provided for the final project work. The following descriptions are based on the original files, before any curation work was done on them.

1. Consumer_Complaints_FileA Data File:

The Consumer_Complaints_FileA file contains consumer complaint data in XML format, with UTF-8 encoding. The root element <consumerComplaints> contains multiple <complaint> child elements, each with a required “id” attribute. The “id” attribute holds a unique complaint number that distinguishes each consumer complaint and allows any complaint to be accessed directly by its number. All elements are properly formatted and indented.

Each complaint element has multiple child elements, with and without attributes, that hold specific information about the consumer's complaint, such as the received and sent events, product type, issue, company name, company state and ZIP code, and the response received.
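Because every <complaint> carries a unique “id”, the file can be indexed by complaint number for direct lookup. A minimal sketch, assuming the file is parsed with Python's standard xml.etree.ElementTree; the sample XML and the helper name index_by_id are hypothetical and only for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical two-complaint sample in the FileA layout (not real data).
SAMPLE = """<consumerComplaints>
  <complaint id="1001"><company><companyName>Acme</companyName></company></complaint>
  <complaint id="1002"><company><companyName>Globex</companyName></company></complaint>
</consumerComplaints>"""


def index_by_id(root: ET.Element) -> dict:
    """Map each complaint's unique id attribute to its <complaint> element."""
    index = {}
    for complaint in root.iter("complaint"):
        cid = complaint.get("id")
        if cid in index:
            # The id is documented as unique, so a repeat indicates bad data.
            raise ValueError(f"duplicate complaint id: {cid}")
        index[cid] = complaint
    return index


complaints = index_by_id(ET.fromstring(SAMPLE))
print(complaints["1002"].findtext("company/companyName"))  # Globex
```

For the real file, the root would come from ET.parse("Consumer_Complaints_FileA.xml").getroot() instead of the inline sample.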

Here are the details of the XML file structure:

#   XML element / attribute   Contents
1   consumerComplaints        Root element that holds all complaint data
2   complaint                 Holds the data of a single consumer complaint
3   id                        Attribute of complaint; unique ID of the complaint
4   event                     Contains a state of the complaint
5   type                      Attribute of event; denotes the state of the complaint
6   date                      Attribute of event; date of that state
7   product                   Information about the product
8   productType               Child element of product; holds the product type
9   subproduct                Holds the subproduct, if the product has one
10  issue                     Details of the issue
11  issueType                 Type of issue for which the complaint was filed
12  consumerNarrative         The consumer's description of the complaint
13  company                   Information about the company
14  companyName               Company name
15  companyState              State in which the company is located
16  companyZip                ZIP code of the company's city
17  submitted                 Information about the complaint submission
18  via                       Attribute of submitted; submission medium: Referral | Web | Phone
19  response                  Details of the response to the complaint
20  timely                    Attribute of response; whether the response was timely: Y | N
21  consumerDisputed          Attribute of response; whether the consumer disputed: Y | N
22  publicResponse            Whether the company made a public response
23  responseType              The decision or current state of the complaint
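The table maps directly onto a flat record per complaint. A hedged sketch using Python's standard ElementTree; the sample complaint (dates, company, values) and the helper name to_record are hypothetical, and the element paths assume the nesting shown in the FileA DTD (productType inside product, companyName inside company, and so on):

```python
import xml.etree.ElementTree as ET

# Hypothetical complaint following the element/attribute table (not real data).
SAMPLE = """<complaint id="1001">
  <event type="received" date="2015-03-02"/>
  <event type="sentToCompany" date="2015-03-04"/>
  <product><productType>Mortgage</productType></product>
  <issue><issueType>Loan servicing</issueType></issue>
  <company><companyName>Acme</companyName><companyState>CA</companyState><companyZip>94105</companyZip></company>
  <submitted via="Web"/>
  <response timely="Y" consumerDisputed="N"><responseType>Closed with explanation</responseType></response>
</complaint>"""

TEXT_PATHS = ("product/productType", "product/subproduct", "issue/issueType",
              "consumerNarrative", "company/companyName", "company/companyState",
              "company/companyZip", "response/publicResponse", "response/responseType")


def to_record(complaint: ET.Element) -> dict:
    """Flatten one <complaint> into a dict keyed by the names in the table."""
    record = {"id": complaint.get("id")}
    # One date per event state, keyed by the event's type attribute.
    for event in complaint.findall("event"):
        record[event.get("type")] = event.get("date")
    # findtext returns None when an optional element (e.g. subproduct) is absent.
    for path in TEXT_PATHS:
        record[path.split("/")[-1]] = complaint.findtext(path)
    submitted = complaint.find("submitted")
    record["via"] = submitted.get("via") if submitted is not None else None
    response = complaint.find("response")
    for attr in ("timely", "consumerDisputed"):
        record[attr] = response.get(attr) if response is not None else None
    return record


print(to_record(ET.fromstring(SAMPLE)))
```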

Consumer_Complaints_FileA.xml MD5 checksum # 5ee2940da0d71313e80860df818973f5

2. Consumer_Complaints_FileB Data File:


The Consumer_Complaints_FileB file contains consumer complaint data in XML format, with UTF-8 encoding. The indentation of its XML elements is not formatted properly. The root element <consumerComplaints> contains multiple <complaint> child elements, each with a required “id” attribute and an optional “submissionType” attribute. The “id” attribute holds a unique complaint number that distinguishes each consumer complaint and allows any complaint to be accessed directly by its number. The optional submissionType attribute records how the consumer's complaint was submitted.
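Two FileB properties matter when reading it programmatically: irregular indentation only adds whitespace text between elements, and submissionType may be missing. A hedged sketch with Python's standard ElementTree; the sample fragment, the helper name summarize and the "not recorded" default are hypothetical choices for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical FileB-style fragment: irregular indentation, and the optional
# submissionType attribute present on one complaint but not the other.
SAMPLE = """<consumerComplaints><complaint id="2001" submissionType="Web">
      <company><companyName>Acme</companyName></company>
  </complaint>
<complaint id="2002"><company><companyName>Globex</companyName></company></complaint></consumerComplaints>"""


def summarize(xml_text: str) -> list:
    """Return (id, submissionType, companyName) for each complaint."""
    rows = []
    for complaint in ET.fromstring(xml_text).iter("complaint"):
        rows.append((
            complaint.get("id"),
            # .get() falls back to the default when the optional attribute is absent.
            complaint.get("submissionType", "not recorded"),
            # Indentation whitespace does not affect element lookup.
            complaint.findtext("company/companyName"),
        ))
    return rows


for row in summarize(SAMPLE):
    print(*row)
```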

Each complaint element has multiple child elements, with and without attributes, that hold specific information about the consumer's complaint, such as the received and sent events, product type, issue, company name, company state and ZIP code, and the response received.

Here are the details of the XML file structure:

#   XML element / attribute   Contents
1   consumerComplaints        Root element that holds all complaint data
2   complaint                 Holds the data of a single consumer complaint
3   id                        Attribute of complaint; unique ID of the complaint
4   submissionType            Attribute of complaint (optional); medium by which the complaint was submitted
5   event                     Contains a state of the complaint
6   type                      Attribute of event; denotes the state of the complaint
7   date                      Attribute of event; date of that state
8   product                   Information about the product
9   productType               Child element of product; holds the product type
10  subproduct                Holds the subproduct, if the product has one
11  issue                     Details of the issue
12  issueType                 Type of issue for which the complaint was filed
13  consumerNarrative         The consumer's description of the complaint
14  company                   Information about the company
15  companyName               Company name
16  companyState              State in which the company is located
17  companyZip                ZIP code of the company's city
18  via                       Medium by which the complaint was submitted: Referral | Web | Phone
19  response                  Details of the response to the complaint
20  timely                    Whether the response was timely: Y | N
21  consumerDisputed          Whether the consumer disputed: Y | N
22  publicResponse            Whether the company made a public response
23  responseType              The decision or current state of the complaint

Consumer_Complaints_FileB.xml MD5 checksum # e03bc3837fe3f937b782294d366cc62b

Part 1 (Step 2)
Because the complete files (DOCTYPE plus all XML elements) are large, only the DOCTYPE declarations and the DTD validation results are included here. The final files (Consumer_Complaints_FileA.xml and Consumer_Complaints_FileB.xml), with the DOCTYPE and all XML elements, are included in the main folder with this document for your reference.
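Writing a DTD by hand for a large file is easier with an inventory of which child elements and attributes each element actually uses. A minimal sketch of such a survey with Python's standard ElementTree (the helper name survey_structure and the sample are hypothetical). It does not infer child order, occurrence indicators (?, +, *) or #REQUIRED versus #IMPLIED; those still have to be decided manually:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def survey_structure(root: ET.Element):
    """For each element name, collect the child element names and attribute
    names seen anywhere in the document, as a starting point for a DTD."""
    children = defaultdict(set)
    attributes = defaultdict(set)
    for elem in root.iter():
        attributes[elem.tag].update(elem.attrib)  # adds attribute names (dict keys)
        for child in elem:
            children[elem.tag].add(child.tag)
    return children, attributes


# Hypothetical sample; for the real file use ET.parse(path).getroot().
SAMPLE = ('<consumerComplaints><complaint id="1">'
          '<event type="received" date="2015-01-01"/><response timely="Y"/>'
          '</complaint></consumerComplaints>')
children, attributes = survey_structure(ET.fromstring(SAMPLE))
for tag in sorted(children):
    print(tag, sorted(children[tag]))
```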
Consumer_Complaints_FileA.xml
<?xml version="1.0"?>
<!DOCTYPE consumerComplaints [
<!ELEMENT consumerComplaints (complaint)+ >
<!ELEMENT complaint
((company|consumerNarrative|event|issue|product|submitted)+,response
)>
<!ATTLIST complaint id CDATA #REQUIRED>
<!ELEMENT company (companyName,companyState,companyZip)>
<!ELEMENT consumerNarrative (#PCDATA)>
<!ELEMENT event EMPTY>
<!ATTLIST event
date NMTOKEN #REQUIRED
type (received|sentToCompany) #REQUIRED>
<!ELEMENT issue (issueType,subissue?)>
<!ELEMENT product (productType,subproduct?)>
<!ELEMENT submitted EMPTY>
<!ATTLIST submitted via (Referral|Phone|Web) #REQUIRED>
<!ELEMENT response (publicResponse?,responseType)>
<!ATTLIST response
consumerDisputed (Y|N) #REQUIRED
timely (Y|N) #REQUIRED>
<!ELEMENT companyName (#PCDATA)>
<!ELEMENT companyState (#PCDATA)>
<!ELEMENT companyZip (#PCDATA)>
<!ELEMENT issueType (#PCDATA)>
<!ELEMENT subissue (#PCDATA)>
<!ELEMENT productType (#PCDATA)>
<!ELEMENT subproduct (#PCDATA)>
<!ELEMENT publicResponse (#PCDATA)>
<!ELEMENT responseType (#PCDATA)>
]>
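Python's standard library cannot validate a document against a DTD (third-party libraries such as lxml can). As a partial, stdlib-only sanity check, this sketch re-checks just the attribute rules of the FileA DTD above: required attributes and enumerated values. It does not check element content models or NMTOKEN syntax, so it complements rather than replaces the online validation; the names REQUIRED_ATTRS and check_attributes are hypothetical:

```python
import xml.etree.ElementTree as ET

# #REQUIRED attribute rules transcribed from the FileA DTD:
# None = any value (CDATA/NMTOKEN, syntax not checked), else the enumerated values.
REQUIRED_ATTRS = {
    "complaint": {"id": None},
    "event": {"date": None, "type": {"received", "sentToCompany"}},
    "submitted": {"via": {"Referral", "Phone", "Web"}},
    "response": {"consumerDisputed": {"Y", "N"}, "timely": {"Y", "N"}},
}


def check_attributes(root: ET.Element) -> list:
    """Return a message for every missing or out-of-range attribute."""
    problems = []
    for elem in root.iter():
        for name, allowed in REQUIRED_ATTRS.get(elem.tag, {}).items():
            value = elem.get(name)
            if value is None:
                problems.append(f"<{elem.tag}> missing required attribute {name}")
            elif allowed is not None and value not in allowed:
                problems.append(f"<{elem.tag}> {name}={value!r} not in {sorted(allowed)}")
    return problems


GOOD = ('<consumerComplaints><complaint id="1"><event date="2015-01-01" type="received"/>'
        '<response consumerDisputed="N" timely="Y"/></complaint></consumerComplaints>')
print(check_attributes(ET.fromstring(GOOD)))  # []
```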
DTD and validation result

Consumer_Complaints_FileB.xml
<?xml version="1.0"?>
<!DOCTYPE consumerComplaints [
<!ELEMENT consumerComplaints (complaint)+ >
<!ELEMENT complaint
((company|consumerNarrative|event|issue|product|submitted)+,response
)>
<!ATTLIST complaint
id CDATA #REQUIRED
submissionType (Referral|Phone|Web) #IMPLIED>
<!ELEMENT company (companyName,companyState,companyZip)>
<!ELEMENT consumerNarrative (#PCDATA)>
<!ELEMENT event EMPTY>
<!ATTLIST event
date NMTOKEN #REQUIRED
type (received|sentToCompany) #REQUIRED>
<!ELEMENT issue (issueType,subissue?)>
<!ELEMENT product (productType,subproduct?)>
<!ELEMENT submitted EMPTY>
<!ELEMENT response (publicResponse?,responseType)>
<!ATTLIST response
consumerDisputed NMTOKEN #REQUIRED
timely (yes|no) #IMPLIED>
<!ELEMENT companyName (#PCDATA)>
<!ELEMENT companyState (#PCDATA)>
<!ELEMENT companyZip (#PCDATA)>
<!ELEMENT issueType (#PCDATA)>
<!ELEMENT subissue (#PCDATA)>
<!ELEMENT productType (#PCDATA)>
<!ELEMENT subproduct (#PCDATA)>
<!ELEMENT publicResponse (#PCDATA)>
<!ELEMENT responseType (#PCDATA)>
<!ENTITY redaction "XXXX">
]>
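The FileB DTD also declares an internal entity, redaction, whose replacement text is "XXXX". A small sketch showing how a parser expands such an entity when it appears in content; the narrative text is hypothetical, and Python's standard ElementTree (backed by expat) is assumed to expand internal entities declared in the internal DTD subset:

```python
import xml.etree.ElementTree as ET

# Minimal document reusing the entity declaration from the FileB DTD.
DOC = """<?xml version="1.0"?>
<!DOCTYPE consumerNarrative [
<!ELEMENT consumerNarrative (#PCDATA)>
<!ENTITY redaction "XXXX">
]>
<consumerNarrative>My account number &redaction; was charged twice.</consumerNarrative>"""

narrative = ET.fromstring(DOC)
# &redaction; has been replaced by its replacement text "XXXX".
print(narrative.text)
```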

DTD and validation result

Part 1 (Step 3)
After the canonicalization process, I generated the MD5 checksum of each file using the link provided in the project details, and both files, File_A and File_B, produced the same checksum. Before generating the checksums, I used Notepad++ to compare both files to make sure no spaces or tabs were left over from the canonicalization process. The canonicalization of both files is documented step by step in a separate document (see the file “Report - Part1 - Step 6 - Canonicalization.docx” in the main folder). Here are the checksums of both files:

Checksum generator URL # https://emn178.github.io/online-tools/md5_checksum.html


Consumer_Complaints_FileA.XML MD5 checksum # 051a637b1acc40faeabd48e4d4ae9a95
Consumer_Complaints_FileB.XML MD5 checksum # 051a637b1acc40faeabd48e4d4ae9a95
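The canonicalization above was done step by step (documented separately). For comparison, Python 3.8+ ships xml.etree.ElementTree.canonicalize(), which implements Canonical XML 2.0 (C14N 2.0), a different, automated route. In this sketch, two hypothetical fragments that differ only in indentation, quote style, attribute order and empty-element syntax canonicalize to the same text and therefore the same MD5. Note that canonical XML omits the XML declaration and the DOCTYPE, and that strip_text=True also trims leading and trailing spaces inside text content, which is a deliberate trade-off:

```python
import hashlib
import xml.etree.ElementTree as ET

# Hypothetical fragments: FileA-style indentation vs. FileB-style poor indentation,
# single quotes, swapped attribute order, and an explicit empty element.
DOC_A = """<consumerComplaints>
    <complaint id="1">
        <event type="received" date="2013-07-29"/>
    </complaint>
</consumerComplaints>"""

DOC_B = """<consumerComplaints><complaint id='1'>
<event date='2013-07-29' type='received'></event></complaint>
      </consumerComplaints>"""


def canonical_form(xml_text: str) -> str:
    """C14N 2.0: sorted attributes, double quotes, start/end tag pairs for
    empty elements; strip_text=True drops whitespace-only text between tags."""
    return ET.canonicalize(xml_text, strip_text=True)


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


print(canonical_form(DOC_A) == canonical_form(DOC_B))  # True
print(md5_hex(canonical_form(DOC_A)))
```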

Part 1 (Step 4) DTD of canonicalized document


Because the complete canonicalized file (DOCTYPE plus all XML elements) is large, only the DOCTYPE declaration is included here. The final canonicalized DTD files, with the DOCTYPE and all XML elements, are included in the main folder with this document for your reference.

Consumer_Complaints_canonicalized.dtd
<?xml version="1.0"?>
<!DOCTYPE consumerComplaints [
<!ELEMENT consumerComplaints (complaint)+>
<!ELEMENT complaint
((company|consumerNarrative|event|issue|product
|submitted)+, response)>
<!ATTLIST complaint id CDATA #REQUIRED>
<!ELEMENT company (companyName,companyState,companyZip)>
<!ELEMENT consumerNarrative (#PCDATA)>
<!ELEMENT event EMPTY>
<!ATTLIST event
date NMTOKEN #REQUIRED
type (received|sentToCompany) #REQUIRED>
<!ELEMENT issue (issueType,subissue?)>
<!ELEMENT product (productType,subproduct?)>
<!ELEMENT submitted EMPTY>
<!ATTLIST submitted via (Referral|Phone|Web) #REQUIRED>
<!ELEMENT response (publicResponse?,responseType)>
<!ATTLIST response
consumerDisputed (Y|N) #REQUIRED
timely (Y|N) #REQUIRED>
<!ELEMENT companyName (#PCDATA)>
<!ELEMENT companyState (#PCDATA)>
<!ELEMENT companyZip (#PCDATA)>
<!ELEMENT issueType (#PCDATA)>
<!ELEMENT subissue (#PCDATA)>
<!ELEMENT productType (#PCDATA)>
<!ELEMENT subproduct (#PCDATA)>
<!ELEMENT publicResponse (#PCDATA)>
<!ELEMENT responseType (#PCDATA)>
]>
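A DTD content model is essentially a regular expression over the names of an element's children. To make the complaint rule in this DTD concrete, a hedged stdlib sketch that checks one element's child sequence against it; the pattern name COMPLAINT_MODEL and the helper complaint_matches_model are hypothetical, and a real validator checks every element, attribute and text node:

```python
import re
import xml.etree.ElementTree as ET

# The complaint content model from the DTD,
#   ((company|consumerNarrative|event|issue|product|submitted)+, response)
# rewritten as a regex over space-separated child tag names.
COMPLAINT_MODEL = re.compile(
    r"(?:(?:company|consumerNarrative|event|issue|product|submitted) )+response"
)


def complaint_matches_model(complaint: ET.Element) -> bool:
    """True if the children of <complaint> follow the DTD content model."""
    sequence = " ".join(child.tag for child in complaint)
    # The trailing space in the group stops prefixes such as companyName matching company.
    return COMPLAINT_MODEL.fullmatch(sequence) is not None


ok = ET.fromstring('<complaint id="1"><event/><company/><response/></complaint>')
print(complaint_matches_model(ok))  # True
```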

Part 1 (Step 5) DTD validated using the provided link # http://xmlvalidator.new-studio.org/


Because the complete file (DOCTYPE plus all XML elements) is large, after validating the DTD with the online validator I attached a screenshot of the DTD part only. The final canonicalized DTD files, with the DOCTYPE and all XML elements, are included in the main folder with this document for your reference.
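Before uploading a large file to an online validator, a quick local pre-check can catch well-formedness errors and report their position. A minimal sketch with Python's standard ElementTree (the helper name well_formedness_error is hypothetical); it checks well-formedness only and does not validate against the DTD:

```python
import xml.etree.ElementTree as ET


def well_formedness_error(xml_text: str):
    """Return None if xml_text is well-formed, else (line, column, message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position  # ParseError carries a (line, column) tuple
        return line, column, str(err)
    return None


print(well_formedness_error('<consumerComplaints><complaint id="1"/></consumerComplaints>'))  # None
print(well_formedness_error('<consumerComplaints><complaint id="1"></consumerComplaints>'))
```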
