
Module: Data Handling and Decision Making

Lesson: Regulatory, legal and ethical issues of big data

© 2017 Arden University Ltd. All rights reserved.


Arden University Limited reserves all rights of copyright and all other intellectual property rights in these learning materials. No part of
any learning materials may be reproduced, stored in a retrieval system or transmitted in any form or by any means, including
without limitation electronic, mechanical, photocopying, recording or otherwise, without the prior written consent of Arden
University Limited.
Regulatory, legal and ethical issues of big data

Contents

In this lesson, we will discuss the following:

8.1 Lesson introduction

8.2 Data ethics, privacy and compliance

8.3 Data sovereignty

8.4 Further practical data analysis techniques: analysis of variance in SPSS

8.1 Lesson introduction

Personal Identifiable Information

Please watch the video presentation of Section 8.1

https://vimeo.com/207432019/348dc6c7c9

Transcript of Section 08.01 Presentation

The United States Department of Labor defines Personal Identifiable Information (PII) as:

“Any representation of information that permits the identity of an individual to whom the
information applies to be reasonably inferred by either direct or indirect means. PII is further
defined as:

information that directly identifies an individual (for instance, name, address, social security
number or other identifying number or code, telephone number, email address, and so on),
or

information by which an agency intends to identify specific individuals in conjunction with
other data elements, i.e., indirect identification.

These data elements may include a combination of gender, race, birth date, geographic
indicator, and other descriptors. Additionally, information permitting the physical or online
contacting of a specific individual is the same as personally identifiable information. This
information can be maintained in either paper, electronic or other media.” (DOL, 2017).
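The definition above notes that a person can be identified indirectly through a combination of data elements such as gender, birth date and a geographic indicator. The following Python sketch illustrates that point under stated assumptions: the records and the field names (`gender`, `birth_date`, `postcode_area`) are hypothetical, and the measure it computes, the size of the smallest group of records sharing the same combination of values, is the idea behind k-anonymity, a technique not discussed in this lesson. A value of 1 means at least one person can be singled out even though no name is stored.

```python
from collections import Counter

# Hypothetical records with direct identifiers (name, email) already removed
records = [
    {"gender": "F", "birth_date": "1990-04-12", "postcode_area": "MK9"},
    {"gender": "F", "birth_date": "1990-04-12", "postcode_area": "MK9"},
    {"gender": "M", "birth_date": "1985-11-02", "postcode_area": "NN1"},
    {"gender": "M", "birth_date": "1985-11-02", "postcode_area": "MK9"},
]

# Assumed quasi-identifiers: harmless on their own, identifying in combination
QUASI_IDENTIFIERS = ("gender", "birth_date", "postcode_area")


def class_sizes(rows, keys):
    """Count how many rows share each combination of the given fields."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


def smallest_class(rows, keys):
    """Size of the smallest group; 1 means someone is unique (the k in k-anonymity)."""
    return min(class_sizes(rows, keys).values())


def unique_combinations(rows, keys):
    """Combinations that match exactly one row and could single out a person."""
    return [combo for combo, count in class_sizes(rows, keys).items() if count == 1]


print(smallest_class(records, QUASI_IDENTIFIERS))
print(unique_combinations(records, QUASI_IDENTIFIERS))
print(smallest_class(records, ("gender",)))
```

With all three fields, each of the two male records is unique even though neither has a name attached; using gender alone, every group has at least two members. Generalising values (for example, recording birth year instead of the full date) is one common way to enlarge the groups.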

In contemporary society, the protection of PII is a growing issue because of the exponential
increase in the use of online services (Grama, 2014). People need to share their PII to access
their bank accounts, book healthcare services, or buy goods in virtual shops. Users also
disclose information about themselves to a large audience on social networking sites.

Big data analytics needs extensive volumes of information, often personal, to create new
predictive models, and therefore it inevitably raises the issue of personal data protection
(Sokolova & Matwin, 2016). Personal health information is particularly at risk, especially when
it is collected in low- and middle-income countries (Beck et al., 2016).

Ethical restrictions

This diagram illustrates the conflicts of interest that parties involved in big data analytics can face
between what they would like to achieve in terms of legal, technical and business objectives.
The ethical approach to data handling aims to find a mutually acceptable and beneficial
practice. This practice can be implemented as a range of policies at organisational,
governmental and inter-governmental levels, as will be discussed further in this lesson.

Figure 8.01. Adapted from Chessell (2014) and IBM (2014)

Activity 8.1 Opening discussion

This BBC video will share with you some reflections on the use of personal data by
companies:

https://www.youtube.com/watch?v=naaDBNSx610

Reflect on this video and answer the following questions:

What is the difference between personal and open data?

Why does Doc Searls argue that we are at an early stage of the evolution of what the
Internet will become?

What can be the big data risk to the personal identity of an individual?

Post your responses on the forum.

Personal data is deeply connected to the privacy of an individual. Open data is data that
can be anonymised and, as an aggregate of many people’s data, used to ascertain things
about the way we live.

At the moment, businesses are like infants: their manners are horrible. It would never occur
to anybody in a mature industry to trail somebody out of a store and plant a tracking
beacon on them and say: “don’t worry it isn’t personal, we are only trying to follow you
around and see what you are doing so we can give you a better experience”.

You create your identity by showing different aspects of your personality to the world. If
you have no privacy, you cannot be different to different people.

Learning outcomes

As a result of this lesson, you will be able to:

Understand the issues associated with ethics, privacy and security of big data

Explain what data sovereignty is and why it has a direct impact on business performance

Master further inferential statistics in SPSS.

8.2 Data ethics, privacy and compliance

Understanding compliance

Please watch the video presentation of Section 8.2

https://vimeo.com/207432050/daa31c5cbe

Transcript of Section 08.02 Presentation

In this section, we review major issues arising from data privacy and the ethical use of sensitive
personal information. According to the UK Information Commissioner’s Office, sensitive
personal data includes “information as to

the racial or ethnic origin of the data subject,

his political opinions,

his religious beliefs or other beliefs of a similar nature,

whether he is a member of a trade union (within the meaning of the Trade Union and
Labour Relations (Consolidation) Act 1992),

his physical or mental health or condition,

his sexual life,

the commission or alleged commission by him of any offence, or

any proceedings for any offence committed or alleged to have been committed by him, the
disposal of such proceedings or the sentence of any court in such proceedings.” (ICO, 2017).

We discuss the principles, mechanisms and policies an organisation must comply with when
implementing and carrying out a data analytics function. Compliance ensures that any
sensitive information is held and handled in accordance with legal requirements and
best practices related to the processing of sensitive information.

Data privacy and ethics

The increase in online activity, ranging from search and purchasing to selling, communication,
hiring and all other forms of Internet transactions, has caused a surge in data. As Minelli,
Chambers and Dhiraj (2013) point out, every click on the Internet, every new sign-up and every
subscription is pushing businesses towards data collection and analytics. This has
brought up a debate about the protection of individuals’ “rights” to information. For example, a
number of businesses currently provide free services, such as a complimentary e-book, in
return for details about the person downloading it. This creates a grey area in terms of rights to
information. When data is given in exchange for free services, questions such as the
following arise:

Where is the line in data privacy drawn?

What information are users willing to share in return for those services?

Which protection mechanisms should companies opt for?

Although it is basically a matter of quid pro quo - “a mutual agreement of exchange of value or
products or services” - the grey area lies in its interpretation.

There is not much difference between a digital and a traditional agreement to exchange
services for payment. But traditional agreements allow both parties to negotiate and work
out the details, whereas digital agreements only give consumers a yes or no choice.
They can either accept the terms and conditions or forgo the service. The question that arises
here is an ethical one regarding the use of the information that is provided. Consumers might
not accept the terms and conditions if they knew that their data could be used to gain further
insights into their lives.

The problem is that this surge in information is giving enormous power to “big data brands”.
They can easily use the details provided by users to peek into their personal lives. As Jeff
Jonas, Chief Scientist at IBM, points out in his blog article “Using Transparency as a Mask”,
“Big Data makes it harder to keep secrets” (Jonas, 2017).

The privacy landscape

The privacy landscape has four major constituents, as shown in Figure 8.02.

Figure 8.02. Adapted from Minelli, Chambers and Dhiraj (2013)

Evolution of personal data gathering

Minelli, Chambers and Dhiraj maintain that it is a misconception that big data is a fairly new
concept; in fact, it has been around for decades.
While companies now use digital agreements and powerful data analytic tools,
companies in the past used warranty cards and gained insight into their customers’ lives
through data processing centres. Collection and analysis have certainly become easier for
new companies, but this is a tried and tested practice that has been in use for a long time.

Traditional data collection and its sale took the form of brokering, which later evolved into
digital target marketing through the addition of better and more innovative analytical tools.
These allowed a finer-grained approach to targeting, based on users’ behaviour and
information. For example, companies in sectors such as telecommunications and banking can
now do more specific marketing and provide services that are tailored to their consumers’
needs.

Traditionally, big companies that focused on data analytics paid huge sums to collection
centres or brokers, who used tedious and time-consuming data mining techniques,
algorithms and behavioural analysis tools that could take six months or more. In the
mid-1990s, the concept of data warehousing gained popularity, making it more feasible for
certain companies to “roll their own”. Companies such as MCI started importing their
information into their own warehouses for the same price they had been paying their brokers.
They processed data for almost 300 million individuals in their “friends and family” database
alone, updated it every month instead of every six months, and cut their outsourcing costs.
This not only gave companies a new approach to gathering data, but also indicated that call
centres were more accurate and that the tools used by outsourced vendors, such as algorithm
scores, were flawed.

MCI not only used the existing data but also leveraged the collection power of
other sources, such as data gathered from the US Postal Service and from partnerships with other
companies that could provide information, such as airlines and credit card companies.
However, credit card data was later dropped because of regulations limiting its use. The
inefficiency of the traditional methods lay in the cost and time companies spent purchasing
information from their partners rather than directly asking their consumers to share their
preferences, which also led to inaccuracies.

Personalisation and preferences

With the evolution of Customer Relationship Management (CRM) and the shift towards
database marketing, companies have also moved their focus from “segmentation to
personalising relationships” on the basis of the principle that “the more you knew about a
consumer, the better you could meet their needs.”

In addition, consumers are not unwilling to share their personal preferences, as they have
become more aware that this enables businesses to improve their
performance and enhance the value of the exchange between service users and providers. In
fact, many are frustrated with companies that do not collect their information, because it forces
customers to repeat themselves when purchasing services or trying to resolve issues.
Consider the mundane example of food delivery: restaurants that store customers’ phone
numbers and addresses for future deliveries would be preferred over those that require
customers to repeat their details every time they order.

Principles of privacy

With big data comes a big responsibility to be accountable and transparent and to protect
individuals’ rights. Minelli, Chambers and Dhiraj (2013) identify seven global principles that
should be followed in data collection:

Notice (transparency): Individuals should be aware of the purpose for which their information is collected.

Choice: Consumers should be given the opportunity to decide whether to disclose their information and how that data may be used.

Consent: Information should be disclosed to third parties only on the basis of agreement, following the notice and choice described above.

Security: Companies should take reasonable measures to protect consumers’ personal information from alteration, destruction, misuse, loss, unauthorised disclosure and access.

Data integrity: All data provided should be aligned with its intended use, and be accurate, complete and correct.

Access: Individuals should have access to their personal data.

Accountability: Firms should design accountability principles to comply with the data protection and usage rules and regulations.
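As an illustration of how the notice, choice, consent and accountability principles might translate into practice, the sketch below records, for each data subject, the purposes they were told about and the purposes they agreed to, and logs every processing request. The `ConsentRecord` class, the `request_processing` function and the purpose names are hypothetical; real systems, and the legal requirements behind them, are considerably more involved.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ConsentRecord:
    """Hypothetical record of one data subject's notice and consent status."""
    subject_id: str
    # Purposes described to the subject in the privacy notice (Notice)
    notified_purposes: set[str] = field(default_factory=set)
    # Purposes the subject actively agreed to (Choice and Consent)
    consented_purposes: set[str] = field(default_factory=set)


def may_process(record: ConsentRecord, purpose: str) -> bool:
    """Processing is allowed only for a purpose that was both disclosed and agreed to."""
    return purpose in record.notified_purposes and purpose in record.consented_purposes


def request_processing(record: ConsentRecord, purpose: str, requester: str,
                       audit_log: list[tuple[str, str, str, bool]]) -> bool:
    """Decide on a processing request and keep an audit trail (Accountability)."""
    allowed = may_process(record, purpose)
    audit_log.append((record.subject_id, purpose, requester, allowed))
    return allowed


# Usage: the subject was told about two purposes but opted in to only one
subject = ConsentRecord(
    "S001",
    notified_purposes={"order_fulfilment", "marketing"},
    consented_purposes={"order_fulfilment"},
)
log: list[tuple[str, str, str, bool]] = []
print(request_processing(subject, "order_fulfilment", "warehouse", log))
print(request_processing(subject, "marketing", "ad_partner", log))
print(request_processing(subject, "profiling", "analytics_team", log))
```

The three calls print True, False and False: marketing was disclosed but not agreed to, and profiling was never disclosed at all. Checking both sets keeps notice and consent separate, and the audit log is one simple way of supporting accountability.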

Data protection regulatory frameworks

Handling and protecting personal data is a serious issue in our technological society, where
information is an important economic resource and most business transactions take place
online (Kshetri, 2014). The objective of personal data regulations is to protect citizens’
personal data and regulate the environment for business.

Some of the most prominent examples of data protection regulations are as follows.

The UK Data Protection Act (DPA): https://ico.org.uk/media/for-organisations/documents/1541/big-data-and-data-protecti...

The EU General Data Protection Regulation (GDPR): http://ec.europa.eu/justice/data-protection/reform/index_en.htm

The US Health Insurance Portability and Accountability Act of 1996 (HIPAA): https://www.hhs.gov/hipaa/index.html

Activity 8.2 Problem analysis

This video will show you some interviews from the IEEE Computer Society Conference on Big
Data & the Cloud: Privacy and Security issues:

https://www.youtube.com/watch?v=Qx4JhSklbJc

Watch this video and answer the following related questions.

What is the creepy factor reported by Michael Goul?

What, according to the interviewees, will be the effect of the big data revolution on
privacy?

Are there examples of protecting the user data flow?

Please post your answers on the discussion forum.

It concerns gathering and exploiting the interactions of customers who are unaware of it. Such
practices should not cross the security threshold.

We cannot know precisely. At present we are seeing only the tip of the iceberg, although providers
of service computing are working hard to keep people’s personal information out of public
exposure.

There are government agencies that help organisations develop algorithms, or guide their
compliance, so that users’ data flows are protected.

8.3 Data sovereignty

Please watch the video presentation of Section 8.3

https://vimeo.com/207432093/311254b951

Transcript of Section 08.03 Presentation

The data sovereignty concept stipulates that digital data must be stored, processed and
archived in the country of its origin and be subject to the respective laws and regulations of that
country.

As emphasised by S. Simpson, CEO of an independent UK-based cyber security consultancy,
organisations of any size ought to be very concerned about their data protection policies.
They should be able to answer any questions regarding data sovereignty as a key part of
their IT governance strategy. Firms should be especially careful about data storage and
management when using various applications, clouds and/or other data backup services. This
will not only help organisations stay up to date with contemporary data protection laws but
also ensure compliance with current and emerging data security legislation (Simpson, 2016).
In this section, we consider the main principles and issues associated with the concept of
data sovereignty.

There is a standard infrastructure for cloud computing all over the world. This provides firms
with economies of scale that help them keep their costs and, consequently, prices as low as
possible. It also means that data often travels to data centres outside the UK and Europe for
cost and redundancy purposes.

Data safeguards

Cloud services have expanded aggressively into the data computing and storage market. However,
Europe had data protection laws even before cloud services emerged.
These laws state that for personal data to be transferred to any other geographic region, the
data-receiving country should have demonstrated that the right data safeguards are in place.
These safeguards are designed to protect data, but the definition of ‘right’ is blurred.

There are variations between American and European data protection rights. These differences
were revealed in 2013, when Edward Snowden made the activities of the PRISM programme public.
The disclosures also highlighted the scale of data accumulated from Internet firms in the United
States and around the globe.

Client firms have now realised that a foreign company providing cloud services to them
operates under its own local laws on data privacy and access. This has
become a threat to European organisations using cloud services and applications, since they
are bound by foreign laws and so have less control over their data. For example, a number of
European countries, such as Germany and France, have put forward legislation requiring
personal data to be stored and maintained within the country. On the other hand,
American multinational companies such as Amazon and Microsoft are working hard to give
their customers the option to store information in any part of the world. However,
organisations should keep in mind that changing the location of data centres alone will not
protect them against the requirements of data protection legislation.

Data governance questions

Countries all over the world are learning more and more about data sovereignty laws. They
have started to realise the importance of updating their policies regarding the storage, transfer,
backup, encryption and privacy of data. Organisations, in turn, need to be very familiar with any
changes in data sovereignty laws to succeed with information governance.

Global organisations face hard challenges because of the presence of different
regulatory frameworks. This diversity also poses threats to cloud service providers who
are responsible for processing the personally identifiable information of their clients.

A recent survey by the UK Institute of Directors (IoD, 2016) reveals a shocking statistic: very
few organisations know where their data is actually stored. In addition, 43% of the firms do
not even have cyber insurance. This is a serious threat to firms, since they are
losing control over their biggest asset. They need to become increasingly vigilant and proactive
about data access and storage. Holding data in an in-country data centre is not, on its own,
sufficient for a comprehensive policy. Companies need to pay close attention to data
sovereignty in order to avoid the risk of data loss and/or non-compliance.

Simpson (2016) further suggests that a company should ask service providers the following
questions before adopting a cloud-based service.

1. Where is the data going to be stored?

A company must be aware of where the service provider is headquartered and where the
business is registered. Since data access and storage depend on the privacy laws
of the provider’s country, European clients have to think twice about their data security. A
clever data governance and compliance strategy would enable a company to select
different cloud services from different service providers. For instance, a user spinning
up a development environment with dummy data might not need to worry much
about where the data is going to be processed. By contrast, compliance issues come into
play for a CRM system and/or any other type of confidential information.


2. Who can access the data?

Organisations must have strict and well-developed authorisation policies for the data they
hold. For example, consider the ‘follow-the-sun data centre’, a model that is becoming
widespread and is used by some Managed Security Service Providers (MSSPs). It provides
a 24/7 service to clients by rotating workloads among staff in different geographic
locations. Thus, data physically stored onshore in the UK and/or nearshore in
Europe can be accessed and monitored by analysts in Asia once the British workforce have
finished their shifts. This brings different data protection laws and regulations into play,
because access crosses jurisdictions and borders.

3. What is the data backup policy?

Processing data is easy, but what if the data is not backed up in the physical location
where it is processed? Organisations must ask this question. Most
cloud service providers do not keep backup copies of data in the same geographic location
where the data is processed. This poses a threat to data, since different countries have their
own data protection rules and policies.

4. What about data encryption?

Data protection laws set out encryption requirements for both data in transit
(data traversing the Internet) and data at rest (data held in physical storage),
which all service providers have to abide by. Organisations must double-check that their
chosen cloud service providers meet these criteria. Data may be encrypted via HTTPS
(secure web service) or virtual private networks while traversing the Internet. Organisations
must also know in what form their inactive data will be stored.
Whether in offsite backups, hard drives, archives or tapes, service providers must be able to
provide all storage details and explain their approach to encrypting cloud storage to their
clients.
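The four questions can be turned into a simple due-diligence checklist. The sketch below is a hypothetical illustration, not Simpson's or any provider's method: it compares a provider's answers with a set of jurisdictions the organisation has decided are acceptable. The region labels, the field names and the rule that location checks are skipped for development environments holding only dummy data (following the example under question 1) are all assumptions. It also shows the point made under question 2: data stored in the UK can still raise concerns if it is accessed or backed up elsewhere.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ProviderAnswers:
    """Hypothetical answers from a cloud provider to the four questions above."""
    storage_region: str          # 1. where the data is stored
    provider_jurisdiction: str   # 1. where the provider is headquartered or registered
    access_regions: set[str]     # 2. where staff who can access the data work (e.g. follow-the-sun)
    backup_region: str           # 3. where backup copies are kept
    encrypted_in_transit: bool   # 4. e.g. HTTPS or a VPN
    encrypted_at_rest: bool      # 4. disks, archives, tapes and offsite backups


def review_provider(answers: ProviderAnswers, allowed_regions: set[str],
                    dummy_data: bool = False) -> list[str]:
    """Return a list of concerns; an empty list means no concern was found."""
    concerns: list[str] = []
    # Assumption: location checks are skipped when only dummy development data is held
    if not dummy_data:
        if answers.storage_region not in allowed_regions:
            concerns.append(f"data stored in {answers.storage_region}")
        if answers.provider_jurisdiction not in allowed_regions:
            concerns.append(f"provider subject to {answers.provider_jurisdiction} law")
        for region in sorted(answers.access_regions - allowed_regions):
            concerns.append(f"data accessible from {region}")
        if answers.backup_region not in allowed_regions:
            concerns.append(f"backups held in {answers.backup_region}")
    # Encryption is checked whatever the data
    if not answers.encrypted_in_transit:
        concerns.append("data not encrypted in transit")
    if not answers.encrypted_at_rest:
        concerns.append("data not encrypted at rest")
    return concerns


# Usage: data held in the UK, but the provider, its analysts and its backups are partly elsewhere
provider = ProviderAnswers(
    storage_region="UK",
    provider_jurisdiction="US",
    access_regions={"UK", "APAC"},
    backup_region="US",
    encrypted_in_transit=True,
    encrypted_at_rest=False,
)
for concern in review_provider(provider, allowed_regions={"UK", "EU"}):
    print(concern)
```

Here the storage location passes, yet four concerns are reported: the provider's US jurisdiction, access from APAC, backups in the US and unencrypted data at rest. Passing `dummy_data=True` leaves only the encryption concern.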

Impact on business

Although organisations know that storing information off-shore carries a high degree of risk,
they frequently find themselves with no other option. At the same time, a business
must adopt CES (Cyber Essentials Scheme) in order to tender for Government business. CES,
while promoting cloud security principles, clearly states that Government data should not move
off-shore. In this way, numerous business opportunities are affected by decisions related to data
sovereignty.

Data compliance strategy

The following rules reflect the key elements of a proper IT security strategy concerned with data
sovereignty (Simpson, 2016).

A practical data governance strategy should be built, and information regarding data
storage, backup and encryption should be gathered periodically.

A clear understanding must be developed of the authorisation, administration and access
of data, and of the form of reporting to be received as the client.

Data governance must comply with the cyber security maturity model.

Commitment to cyber maturity must be demonstrated with the right accreditations, from Cyber
Essentials to Public Services Network (PSN) accreditation.

Properly certified MSSPs must be selected in order to meet the requirements of the applicable
data protection regulation.

Activity 8.3 Knowledge Check

8.4 Further practical data analysis techniques: analysis of variance in SPSS

Please watch the video presentation of Section 8.4

https://vimeo.com/207432130/936ddde628

ANOVA testing, including Tukey’s post-hoc analysis

In the context of this video, ANOVA testing will be explored as a one-way analysis of variance.
In this form there is one independent and one dependent variable. The independent variable is
a categorical variable, whilst the dependent variable is a continuous variable. The test compares
the values of the continuous variable across the categories of the categorical variable and shows
whether the differences between the category means are statistically significant. The post-hoc
analysis is used to show where the differences lie when there are more than two categories in the
categorical variable, overcoming the weakness identified in the Chi-square test. This video will
explore how to identify the right sort of data within a dataset on which to conduct this test and
how to run it in SPSS.
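SPSS carries out the calculation for you, but the arithmetic behind the one-way ANOVA F statistic is short enough to sketch. The Python example below uses hypothetical scores for three categories A, B and C and computes the between-group and within-group sums of squares, their degrees of freedom and the ratio F = MS_between / MS_within. It does not compute the p-value (shown by SPSS in the "Sig." column), which requires the F distribution; a library such as SciPy (`scipy.stats.f_oneway`) can provide it.

```python
from __future__ import annotations

from statistics import mean


def one_way_anova(groups: dict[str, list[float]]) -> tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups maps each category of the independent (categorical) variable
    to the observed values of the dependent (continuous) variable.
    """
    all_values = [x for values in groups.values() for x in values]
    grand_mean = mean(all_values)
    k = len(groups)          # number of categories
    n = len(all_values)      # total number of observations

    # Between-group sum of squares: distance of each group mean from the grand mean
    ss_between = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
    # Within-group sum of squares: spread of observations around their own group mean
    ss_within = sum(sum((x - mean(v)) ** 2 for x in v) for v in groups.values())

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical scores for three categories
scores = {"A": [4, 5, 6], "B": [6, 7, 8], "C": [9, 10, 11]}
f_stat, df_b, df_w = one_way_anova(scores)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

For these scores the group means are 5, 7 and 10 and the script prints F(2, 6) = 19.00: the variation between categories is large compared with the variation within them. Whether that is significant is then judged from the p-value.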

Use the same working dataset as in the previous video.
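The post-hoc step can be sketched in the same way. The example below uses the Tukey-Kramer form of Tukey's test, which also handles groups of unequal size: for each pair of categories it divides the absolute difference in means by a standard error built from the within-group mean square, giving a studentized range statistic q. A pair differs significantly when q exceeds a critical value that depends on the number of groups, the within-group degrees of freedom and the significance level. The scores are hypothetical, and the critical value of 4.34 used below is the tabulated 5% value for three groups and six within-group degrees of freedom; look up the value that matches your own data.

```python
from __future__ import annotations

from itertools import combinations
from math import sqrt
from statistics import mean


def tukey_kramer(groups: dict[str, list[float]], q_critical: float):
    """Pairwise Tukey-Kramer comparisons following a one-way ANOVA.

    Returns a list of (group_a, group_b, mean_difference, q, significant).
    q_critical is the studentized range critical value for k groups and
    N - k degrees of freedom at the chosen significance level.
    """
    k = len(groups)
    n_total = sum(len(v) for v in groups.values())
    # Within-group mean square: the same error term used in the ANOVA F ratio
    ss_within = sum(sum((x - mean(v)) ** 2 for x in v) for v in groups.values())
    ms_within = ss_within / (n_total - k)

    results = []
    for a, b in combinations(groups, 2):
        diff = mean(groups[a]) - mean(groups[b])
        # Standard error for the pair; reduces to sqrt(ms_within / n) for equal group sizes
        se = sqrt(ms_within / 2 * (1 / len(groups[a]) + 1 / len(groups[b])))
        q = abs(diff) / se
        results.append((a, b, diff, q, q > q_critical))
    return results


# Hypothetical scores for three categories
scores = {"A": [4, 5, 6], "B": [6, 7, 8], "C": [9, 10, 11]}
for a, b, diff, q, significant in tukey_kramer(scores, q_critical=4.34):
    print(f"{a} vs {b}: difference = {diff}, q = {q:.2f}, significant = {significant}")
```

With these numbers, A and B do not differ significantly (q is about 3.46), whereas C differs significantly from both A and B. This is the kind of "where do the differences lie" answer that the overall F test alone cannot give.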

Activity 8.4 Case study

In their paper, Sheedy and Kumaraguru (2008) discuss an approach to evaluating people’s
privacy preferences. Read this paper and answer the following questions:

What are the fundamental concepts underlying contextual integrity?

Is any specific piece of information to be deemed personal in all situations?

What are the two types of cultures identified by Hofstede and how do these relate to the
concept of privacy?

Post your responses on the forum.

They are: 1) contexts, which concern the privacy expectations an individual associates with a
social context, e.g. a university or a hospital; information is considered sensitive with
respect to a certain context; 2) norms, which state the accepted ways in which information
may be passed on; for example, a student may accept their examination results being known
within their academic department but not outside the university.

It has been observed that no single type of information is considered personal in all
situations. For example, the information one would give to a retailer may differ from that
which one would give to a marketer.

They are: 1) collectivism, which uses ‘we’ as a major source of identity; 2) individualism,
which is based on individuals’ independence. Hofstede developed the Individualism
Index (IDV), which measures how collectivist or individualist a society is. The authors
suggest that IDV may have an impact on contextual integrity.
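The contexts-and-norms idea in the first answer can be pictured as a lookup: information shared in one context may flow only to the recipients that the norms of that context allow. The sketch below encodes the university and retailer examples from the answers above as hypothetical norms. The context, information and recipient names are illustrative, and treating any flow without a matching norm as inappropriate (default deny) is an assumption of this sketch, not part of Sheedy and Kumaraguru's method.

```python
from __future__ import annotations

# Hypothetical norms: (context, information type) -> recipients that context accepts
NORMS: dict[tuple[str, str], set[str]] = {
    ("university", "exam_results"): {"academic_department"},
    ("shopping", "home_address"): {"retailer"},
}


def is_appropriate_flow(context: str, info_type: str, recipient: str,
                        norms: dict[tuple[str, str], set[str]] = NORMS) -> bool:
    """True if the norms of the originating context allow this recipient.

    Flows with no matching norm are treated as inappropriate (default deny).
    """
    return recipient in norms.get((context, info_type), set())


print(is_appropriate_flow("university", "exam_results", "academic_department"))
print(is_appropriate_flow("university", "exam_results", "employer"))
print(is_appropriate_flow("shopping", "home_address", "marketer"))
```

The calls print True, False and False. The same information type can be appropriate for one recipient and not another, which mirrors the observation that no single type of information is personal in all situations.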

Lesson summary

The current growth of big data raises new issues about safeguarding individuals’
privacy.

Data sovereignty is an emergent concept since countries have different regulations for
digital data protection.

The question of data sovereignty is not only about where your data are stored. In fact, your
data may be physically held in the UK but the people accessing it may be in a completely
different jurisdiction, one that is not governed by UK data protection regulations.

Digital product or service agreements usually offer no opportunity to negotiate;
the customer has only one choice: either accept the terms and conditions as stated or
forgo the service.

The objective of personal data regulations is to establish the protection of citizens’
personal data and regulate the environment for business. However, it has been observed
that no single type of information is considered personal in all situations.

Reference list

Beck, E.J., Gill, W. & De Lay, P.R., 2016. Protecting the confidentiality and security of personal
health information in low- and middle-income countries in the era of SDGs and Big Data.
[online]. Global Health Action, 9. Available at: [Accessed 25 February 2017].

Chessell, M., 2014. Ethics for big data and analytics. [IBM]. [online]. Available at:
http://www.ibmbigdatahub.com/sites/default/files/whitepapers_reports_file/TCG%20Stud...
[Accessed 26 February 2017].

DOL, 2017. The United States Department of Labor. Guidance on the Protection of Personal
Identifiable Information. [online]. Available at: https://www.dol.gov/general/ppii [Accessed 20
February 2017].

Grama, J.L., 2014. Legal issues in information security. Jones & Bartlett Publishers.

IBM, 2014. Ethics for big data and analytics. [IBM]. [online]. Available at:
http://www.ibmbigdatahub.com/whitepaper/ethics-big-data-and-analytics [Accessed 20
February 2017].

ICO, 2017. Key definitions of the Data Protection Act. [online]. Information Commissioner’s
Office. Available at: https://ico.org.uk/for-organisations/guide-to-data-protection/key-definitions/
[Accessed 20 February 2017].

IoD, 2016. Cyber security underpinning the digital economy. [online]. The Institute of Directors
Report. Available at: https://www.iod.com/events-community/regions/north-west/news/details/Cyber-security-...
[Accessed 20 February 2017].

Jonas, J., 2017. Using Transparency as a Mask. IBM Innovation explanations. [online].
Available at: http://www.ibm.com/thought-leadership/innovation_explanations/article/jeff_jonas.htm...
[Accessed 20 February 2017].

Kshetri, N., 2014. Big data’s impact on privacy, security and consumer welfare.
Telecommunications Policy, 38 (11), pp. 1134-1145.

Minelli, M., Chambers, M. & Dhiraj, A., 2013. Big Data, Big Analytics: Emerging Business
Intelligence and Analytic Trends for Today's Businesses. John Wiley & Sons.

Sheedy, C. & Kumaraguru, P., 2008. A Contextual Method for Evaluating Privacy Preferences,
IFIP International Federation for Information Processing, Vol. 261, Policies and Research in
Identity Management. Springer, pp. 139-146.

Simpson, S., 2016. Data sovereignty: Keeping your data close and your critical data closer.
[online]. IP Expo Europe News. Available at:
http://www.ipexpoeurope.com/News/Data-sovereignty-Keeping-your-data-close-and-your-c...
[Accessed 20 February 2017].

Sokolova, M. & Matwin, S., 2016. Personal privacy protection in time of big data. In:
Challenges in Computational Statistics and Data Mining (pp. 365-380). Springer International
Publishing.

Additional reading

Data Protection... What you need to know. Available at:
https://www.youtube.com/watch?v=vHvd6HaPq_s [Accessed 20 February 2017].

Crawford, K. & Schultz, J., 2014. Big data and due process: Toward a framework to redress
predictive privacy harms. BCL Rev., 55 (93). Available at:
http://lawdigitalcommons.bc.edu/cgi/viewcontent.cgi?article=3351&context=bclr [Accessed 20
February 2017].

Wu, X., Zhu, X., Wu, G. Q. & Ding, W., 2014. Data mining with big data. IEEE Transactions on
Knowledge and Data Engineering, 26 (1), pp. 97-107. Available at:
http://lansainformatics.com/wp-content/plugins/project-mgt/file/upload/pdf/2440Data-...
[Accessed 20 February 2017].

The ethics of data - personal data and privacy. Available at:
https://www.youtube.com/watch?v=naaDBNSx610 [Accessed 02 March 2017].

Big data and data protection. Available at:
https://ico.org.uk/media/for-organisations/documents/1541/big-data-and-data-protecti...
[Accessed 02 March 2017].

Reform of EU data protection rules. Available at:
http://ec.europa.eu/justice/data-protection/reform/index_en.htm [Accessed 02 March 2017].

Health Information Privacy. Available at: https://www.hhs.gov/hipaa/index.html [Accessed 02 March 2017].

Big data and the cloud - privacy and security issues. Available at:
https://www.youtube.com/watch?v=Qx4JhSklbJc [Accessed 02 March 2017].
