Module: Data Handling and Decision Making Lesson: Regulatory, Legal and Ethical Issues of Big Data
Contents
8.1 Lesson introduction
8.3 Data sovereignty
https://vimeo.com/207432019/348dc6c7c9
The United States Department of Labor defines Personal Identifiable Information (PII) as:
“Any representation of information that permits the identity of an individual to whom the
information applies to be reasonably inferred by either direct or indirect means. PII is further
defined as:
information that directly identifies an individual (for instance, name, address, social security
number or other identifying number or code, telephone number, email address, and so on),
or
These data elements may include a combination of gender, race, birth date, geographic
indicator, and other descriptors. Additionally, information permitting the physical or online
contacting of a specific individual is the same as personally identifiable information. This
information can be maintained in either paper, electronic or other media.” (DOL, 2017).
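To illustrate how a combination of descriptors can identify someone by indirect means, here is a
minimal Python sketch. The records, field names and helper functions are hypothetical and are not
taken from the DOL guidance; the sketch simply counts how many records share each combination of
descriptors and flags the combinations held by only one person.

```python
from collections import Counter

# Hypothetical records: no names or ID numbers, only indirect descriptors.
records = [
    {"gender": "F", "birth_year": 1985, "postcode_area": "M1"},
    {"gender": "F", "birth_year": 1985, "postcode_area": "M1"},
    {"gender": "M", "birth_year": 1990, "postcode_area": "M1"},
    {"gender": "F", "birth_year": 1972, "postcode_area": "L3"},
]

# Assumed set of descriptors that could identify someone when combined.
QUASI_IDENTIFIERS = ("gender", "birth_year", "postcode_area")


def group_sizes(rows, keys=QUASI_IDENTIFIERS):
    """Count how many records share each combination of descriptors."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


def unique_combinations(rows, keys=QUASI_IDENTIFIERS):
    """Return the combinations held by exactly one record."""
    return [combo for combo, n in group_sizes(rows, keys).items() if n == 1]


at_risk = unique_combinations(records)
print(at_risk)
```

A combination that appears only once points to a single individual even though no direct
identifier is stored, which is why such data elements can still count as PII.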
In contemporary society, the protection of PII is a growing issue because of the exponential
increase in the use of online services (Grama, 2014). People need to share their PII to access
their bank accounts, book healthcare services, or buy goods in virtual shops. Users also
disclose information about themselves to a large audience on social networking sites.
Big data analytics needs extensive volumes of information, often personal, to create new
Ethical restrictions
This diagram illustrates the conflict of interests that parties involved in big data analytics can
face between what they would like to achieve in terms of legal, technical and business objectives.
The ethical approach to data handling aims to find a mutually acceptable and beneficial
practice. This practice can be implemented as a range of policies at organisational,
governmental and inter-governmental levels, as will be discussed further in this lesson.
Figure 8.01. Adapted from Chessell (2014) and IBM (2014)
This BBC video will share with you some reflections on the use of personal data by
companies:
https://www.youtube.com/watch?v=naaDBNSx610
Why does Doc Searls argue that we are at an early stage of the evolution of what the
Internet will become?
Personal data is deeply connected to the privacy of an individual. Open data is data that
can be anonymised and useful as an aggregate of many people’s data in order to
ascertain things about the way we live.
At the moment businesses are like infants, their manners are horrible. It would never occur
to anybody in a mature industry to trail somebody out of a store and plant a tracking
beacon on them and say: “don’t worry it isn’t personal, we are only trying to follow you
around and see what you are doing so we can give you a better experience”.
You create your identity by showing different aspects of your personality to the world. If
you have no privacy, you cannot be different to different people.
Learning outcomes
Understand the issues associated with ethics, privacy and security of big data
Explain what data sovereignty is and why it has a direct impact on business performance
Understanding compliance
https://vimeo.com/207432050/daa31c5cbe
In this section, we review major issues arising from data privacy and ethical use of sensitive
personal information. According to the UK Information Commissioner’s Office, sensitive
whether he is a member of a trade union (within the meaning of the Trade Union and
Labour Relations (Consolidation) Act 1992),
any proceedings for any offence committed or alleged to have been committed by him, the
disposal of such proceedings or the sentence of any court in such proceedings.” ICO
(2017).
We discuss the principles, mechanisms and policies an organisation must comply with when
implementing and carrying out a data analytics function. The compliance ensures that any
sensitive information is being held and dealt with in accordance with legal requirements and
best practices related to the processing of sensitive information.
The increase in online activity, ranging from search and purchasing to selling, communication,
hiring and all other forms of Internet transactions, has caused a big surge in data. As Minelli,
Chamber and Dhiraj (2013) point out, every click on the Internet, any new sign up, and each
subscription are causing businesses to lean towards data collection and analytics. This has
brought up a debate about the protection of individuals’ “rights” to information. For example, a
number of businesses currently provide free services, such as a complimentary e-book, in
return for details about the person downloading it. This creates a grey area in terms of rights to
information. When data is given in exchange for free services, questions such as the
following arise:
What information are users willing to share in return for those services?
Although it is basically a matter of quid pro quo - “a mutual agreement of exchange of value or
products or services” - the grey area lies in its interpretation.
There is not much difference between a digital and a traditional agreement of exchange of
services in return for costs. But traditional agreements allow both parties to negotiate and work
out the details, whereas digital agreements only provide consumers with a yes or no choice.
The problem is that this information surge is giving enormous power to “big data brands”.
They can easily use the details provided by the user to peek into their personal life. As Jeff
Jonas, Chief Scientist at IBM, points out in his blog article, “Using Transparency as a Mask”,
“Big Data makes it harder to keep secrets” (Jonas, 2017).
Figure 8.02. Adapted from Minelli, Chamber and Dhiraj (2013)
Minelli, Chamber and Dhiraj maintain that there is a misconception in the contemporary
environment that big data is a fairly new concept. But, in fact, it has been around for decades.
While companies are now using digital agreements and powerful data analytic tools,
companies in the past employed warranty cards and gained insight into their customers’ lives
through data processing centres. This collection and analysis has definitely become easier for
new companies, but it is a tried and true practice that has been in use for a long period of time.
Traditional data collection and its sale was in the form of brokering, which later evolved into
digital target marketing through the addition of better and more innovative analytical tools.
These allowed for a finer-grain targeted approach, in accordance with users’ behaviour and
information. For example, companies in sectors such as telecommunications, banking, etc. can
now work on more specific marketing and provide services that are tailored to their consumer
needs.
Traditionally, big companies that focused on data analytics paid huge sums to collection
centres or brokers who employed tedious and time-consuming data mining techniques,
algorithms, and behavioural analysis tools that used to take six months or even more. In the
MCI did not only employ the already existing data but they leveraged the collection power of
other sources, i.e. what was gathered from the US Postal Service and partnerships with other
companies that could provide information, such as airlines and credit card companies.
However, credit card data was then dropped due to regulations limiting its use. The
inefficiencies in the traditional methods lie in the cost and time companies spent on purchasing
information from their partners rather than directly asking their consumers to share their
preferences, which also led to inaccuracies.
With the evolution of Customer Relationship Management (CRM), due to a shift towards
database marketing, companies have also shifted their focus from “segmentation to
personalising relationships” on the basis of the principle that “the more you knew about a
consumer, the better you could meet their needs.”
In addition, consumers are not unwilling to share their personal preferences, as they have
become more aware and understand that this enables businesses to improve their
performance and enhance the value of the exchange between service users and providers. In
fact, many are frustrated with companies who do not collect their information because it causes
customers to have to repeat themselves in the course of purchasing services or attempting to
resolve issues. If the mundane example of food delivery services is considered, restaurants
that gather user phone numbers and addresses for future deliveries would be preferred over
delivery places that require consumers to repeat their details every single time they are looking
to purchase.
Principles of privacy
With big data comes big responsibility to be accountable, transparent, and protect individuals’
rights. Therefore, there are seven global principles that should be followed in the realm of data
collection (Minelli, Chamber and Dhiraj, 2013):
Consent: Information disclosed to third parties should be on the basis of agreement of the
above-mentioned notice and choice.
Data integrity: Everything provided should be aligned with its intended use, and be
Accountability: Firms should design accountability principles to comply with the data
protection and usage rules and regulations.
Handling and protecting personal data is a serious issue in our technological society, where
information is a relevant economic resource and most business transactions take place
online (Kshetri, 2014). The objective of personal data regulations is to establish the protection
of citizens’ personal data and regulate the environment for business.
Some of the most prominent examples of data protection regulations are as follows.
This video will show you some interviews from the IEEE Computer Society Conference on Big
Data & the Cloud: Privacy and Security issues:
https://www.youtube.com/watch?v=Qx4JhSklbJc
What, according to the interviewees, will be the effect of the big data revolution on
privacy?
We cannot know precisely. Nowadays, we are at the tip of the iceberg, although providers
of service computing are working hard to keep people’s personal information out of public
exposure.
There are governmental agencies that help organisations develop or guide the compliance
of algorithms that fit the users' data flow protection.
https://vimeo.com/207432093/311254b951
The data sovereignty concept stipulates that digital data must be stored, processed and
archived in the country of its origin and be subject to the respective laws and regulations of that
country.
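As a rough illustration of this definition, the following Python sketch checks whether a data set
is stored, processed and archived in its country of origin. The `DataSet` class, its fields and the
country codes are hypothetical, and the check applies the definition literally; actual legal
requirements vary by country and are usually more nuanced.

```python
from dataclasses import dataclass


@dataclass
class DataSet:
    """Hypothetical description of where a data set lives."""
    name: str
    origin_country: str       # country where the data originated
    storage_country: str
    processing_country: str
    archive_country: str


def sovereignty_violations(ds: DataSet) -> list[str]:
    """List the activities that take place outside the country of origin."""
    locations = {
        "storage": ds.storage_country,
        "processing": ds.processing_country,
        "archive": ds.archive_country,
    }
    return [activity for activity, country in locations.items()
            if country != ds.origin_country]


# Example: data originating in Germany, processed in Ireland, archived in the US.
crm = DataSet("crm", "DE", "DE", "IE", "US")
print(sovereignty_violations(crm))  # ['processing', 'archive']
```

An empty list means every activity stays in the country of origin under this strict reading.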
There is a standard infrastructure for cloud computing all over the world. This provides firms
with economies of scale that help them keep their costs and, consequently, prices as low as
possible. It also makes it easier for data to travel to data centres outside the geographic region
of the UK and Europe for cost and redundancy purposes.
Data safeguards
Cloud services have expanded aggressively across the data computing and storage market. But data
protection laws existed in Europe even before the emergence of cloud services.
These laws state that, for personal data to be transferred to any other geographic region, the
data-receiving country must have demonstrated appropriate data safeguards. These
American and European data protection rights differ. These differences came to light in 2013,
when Edward Snowden's disclosures made the activities of the PRISM programme public. The
disclosures also highlighted the scale of data accumulated from Internet firms in the United
States and around the globe.
Client firms have now realised that a foreign company providing cloud services to its
customers operates under its own local laws for data privacy and access. This has
become a threat to European organisations availing of cloud services/applications, since their
data becomes subject to foreign laws and they have less control over it. For example, a number of
European countries, such as Germany and France, have put forward legislation requiring
personal data to be stored and maintained in the country. On the other hand,
American multinational companies like Amazon and Microsoft are working hard to offer
their customers the option to store information in any part of the world. However,
organisations should keep in mind that changing the location of data centres alone would not
protect them under data protection legislation.
Countries all over the world are learning more and more about data sovereignty laws. They
have started to realise the importance of updating their policies regarding storage, transfer,
backup, encryption and privacy of data. They need to be very familiar with any changes in data
sovereignty laws to succeed with information governance.
In doing so, global organisations are facing hard challenges due to the presence of different
regulatory frameworks. Such diversity has also posed threats for cloud service providers who
are responsible for processing personally identifiable information of their clients.
A recent survey by the UK Institute of Directors (IoD, 2016) reveals a shocking statistic: very
few organisations know where their data is actually stored. A significant 43%
of the firms do not even have cyber insurance. This is a serious threat to firms since they are
losing control over their biggest asset. They need to become increasingly vigilant and proactive
about their data access and storage. Having data held in an in-country data centre is not
sufficient for a comprehensive policy. Companies need to pay close attention to data
sovereignty in order to avoid the risk of data loss and/or non-compliance.
Simpson (2016) further suggests that a company should ask service providers the following
questions before adopting a cloud-based service.
A company must know where the service provider is headquartered and where the
business is registered. Since data access and storage depend on the privacy laws
of the provider’s country, European clients have to think twice about their data security. A
well-designed data governance and compliance strategy would enable a company to select
different cloud services from different service providers. For instance, a user spinning
up a development environment with dummy data might not need to worry much
about where the data is going to be processed. By contrast, compliance issues
come into play for a CRM system and/or any type of confidential
information.
Organisations must have strict and well-developed authorisation policies for the data they
hold. For example, consider the ‘follow-the-sun data centre’ concept, which is becoming
widespread and is adopted by some Managed Security Services Providers
(MSSPs). It provides a 24/7 service to clients by rotating workloads
among staff in different geographic locations. For
example, a set of data physically stored onshore in the UK and/or nearshore in
Europe can be accessed and monitored by Asian analysts once the British workforce have
finished their shifts. Because the jurisdiction changes, different data protection laws and
regulations may then apply.
Processing data is easy, but what if the data is not backed up in the physical location
where it is processed? Organisations must raise this question with their provider. Most
cloud service providers do not keep backup copies of data in the same geographic location
where the data is processed. This poses a risk, since different countries have their
own data protection rules and policies.
Data safety laws set out data encryption standards for both data-in-transit
(data traversing the Internet) and data-at-rest (data held in physical storage),
which all service providers have to abide by. Organisations must double-check that their
chosen cloud service providers conform to these criteria. Data may be encrypted via HTTPS
(secure web service) or virtual private networks while traversing the Internet. Organisations
must be informed about the digital form in which their inactive data will be stored.
Whether the data sits in offsite backups, hard drives, archives or tapes, service providers must
be able to give clients all storage details and explain their approach to encrypting cloud
storage.
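The follow-the-sun arrangement described above can be sketched in a few lines of Python. The shift
rota, the country codes and both function names are assumptions made for illustration only (the
source mentions just British and Asian analysts); the sketch shows during which hours data stored
in one country would be accessed from another jurisdiction.

```python
# Hypothetical shift rota in UTC hours: (start, end, analyst location).
SHIFTS = [
    (0, 8, "SG"),    # Asian analysts (assumed location)
    (8, 16, "GB"),   # British analysts
    (16, 24, "US"),  # assumed third team, not mentioned in the lesson
]


def accessing_location(utc_hour: int) -> str:
    """Return the location of the team on shift at the given UTC hour."""
    for start, end, location in SHIFTS:
        if start <= utc_hour < end:
            return location
    raise ValueError(f"hour out of range: {utc_hour}")


def cross_border_hours(storage_country: str) -> list[int]:
    """Hours of the day when data is accessed from outside the storage country."""
    return [h for h in range(24) if accessing_location(h) != storage_country]


# Data stored in the UK is accessed from abroad for 16 of the 24 hours.
print(len(cross_border_hours("GB")))  # 16
```

Under this assumed rota, data held in the UK is handled outside UK jurisdiction for most of the
day, which is the kind of detail an authorisation policy needs to account for.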
Impact on business
Even knowing that information saved off-shore carries a high degree of risk, organisations
frequently find themselves in a situation where they have no other option. On the other hand, a
business must adopt CES (Cyber Essentials Scheme) in order to tender for Government business. CES,
while promoting cloud security principles, clearly states that Government data should not move
off-shore. In this way, numerous business opportunities are affected by decisions related to data
sovereignty.
These rules reflect the key elements of a proper IT security strategy concerned with data
sovereignty (Simpson, 2016).
A practical data governance strategy should be built and information regarding data
storage, backup and encryption should be gathered periodically
Data governance must be in compliance with the cyber security maturity model
Commitment to cyber maturity must be demonstrated with the right accreditations, from Cyber
Essentials to Public Service Network (PSN) Accreditation
Proper and certified MSSPs must be selected in order to benefit from a particular data
protection regulation.
https://vimeo.com/207432130/936ddde628
In the context of this video, ANOVA testing will be explored as a one-way analysis of variance.
In this form there is one independent and one dependent variable. The independent variable is
a categorical variable whilst the dependent variable is a continuous variable. This test allows
results in a continuous variable to be compared across the categories of a categorical variable
and the significance of any differences to be demonstrated. The post-hoc analysis is used to show
where differences exist if there are more than two categories in the categorical variable,
overcoming the weakness identified in the Chi-square test. This video will explore how to
identify the right sort of data within a dataset upon which to conduct this test and how to
programme it into SPSS.
Make use of the same working dataset from the previous video.
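The video carries out the test in SPSS. As a complement, the arithmetic behind a one-way ANOVA can
be sketched in plain Python. The group labels and scores below are made-up illustrative values,
not the lesson's working dataset, and the function name is hypothetical. The sketch computes only
the F statistic and its degrees of freedom; it does not compute a p-value or run a post-hoc test.

```python
from statistics import mean

# Hypothetical scores (continuous variable) for three categories.
groups = {
    "A": [4.0, 5.0, 6.0],
    "B": [6.0, 7.0, 8.0],
    "C": [8.0, 9.0, 10.0],
}


def one_way_anova_f(samples):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for values in samples.values() for x in values]
    grand_mean = mean(all_values)
    k = len(samples)       # number of categories
    n = len(all_values)    # total number of observations

    # Variation of the group means around the grand mean.
    ss_between = sum(len(v) * (mean(v) - grand_mean) ** 2
                     for v in samples.values())
    # Variation of observations around their own group mean.
    ss_within = sum((x - mean(v)) ** 2
                    for v in samples.values() for x in v)

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


f_stat, df_b, df_w = one_way_anova_f(groups)
print(f_stat, df_b, df_w)  # 12.0 2 6
```

A large F value indicates that the differences between category means are large relative to the
spread within each category; the post-hoc analysis then shows which categories differ.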
In their paper, Sheedy and Kumaraguru (2008) discuss an approach to evaluating people’s
privacy preferences. Read this paper and answer the following questions:
What are the two types of cultures identified by Hofstede and how do these relate to the
concept of privacy?
It has been observed that no single type of information is considered personal in all
situations. For example, the information one would give to a retailer may differ from that
which one would give to a marketer.
They are: 1) collectivism, which uses ‘we’ as a major source of identity; 2) individualism,
which is based on the individual’s independence. Hofstede developed the Individualism
Index (IDV), which measures how collectivist or individualist a society is. The author
suggests that IDV may have an impact on contextual integrity.
Lesson summary
The current growth of big data raises new issues about safeguarding individuals’
privacy.
Data sovereignty is an emergent concept since countries have different regulations for
digital data protection.
The question of data sovereignty is not only about where your data is stored. In fact, your
data may be physically held in the UK but the people accessing it may be in a completely
different jurisdiction which is not governed by the UK protection regulations.
Usually, digital product or service agreements do not provide any possibility to negotiate;
the customer has only one choice: either accept the terms and conditions as stated or
forego the service.
Reference list
Beck, E.J., Gil, W. & De Lay P.R., 2016. Protecting the confidentiality and security of personal
DOL 2017. The United States Department of Labor. Guidance on the Protection of Personal
Identifiable Information. [online]. Available at: https://www.dol.gov/general/ppii [Accessed 20
February 2017].
Chessell, M., 2014. Ethics for big data and analytics. [IBM]. [online]. Available at:
http://www.ibmbigdatahub.com/sites/default/files/whitepapers_reports_file/TCG%20Stud...
[Accessed 26 February 2017].
Grama, J.L., 2014. Legal issues in information security. Jones & Bartlett Publishers.
IBM 2014. Ethics for big data and analytics. [IBM]. [online]. Available at:
http://www.ibmbigdatahub.com/whitepaper/ethics-big-data-and-analytics [Accessed 20
February 2017].
ICO 2017. Key definitions of the Data Protection Act. [online]. Information Commissioner’s
Office. Available at: https://ico.org.uk/for-organisations/guide-to-data-protection/key-
definitions/ [Accessed 20 February 2017].
IoD 2016. Cyber security underpinning the digital economy. [online]. The Institute of Directors
Report. Available at: https://www.iod.com/events-community/regions/north-
west/news/details/Cyber-security-... [Accessed 20 February 2017].
Jonas, J., 2017. Using Transparency as a Mask. IBM Innovation explanations. [online].
Available at: http://www.ibm.com/thought-
leadership/innovation_explanations/article/jeff_jonas.htm... [Accessed 20 February 2017].
Kshetri, N., 2014. Big data's impact on privacy, security and consumer welfare.
Telecommunications Policy, 38 (11), pp. 1134-1145.
Minelli, M., Chamber, M. & Dhiraj, A., 2013. Big Data, Big Analytics: Emerging Business
Intelligence and Analytic Trends for Today's Businesses. John Wiley & Sons.
Sheedy, C. & Kumaraguru, P., 2008. A Contextual Method for Evaluating Privacy Preferences,
IFIP International Federation for Information Processing, Vol. 261, Policies and Research in
Identity Management. Springer, pp. 139-146.
Simpson, S., 2016. Data sovereignty: Keeping your data close and your critical data closer.
[online]. IP Expo Europe News. Available at: http://www.ipexpoeurope.com/News/Data-
sovereignty-Keeping-your-data-close-and-your-c... [Accessed 20 February 2017].
Sokolova, M. & Matwin, S., 2016. Personal privacy protection in time of big data. In:
Challenges in Computational Statistics and Data Mining (pp. 365-380). Springer International
Publishing.
Additional reading
http://lawdigitalcommons.bc.edu/cgi/viewcontent.cgi?article=3351&context=bclr [Accessed 20
February 2017].
Wu, X., Zhu, X., Wu, G. Q. & Ding, W., 2014. Data mining with big data. IEEE transactions on
knowledge and data engineering, 26 (1), pp. 97-107. Available at:
http://lansainformatics.com/wp-content/plugins/project-mgt/file/upload/pdf/2440Data-...
[Accessed 20 February 2017].