03 - Lecture 3-Database - EV - Privacy

ISET4005

PRIVACY IN A NETWORKED WORLD


OVERVIEW AND DATABASE

DR. EMIR VAJZOVIĆ (EMIR.VAJZOVIC@ADPOLY.AC.AE)


Information
Security & Privacy

CIA Triad Business Continuity

People Technology

Processes (GRC)

WHAT IS PERSONALLY IDENTIFIABLE
INFORMATION (PII)?
• Personally identifiable information (PII) is information that, when used alone or with other relevant data, can identify an
individual.
• PII may contain direct identifiers (e.g., passport information) that can identify a person uniquely, or quasi-identifiers (e.g.,
race) that can be combined with other quasi-identifiers (e.g., date of birth) to successfully recognize an individual.
• KEY TAKEAWAYS
• Personally identifiable information (PII) uses data to confirm an individual's identity.
• Sensitive personally identifiable information can include your full name, Social Security Number, driver’s license, financial
information, and medical records.
• Non-sensitive personally identifiable information is easily accessible from public sources and can include your zip code,
race, gender, and date of birth.
• Passports contain personally identifiable information.
• Information shared on social media sites may be considered non-sensitive personally identifiable information.
TOPICS TO BE DISCUSSED - 1.2 DATABASE
PRIVACY - CHAPTER 2
• Chapter 2 presents several privacy techniques (such as statistical disclosure control (SDC) methods,
anonymization methods, or sanitization methods) that can be applied to databases. The authors present
an overview of the issues in database privacy, a survey of the best-known SDC methods, a discussion
on the related data privacy/utility trade-offs and a description of privacy models proposed by the
computer science community in recent years. Some relevant freeware packages are also identified.

• A priori and a posteriori approaches to disclosure control in database privacy sanitization have been
reviewed. This chapter looks at sanitization methods, which are common to both approaches, through
the discussions on tabular data, queryable databases, and microdata, with a special focus on the latter.
Finally, research challenges and opportunities have been identified in the area of statistical disclosure
control.
1.3 PRIVACY AND BIG DATA

• Chapter 3 presents a brief review of Big Data technologies, describes the benefits,
and outlines how Big Data has come to harm privacy in subtle new forms. The
chapter investigates privacy issues that have come up due to technological
advancements leading to mostly huge amounts of personal data being stored and
communicated. The chapter then reviews the legal and technological issues and
describes some possible solutions. It further discusses many open research problems
and challenges related to privacy and Big Data. Moreover, this chapter also covers
technology, law, and ethics aspects of Big Data analytics from a non-technical
perspective.
1.4 PRIVACY IN CROWDSOURCED
PLATFORMS
• An overview of privacy in crowdsourcing platforms is given in Chap. 4, with a focus on platforms (such
as the Amazon Mechanical Turk (AMT) platform) that specifically deal with the collection and
aggregation of information. This chapter emphasizes the privacy risks in online systems and discusses
how these risks apply to crowdsourcing platforms, focusing on the potential for exposing Personally
Identifiable Information (PII). These risks are illustrated with an example of a real world attack conducted
through a series of survey tasks in AMT. In addition, the chapter provides an overview of solutions that
can provide privacy protection in online services in general, and identifies those that could also be applied
to crowdsourcing platforms. Furthermore, the chapter includes a specific proposal for a privacy-
preserving crowdsourcing platform that relies on obfuscation, and describes the design choices
surrounding obfuscation techniques, privacy levels, privacy loss quantification, privacy depletion, cost
settings, and utility estimation of workers in crowdsourcing platforms. The chapter describes the
implementation details for a prototype of the system and summarizes the challenges that still need to be
addressed to enhance the privacy of workers in crowdsourcing platforms.
1.5 PRIVACY IN HEALTHCARE
• Privacy of healthcare records has been a major concern for a very long time now.
Various legislations have been put in place to ensure the privacy of patients. Chapter 5
discusses a few electronic healthcare systems that can be classified into a variety of
systems with their own features and faults. The chapter also presents several privacy
concerns related to the storage and transmission of health information, the use of mobile
devices and social media, and the use of cloud storage systems in healthcare.
• Moreover, the chapter discusses the privacy challenges that exist in all of the electronic
health systems and solutions to address these challenges in those systems. Finally, the
chapter highlights future privacy challenges and opportunities related to the
development and deployment of electronic health systems.
1.6 PRIVACY IN PEER-TO-PEER NETWORKS
• Peer-to-peer (P2P) networks are designed to take advantage of dispersed network resources and
enable participants to act as servers or clients; their main characteristics include the direct sharing
of resources among users, their self-organization, stability, and autonomy. As with other systems,
privacy is a major concern in P2P networks. Chapter 6 on privacy in P2P networks starts with an
introduction to P2P networks, their classification, and their characteristics. After presenting a brief
overview of P2P networks, the chapter identifies and analyzes the existing privacy issues when
using P2P networks and the current privacy solutions that can be used.
• These solutions include anonymous systems, routing modifications, protection of contents when
stored and during transmission, private and split credentials, hidden services, and application
configuration and hardening. The chapter further explores the challenges that must be addressed
in the future. It also discusses future research directions.
1.7 PRIVACY IN THE CLOUD
• Cloud computing technologies are being deployed and used by many businesses, governments,
and organizations and are becoming increasingly popular as they offer access to a wide range
of infrastructure resources, very convenient pay-as-you-go service, and low cost computing and
storage. However, the advantages of clouds come with increased security and privacy risks.
• Chapter 7 discusses the need for privacy protection and the confidentiality of data and
applications outsourced to the cloud. The authors present an overview of multi-tenancy and
other inherent properties of the cloud computing model, as well as the novel attack surfaces and
threats to cloud users’ privacy. The chapter also discusses existing approaches for protecting
privacy, and analyzes the pros and cons of these solutions.
• Finally, it outlines a list of open problems and issues which need to be further investigated and
addressed by researchers in the future.
1.8 PRIVACY IN VEHICULAR AD HOC
NETWORKS
• Chapter 8 discusses various privacy issues in vehicular ad hoc networks (VANETs). The chapter starts by
presenting VANET as a new and promising technology that can enhance road safety and provide the
foundation for many possible added value applications and services.
• The chapter then investigates the various security and privacy concerns associated with this technology.
The authors present several approaches aimed at protecting user and vehicle privacy in VANET
communications and also include a discussion of current solutions and their limitations.
• Finally, the chapter discusses a broad range of critical security and privacy challenges currently present in
VANETs which should be investigated in future research works.
• VANETs are understood as having evolved into a broader "Internet of vehicles", which itself is expected to
ultimately evolve into an "Internet of autonomous vehicles".
1.9 PRIVACY LAW AND REGULATION

• Chapter 9 deals with the regulation of personal information disclosure and the privacy
of individuals. It provides an overview of the laws and regulations used to regulate
privacy in the digital age.
• This chapter examines the current state of US laws that have a direct or indirect impact
on the privacy of individuals. The authors of this chapter consider government
surveillance and both the laws that allow it and those aimed at placing restraints on law
enforcement activities. This is followed by an analysis of privacy regulations in the
European Union. The chapter concludes by examining opportunities for change with
respect to privacy laws and regulations.
1.10 PRIVACY IN MOBILE DEVICES

• The ubiquitous use of mobile devices for personal communications, and subsequently for almost all
types of data transactions, has introduced the next level of privacy problems. Chapter 10 includes a
review of on-going efforts aimed at retaining the privacy of users constantly interacting with mobile
devices for most of their daily activities. It presents an overview of mobile devices and their related
technologies. It also highlights the privacy issues associated with the use of a mobile device and
discusses the type of personal data that may be collected by a mobile application and the methods by
which this data may leak to third parties that are not directly authorized by the user.
• The chapter discusses the solutions that can be deployed to mitigate mobile device privacy concerns.
Finally, the chapter ends with a discussion on the challenges we currently face in making a mobile
device a more privacy-aware platform.
1.11 PRIVACY WITH BIOMETRICS

• Chapter 11 discusses the topic of privacy in biometric systems. Biometrics can be a very
effective tool to keep us safe and secure, prevent individuals from applying for multiple
passports or driving licenses, and keep the bad guys out or under control.
• However, the fact that we are surrounded by so many biometric sensors does limit our privacy
in one way or another. This chapter is mainly concerned with privacy issues and solutions
surrounding the use of biometrics for recognizing individuals.
• It provides an adequate background on biometrics and discusses several privacy concerns and
solutions about biometrics. The chapter ends with a discussion of some of the outstanding
challenges and opportunities in the area of privacy with biometrics.
1.12 PRIVACY IN SOCIAL NETWORKS

• Social networks such as Facebook and LinkedIn have gained a lot of popularity in
recent years. These networks use a large amount of data that are highly valuable for
different purposes. Hence, social networks become a potential vector for attackers to
exploit. Chapter 12 focuses on the security attacks and countermeasures used by
social networks.
• Privacy issues and solutions in social networks are discussed and the chapter ends
with an outline of some of the privacy challenges in the social networks.
• The Social Dilemma
1.13 THE RIGHT TO PRIVACY IN THE AGE OF
DIGITAL TECHNOLOGY
• Chapter 13 reviews some of the privacy issues that have arisen as a result of the emergence and
proliferation of digital information networks. It presents a brief overview of the threats posed to
personal privacy, especially for vulnerable groups such as the consumer or users of social media
to better understand the nature and scope of the challenges presented by evolving information
technologies such as social networking platforms. The author then analyzes several theories of
privacy and justifications for privacy, the right to privacy, and how to protect this right.
• The chapter concludes by describing both the law and technological tools to secure privacy
rights.
• The Great Hack
1.14 HOW TO EXPLORE CONSUMERS’ PRIVACY
CHOICES
WITH BEHAVIORAL ECONOMICS
• Chapter 14 describes the tools and the evidence to better understand consumers’ privacy behaviors. The
tools discussed will be useful to researchers, practitioners, and policy makers in the area of consumer
privacy. The author presents interesting results about surveying/testing privacy-related behavior of
individuals during electronic communications with a particular focus on e-services. The chapter also
outlines the principles of conducting empirical research on consumers’ privacy consumption behaviors.
Explanation is given as to why experiments rather than surveys or hypothetical choices are needed for
delivering valid insights to decision makers. After reviewing the existing empirical evidence about the
importance that consumers attach to their privacy, the chapter explains the methodological requirements
of valid privacy experiments and offers practical advice for conducting privacy choice experiments. This
chapter provides a good insight into privacy-enhancing solutions and policies that meet consumers’
needs.
1.15 TECHNIQUES, TAXONOMY, AND
CHALLENGES OF PRIVACY PROTECTION IN
SMART GRID
• The deployment of Smart Grid technologies has also raised considerable concerns in data
privacy issues of Smart Grid users. Privacy concerns in the Smart Grid environment are
mostly related to the collection and use of energy consumption data of Smart Grid users.
• In this context, Chapter 15 discusses various Smart Grid privacy issues and presents Smart
Grid privacy protection architectures and approaches. The authors provide a unique taxonomy
of the different privacy protection mechanisms that have been recently proposed in the
literature. Various strengths and weaknesses of these privacy solutions are also identified.
• Finally, the chapter discusses some outstanding challenges that need to be addressed to
provide robust and scalable privacy protection solutions to Smart Grid users.
1.16 LOCATION-BASED PRIVACY, PROTECTION,
SAFETY,
AND SECURITY
• One of the major benefits of location-based services (LBS) is their ability to maintain safety and
security. But LBS can also result in risks such as the use of LBS for cyber stalking others. To
establish the need for LBS regulation, we need to understand that there will always be a trade-off
between LBS’s benefits and the risks associated with their implementation and adoption. Chapter 16
examines privacy and security issues with respect to LBS and recognizes the need for technological
solutions, in addition to commitments and adequate assessments/ considerations at the social and
regulatory levels. The authors discuss various solutions that have been recently proposed in the area
of location-based privacy and identify the various strengths and weaknesses of these solutions.
• The chapter concludes with a list of interesting challenges relevant to privacy in LBS and the need
for further investigation on issues associated with mobility and location technologies.
THE RESPONSE AND
SOLUTIONS
PLANNING FOR PRIVACY

• Yes, you do need a plan


• No, there isn’t a single solution
• Why a framework is essential
• It defines a set of parameters in which privacy policies, procedures,
practices, and technology can be implemented, supported and audited.
THE PRIVACY POLICY

• The Privacy Policy is where you start


• Options: short-sighted, or visionary
• Opt-out is short-sighted
• Opt-in is the visionary position
• Do not share is the ideal, but not a pragmatic business position for some
companies

• The Privacy Policy should be a value-add proposition for customers and for companies
FRAMEWORK BUILDING BLOCKS -POLICIES

• An Enterprise Security or Privacy Policy


• Functional Security and Privacy Policies – A bit more realistic
• High level corporate policy
• Functional sub-policies
• Specialized and exception policies
• Multiple policies do not have to mean loss of standards
• Privacy officer to oversee and approve all policies
FRAMEWORK BUILDING BLOCKS -POLICIES

• Don’t reinvent the wheel


• There are many good example policies available
• Internal and external policies are different
• Some organizations may need to craft a customer privacy policy
statement for disclosure to consumers
• Remember to have a lawyer’s input and approval
WHO CLEARS ON THE
POLICY?

• Short Answer: Everyone


• Better Answer:
• CEO
• Business Units (Products and Operations)
• General Counsel
• Government Affairs
• Information Security
• I/T
DATABASES AND
DATA SECURITY
IT’S YOUR DATA – ARE YOU SURE IT’S
SAFE?
DATABASE OVERVIEW
• Every company needs places to store institutional knowledge and
data.

• Frequently that data contains proprietary information


• Personally Identifiable Data
• Employee HR Data
• Financial Data

• The security and confidentiality of this data is of critical importance.
DATA ATTRIBUTES
• The attributes can be categorized into four types, which may overlap:
• Identifiers are attributes like passport or social security numbers that uniquely identify a
respondent;
• Quasi-identifiers or key attributes like address or age could identify a respondent with
some degree of ambiguity;
• Confidential or sensitive attributes include sensitive personal information like salary or
political affiliation; and
• Non-confidential or non-sensitive attributes contain other types of non-sensitive
information about the respondent.
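The four attribute types above can be sketched as labels on a table schema. This is a minimal illustration, not a prescribed method: the column names, the `SCHEMA` mapping and the helper functions are hypothetical, and each column gets a single label even though the slide notes that the categories may overlap.

```python
from enum import Enum


class AttrType(Enum):
    IDENTIFIER = "identifier"
    QUASI_IDENTIFIER = "quasi-identifier"
    CONFIDENTIAL = "confidential"
    NON_CONFIDENTIAL = "non-confidential"


# Hypothetical schema: column names and labels are assumptions for illustration.
SCHEMA = {
    "passport_no": AttrType.IDENTIFIER,
    "address": AttrType.QUASI_IDENTIFIER,
    "age": AttrType.QUASI_IDENTIFIER,
    "salary": AttrType.CONFIDENTIAL,
    "favorite_color": AttrType.NON_CONFIDENTIAL,
}


def columns_of(attr_type, schema=SCHEMA):
    """Return the sorted column names carrying the given label."""
    return sorted(name for name, t in schema.items() if t is attr_type)


def drop_identifiers(record, schema=SCHEMA):
    """Return a copy of the record without direct identifiers.

    Columns that are missing from the schema are kept unchanged.
    """
    return {k: v for k, v in record.items()
            if schema.get(k) is not AttrType.IDENTIFIER}


record = {"passport_no": "X123", "address": "Abu Dhabi", "age": 30,
          "salary": 1000, "favorite_color": "blue"}
print(drop_identifiers(record))
print(columns_of(AttrType.QUASI_IDENTIFIER))
```

Removing identifiers alone does not anonymize a record, because the quasi-identifiers that remain (here `address` and `age`) can still be combined to single out a respondent.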
SECURITY OVERVIEW
• There are four key issues in the security of databases just as with
all security systems
• Availability
• Authenticity
• Integrity
• Confidentiality
AVAILABILITY
• Data needs to be available at all necessary times
• Data needs to be available to only the appropriate users
• Need to be able to track who has access to and who has accessed
what data
AUTHENTICITY
• Need to ensure that the data has been edited by an authorized
source
• Need to confirm that users accessing the system are who they say
they are
• Need to verify that all report requests are from authorized users
• Need to verify that any outbound data is going to the expected
receiver
INTEGRITY
• Need to verify that any external data has the correct formatting
and other metadata
• Need to verify that all input data is accurate and verifiable
• Need to ensure that data is following the correct work flow rules
for your institution/corporation
• Need to be able to report on all data changes and who authored
them to ensure compliance with corporate rules and privacy laws.
CONFIDENTIALITY
• Need to ensure that confidential data is only available to the correct
people
• Need to ensure that the entire database is secure from external and
internal system breaches
• Need to provide for reporting on who has accessed what data and
what they have done with it
• Mission-critical and legally sensitive data must be highly secured,
given the potential risk of lost business and litigation
KEEPING YOUR DATA CONFIDENTIAL

• Although the 4 pillars are of equal importance, we are focusing on
Confidentiality due to the prevalence of data loss in financial and
personal areas
• We are going to review solutions for
• Internal data loss
• External hacking
• Securing data if hardware stolen
• Unapproved Administrator Access
MIDDLEWARE SECURITY CONCERNS

• Another set of security issues comes from middleware that sits between the user and
the data.
• Middleware is software that lies between an operating system and the applications
running on it. Essentially functioning as a hidden translation layer, middleware enables
communication and data management for distributed applications. It’s sometimes
called plumbing, as it connects two applications together so data and databases can be
easily passed through the “pipe.” Using middleware allows users to perform such
requests as submitting forms on a web browser, or allows the web server to return
dynamic web pages based on a user’s profile.
• Single sign-on authentication
• Allows users just to have one password to access all systems but also means that the theft of
one password endangers all systems
PROS AND CONS OF 3RD PARTY
SOLUTIONS
Solution Description | Pros | Cons
Data Obfuscation (Masking, Scrambling) | Fake or scrambled data set for use by design and implementation teams | Can be very expensive; good fake data can range in cost from $200,000 to $1 million
Encryption of Data | Allows personally identifiable data to be scrambled if intrusion takes place. | Adds overhead and possible performance issues.
Database Intrusion/Extrusion Prevention | Looks for SQL injections, bad access commands and odd outbound data | Can eat into overhead and cause performance issues; also expensive. Needs very specific criteria to set up.
Data Leak Prevention | Catches any data that is being sent out of the system | Does not protect data in the actual data warehouse.
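To make the Data Obfuscation row more concrete, here is a minimal sketch of masking and scrambling a value before handing a data set to a design or implementation team. The function names, the number of visible characters and the salt are assumptions for illustration; commercial obfuscation products work differently and do far more.

```python
import hashlib


def mask_number(value, visible=4, fill="*"):
    """Hide all but the last `visible` characters of a value."""
    value = str(value)
    if len(value) <= visible:
        # Too short to show anything safely: mask everything.
        return fill * len(value)
    return fill * (len(value) - visible) + value[-visible:]


def pseudonymize(value, salt):
    """Replace a value with a truncated salted SHA-256 digest.

    The same input and salt always give the same token, so joins
    between scrambled tables still work.
    """
    digest = hashlib.sha256((salt + str(value)).encode("utf-8")).hexdigest()
    return digest[:12]


print(mask_number("4111111111111111"))
print(pseudonymize("jane.doe@example.com", salt="hypothetical-salt"))
```

A salted hash of a low-entropy value (such as a short ID number) can still be guessed by brute force, so this kind of scrambling complements encryption rather than replacing it.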
BUILT IN DATABASE
PROTECTION

• Vendors such as Oracle, Microsoft and IBM know that security is
a big concern for data systems.
• They create built in solutions such as:
• Password Controls
• Data access based on roles and profiles
• IP restrictions for off site access
• Auditing capabilities of who has run what reports
• Security logging
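Two of the built-in controls listed above, role-based data access and auditing, can be sketched together. This is an illustration only: the roles, table names and log format are hypothetical and do not reflect how Oracle, Microsoft or IBM products implement these features.

```python
from datetime import datetime, timezone

# Hypothetical role -> table -> allowed actions mapping.
ROLE_PERMISSIONS = {
    "hr_analyst": {"employee_hr": {"read"}},
    "finance_admin": {"financial": {"read", "write"}},
}

# In-memory audit trail; a real system would write to protected storage.
audit_log = []


def is_allowed(role, table, action, permissions=ROLE_PERMISSIONS):
    """Check whether a role may perform an action on a table."""
    return action in permissions.get(role, {}).get(table, set())


def access(user, role, table, action):
    """Decide on a request and record it, whether granted or denied."""
    allowed = is_allowed(role, table, action)
    audit_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "role": role,
        "table": table,
        "action": action,
        "allowed": allowed,
    })
    return allowed


print(access("amira", "hr_analyst", "employee_hr", "read"))
print(access("amira", "hr_analyst", "financial", "read"))
print(len(audit_log))
```

Logging denied requests as well as granted ones is a deliberate choice here: it supports the reporting on who has accessed, or tried to access, what data.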
PROS AND CONS OF BUILT IN
SOLUTIONS
Solution Description | Pros | Cons
Complex Passwords (require numbers and symbols) as well as frequent password changes | Makes passwords harder to guess and harder to crack | Users write them down and keep them next to the computer, or forget them and need multiple resets
Keep Internal and External facing databases separate | Makes it very hard to hack one and then get through to the other | Reduces functionality of databases and restricts flow of internal data
Restrict Downloading | Keeps data in the database and not loose in Excel, etc. | Restricts reporting capabilities and offline functionality
Restrict Unwanted Connections | Again makes it harder to worm from one system to another | Makes integration more difficult and can reduce user acceptance
SAML (Security Assertion Markup Language) | SAML is the standard that is used for Single Sign On functionality | If not in use, blocks the usage of single sign on
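The Complex Passwords row can be illustrated with a small policy check. This is a sketch under stated assumptions: the function name and the minimum length of 12 are hypothetical, and the slide itself only mentions requiring numbers and symbols.

```python
import string


def password_problems(password, min_length=12):
    """Return a list of policy violations; an empty list means it passes."""
    problems = []
    if len(password) < min_length:
        problems.append("too short")
    if not any(c.isdigit() for c in password):
        problems.append("no digit")
    if not any(c in string.punctuation for c in password):
        problems.append("no symbol")
    if not any(c.isalpha() for c in password):
        problems.append("no letter")
    return problems


print(password_problems("abc"))
print(password_problems("Correct-Horse-7-Battery"))
```

Returning the list of problems, rather than a bare yes or no, lets the system tell users what to fix, which may reduce the resets and written-down passwords listed under Cons.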
RECOMMENDATIONS?
• Will we be able to keep the data secure while keeping the users
happy?
PRIVACY MODELS
• The computer science community has also contributed to sanitization for
disclosure control under the names Privacy Preserving Data Publishing
(PPDP) and Privacy Preserving Data Mining (PPDM).
• The former focuses on privacy-preserving publication of microdata, whereas
the latter focuses on bringing privacy protection to traditional data mining
tasks (for example, data classification or clustering).
• There is a substantial difference between the sanitization approaches by the
statistical and the computer science communities.
A PRIORI DISCLOSURE RISK CONTROL

In the computer science community, the primary focus is on disclosure risk. A
privacy model is used to select the tolerable disclosure risk level from the
outset (a priori control). Then a sanitization method is applied which
guarantees by design that the selected disclosure risk level is not exceeded. The
incurred information loss is measured after sanitization has been completed.
• We next review the two main privacy models used in the literature.
K-ANONYMITY
• A common approach to prevent disclosure via record linkage attacks is to hide each
individual record within a group. This is the approach that k-anonymity takes:
• (k-Anonymity) A data set is said to satisfy k-anonymity for an integer k>1 if, for each
combination of values of quasi-identifier attributes, at least k records exist in the data set
sharing that combination.
• To achieve k-anonymity, identifying attributes are removed and quasi-identifiers are masked
so that they become indistinguishable within each group of k records.
• Confidential attributes remain in clear form so that they preserve their analytical utility. In
this way, an intruder with access to an external non-anonymous data set that contains the
quasi-identifiers in the related data set will be unable to perform an exact re-identification.
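The definition above can be checked mechanically: group the records by their quasi-identifier values and look at the smallest group. The sketch below assumes a toy data set; the column names, the ten-year age bands and the ZIP truncation are hypothetical masking choices, not the specific method from the chapter.

```python
from collections import Counter


def k_of(records, quasi_identifiers):
    """Size of the smallest group sharing one quasi-identifier combination."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values()) if groups else 0


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs at least k times."""
    return k_of(records, quasi_identifiers) >= k


def generalize_age(age, width=10):
    """Mask an exact age by replacing it with a band such as '20-29'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


# Toy data: identifiers already removed, salary is the confidential attribute.
raw = [
    {"age": 23, "zip": "12345", "salary": 50},
    {"age": 27, "zip": "12349", "salary": 60},
    {"age": 31, "zip": "12351", "salary": 55},
    {"age": 38, "zip": "12358", "salary": 70},
]

# Mask the quasi-identifiers; the confidential attribute stays in clear form.
masked = [
    {"age": generalize_age(r["age"]), "zip": r["zip"][:3] + "**",
     "salary": r["salary"]}
    for r in raw
]

print(k_of(raw, ["age", "zip"]))      # every raw record is unique
print(k_of(masked, ["age", "zip"]))   # masked records form groups
```

In the raw data every record is unique on (age, zip), so exact linkage is possible; after masking, each record shares its quasi-identifier values with one other record, which gives 2-anonymity for this toy set.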
ε-DIFFERENTIAL PRIVACY
• Disclosure limitation via k-anonymity is based on guessing the information that is available to potential intruders, that
is, which attributes in the data set should be considered as quasi-identifiers. As long as this guess is accurate, the
disclosure limitation method does its job, but a privacy breach may happen if more information is available
to intruders.
• A different approach to anonymization is ε-differential privacy. This approach was designed for sanitization in
queryable databases and it makes no assumptions on the intruder’s knowledge. The goal is to transform the answers
to queries so that the effect of the presence or absence of any single individual record on the returned answers is minimized.
• To achieve this goal, the influence of each individual on the query answer needs to be limited. More concretely, the
model requires that the presence or absence of any single individual changes the probability of any query answer by
at most a factor depending on ε, namely exp(ε).
• The smaller ε, the more difficult it is for an intruder to use the query answer to infer the contribution of any specific
individual.
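One common way to obtain this guarantee for a counting query is the Laplace mechanism, which the slides do not name; it is used here only as an illustration. Adding or removing one individual changes a count by at most 1, so adding Laplace noise with scale 1/ε to the true count gives ε-differential privacy for that query. The function name and the toy data are assumptions.

```python
import random


def private_count(records, predicate, epsilon, rng=random):
    """Answer a counting query with Laplace noise of scale 1/epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    # The difference of two independent exponential draws with rate epsilon
    # follows a Laplace distribution with mean 0 and scale 1/epsilon.
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


# Hypothetical data set and query: how many salaries exceed 15?
data = [{"salary": s} for s in (10, 20, 30, 40)]
print(private_count(data, lambda r: r["salary"] > 15, epsilon=0.5))
```

A smaller ε means a larger noise scale, so answers reveal less about any one individual but are also less accurate. This sketch ignores practical issues such as floating-point side channels and tracking the total privacy budget over repeated queries.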
