
Building Blocks of Effective Data Management
ITEC79
Three Pillars of Data Management
INTEGRATION

● Integration ingests and processes data to achieve a result.
● Processing must be scalable, repeatable, and agile.
● The longest delays in big data projects occur during integration.
● Smarter integration will reduce these time frames, automate processes, and allow for rapid ingestion of new data.
INTEGRATION

● Key components of integration include:
○ Agile and high-performance ingestion of next-generation data
○ Automated and scalable integration, cleansing, and mastering of next-generation data
○ Optimized and readily usable tools for ingestion and processing, coupled with repeatable processes
GOVERNANCE

● Governance defines the processes to access and administer data, and ensures that the data is high quality, properly tagged and cataloged, and fit for purpose.
● Essentially, the business and IT teams must have confidence their data is clean and valid.
GOVERNANCE

● Key components of governance include:
○ Collaborative governance to allow everyone to participate in holistic data stewardship
○ A 360-degree knowledge-graph view of data assets showing semantic, operational, and usage relationships
○ Trust and confidence that the data is fit for purpose
○ Data quality, provenance, end-to-end lineage and traceability, and audit readiness
SECURITY

● Security identifies and manages sensitive data with a 360-degree ring of risk assessment and analysis.
● Security must occur at the source, not just at the perimeter.
● Identifying which data is sensitive (credit card information, email addresses, physical addresses, Social Security numbers, and other personally identifiable information) and which data becomes sensitive when aggregated is a growing challenge.
SECURITY

● Key components of security are:
○ 360 degrees of sensitive data discovery, classification, and protection
○ Data proliferation and risk analysis
○ Masking and encryption for sensitive data
○ Security policy creation and management
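The masking component above can be sketched in a few lines. This is a minimal, illustrative example, not a production control: the salt and the list of sensitive fields are assumptions, and real systems would use managed keys, format-preserving encryption, or a dedicated masking tool.

```python
import hashlib

# Minimal masking sketch: replace sensitive values with a salted hash so
# records stay joinable without exposing the raw PII. The salt and field
# list below are illustrative assumptions.
SALT = b"example-salt"
SENSITIVE_FIELDS = {"email", "ssn"}

def mask_value(value):
    """Deterministically mask one sensitive value."""
    return hashlib.sha256(SALT + value.encode()).hexdigest()[:12]

def mask_record(record):
    """Mask only the fields classified as sensitive; pass the rest through."""
    return {k: (mask_value(v) if k in SENSITIVE_FIELDS else v)
            for k, v in record.items()}

masked = mask_record({"name": "Ada", "email": "ada@example.com"})
print(masked["name"])  # non-sensitive fields are untouched
```

Because the masking is deterministic, the same email always maps to the same token, which preserves joins across datasets while hiding the original value.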
Key Big Data Management Processes
● Access Data
● Integrate Data
● Cleanse Data
● Master Data
● Secure Data
● Explore and Analyze Data
● Explore and Analyze for Business Needs
● Operationalize the Insights
Access Data:

● Set up repeatable, well-managed processes to acquire data from both traditional and next-generation data sources.
● Multiple data sources will be used, so having pre-configured access tools and connectors is a great time-saver.
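One way to picture pre-configured connectors is a small registry that maps each source type to a reusable acquisition function. The connector names and payload formats below are assumptions for illustration; in practice these would be vendor or in-house connectors.

```python
import csv
import io
import json

# Hypothetical connector registry: each source type maps to a repeatable
# acquisition function, so new sources plug in without one-off code.
CONNECTORS = {}

def connector(source_type):
    """Decorator that registers an acquisition function for a source type."""
    def register(fn):
        CONNECTORS[source_type] = fn
        return fn
    return register

@connector("csv")
def read_csv(payload):
    return list(csv.DictReader(io.StringIO(payload)))

@connector("json")
def read_json(payload):
    return json.loads(payload)

def acquire(source_type, payload):
    """Acquire data through the pre-configured connector for this source."""
    return CONNECTORS[source_type](payload)

rows = acquire("csv", "id,name\n1,Ada\n2,Grace")
print(rows[0]["name"])  # Ada
```

Adding a new source is then a matter of registering one more function, which is exactly the repeatability the slide calls for.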
Integrate Data:

● Establish processes to prepare and normalize data from a myriad of data sources.
● This process is often very challenging; resist the temptation to rely on manual methods, and leverage automation and repeatability as much as possible.
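The normalization step can be made repeatable by mapping each source's record shape onto one common schema. The source names ("crm", "web") and field names below are hypothetical, chosen only to show the pattern.

```python
# Illustrative normalization: map differently shaped source records onto
# one common schema so every downstream step sees the same fields.
def normalize_crm(record):
    return {"customer_id": str(record["CustID"]),
            "email": record["Email"].strip().lower()}

def normalize_web(record):
    return {"customer_id": str(record["user"]["id"]),
            "email": record["user"]["mail"].strip().lower()}

NORMALIZERS = {"crm": normalize_crm, "web": normalize_web}

def integrate(source, records):
    """Apply the source-specific normalizer to every record."""
    return [NORMALIZERS[source](r) for r in records]

crm_rows = integrate("crm", [{"CustID": 7, "Email": " Ada@Example.com "}])
print(crm_rows[0])  # {'customer_id': '7', 'email': 'ada@example.com'}
```

Keeping per-source logic in small, named functions is what makes the process automated and repeatable rather than manual.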
Cleanse Data:

● Review the data to ensure it’s ready for use:
○ that means checking for incomplete or inaccurate data and resolving any data errors that may bias analysis or negatively impact business operations and decision making.
● Be aware this process can be tedious, and leverage automation options when available.
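An automated cleansing pass might flag incomplete or invalid records rather than silently letting them bias analysis. The field names and validation rules below are assumptions; real pipelines would carry many more checks.

```python
import re

# Sketch of an automated cleansing pass: validate each record and split
# the batch into clean rows and rejected rows for later review.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def validate(record):
    """Return a list of problems found in one record (empty if clean)."""
    errors = []
    if not record.get("customer_id"):
        errors.append("missing customer_id")
    if not EMAIL_RE.match(record.get("email", "")):
        errors.append("invalid email")
    return errors

def cleanse(records):
    """Partition records into (clean, rejected)."""
    clean, rejected = [], []
    for r in records:
        (clean if not validate(r) else rejected).append(r)
    return clean, rejected

clean, rejected = cleanse([
    {"customer_id": "c1", "email": "ada@example.com"},
    {"customer_id": "", "email": "not-an-email"},
])
print(len(clean), len(rejected))  # 1 1
```

Keeping the rejected rows (instead of discarding them) supports the governance goals above: provenance, traceability, and audit readiness.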
Master Data:

● Organize your data into logical domains that make sense to your business, such as customers, products, and services.
● Furthermore, you can add enrichment data to paint a clearer picture of your customers, products, and services and their relationships.
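A toy version of mastering might group records by business domain and merge in enrichment keyed on record id. The domain and field names are illustrative; real master data management also handles matching, survivorship, and conflict resolution.

```python
from collections import defaultdict

# Sketch of mastering: index records by business domain and attach any
# enrichment data keyed on the record id. Names here are made up.
def master(records, enrichment):
    """Return {domain: {record_id: merged_record}}."""
    domains = defaultdict(dict)
    for r in records:
        merged = {**r, **enrichment.get(r["id"], {})}
        domains[r["domain"]][r["id"]] = merged
    return domains

mastered = master(
    [{"id": "c1", "domain": "customers", "name": "Ada"}],
    {"c1": {"segment": "enterprise"}},
)
print(mastered["customers"]["c1"]["segment"])  # enterprise
```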
Secure Data:
● A mix of governance and security allows you to establish security rules and
then implement those rules.
● First, you must determine how you will manage your sensitive data.
● Next, you must find and assess the risk of your sensitive data and implement
rules via policy and technology.
● This process is very important but prone to be under-addressed by those
inexperienced in big data management.
Explore and Analyze Data:
● Implement a data laboratory to perform experiments with a clear business
goal in mind.
● Based on your hypotheses, find what data exists and how it can be analyzed
to create a model that delivers results.
● Then determine if the results are beneficial to the business; remember that
providing actionable information and processes is the goal.
● Develop best practices to enhance agility and processes before pushing the
solution into the factory.
Explore and Analyze for Business Needs:

● Test out data products to see if they provide a real value for the business;
often you just need to try something to see if it works.
● It is common to use A/B testing to determine if a new data product adds value
to the business.
● Make iterative improvements over time as you learn what works, what
doesn’t work, and what can be improved.
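The A/B decision above is often backed by a simple statistical test. As one hedged sketch, a two-proportion z-test compares the conversion rate of the current experience (A) against the new data product (B); the sample counts below are made up for illustration.

```python
import math

# Two-proportion z-test: does variant B convert better than variant A?
def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Return the z statistic for the difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)           # pooled conversion rate
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Illustrative numbers: 120/1000 conversions for A, 150/1000 for B.
z = two_proportion_z(conv_a=120, n_a=1000, conv_b=150, n_b=1000)
print(round(z, 2))  # ≈ 1.96; |z| > 1.96 suggests significance at the 95% level
```

A positive z favors the new product; values near the threshold are exactly the cases where the slide's advice to iterate and re-test applies.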
Operationalize the Insights:

● Automate and streamline your processes to create a steady pipeline of actionable insights for business users.
● It’s not enough to have occasional production runs from the big data factory; the factory must be running regularly to be truly productive, meet business service-level agreements (SLAs), and achieve the expected ROI.
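Operationalizing the steps above can be as simple as chaining them into one scheduled, repeatable run. The steps below are placeholder lambdas standing in for the real cleanse and normalize stages.

```python
# Illustrative pipeline runner: chain the management steps into one
# repeatable run rather than occasional manual pushes to production.
def run_pipeline(records, steps):
    """Pass the data through each step in order; each step is a callable."""
    for step in steps:
        records = step(records)
    return records

steps = [
    lambda rs: [r for r in rs if r.get("id")],     # cleanse: drop incomplete rows
    lambda rs: sorted(rs, key=lambda r: r["id"]),  # normalize ordering
]
out = run_pipeline([{"id": 2}, {"id": 1}, {}], steps)
print([r["id"] for r in out])  # [1, 2]
```

In a real factory each step would be a tested, monitored job, and the runner would be invoked on a schedule to meet the SLAs the slide mentions.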
Empowering the Big Data Team

● It is not just a saying that “a company’s greatest asset is its people”; it’s the truth.
● The challenge is what can be done to increase people’s effectiveness and ability to produce results, in this context when working with big data.
FIRST
● Understand the role and needs of each team member or category of
member.
○ There will be a mix of data scientists, modelers, analysts, stewards, engineers, and business
users, all with different perspectives, skill levels, and needs.
○ Some will require greater self-service autonomy (in the laboratory environment), while others require operational agility (in the factory environment).
● Your job is to identify their needs within the big data environment.
SECOND

● Incorporate the three pillars of big data management into the team members’
operating principles and environment.
● Using a disciplined approach, and ensuring in particular that governance and security processes are followed, is one of the biggest favors you can do for your team.
● Not having governance policies enabling hassle-free access to data will
doom your team to needless headaches negotiating access to needed data.
SECOND (cont.)

● Failing to have necessary security controls in place also adds to data access
issues, but worse yet it opens up the risk the team could be associated with a
data breach.
● Make sure your team understands the value of governance and security and uses them to its advantage.
THIRD
● Get help for your team in terms of training, effective technology, outside experts, and vendor experience.
● Odds are your team is already overworked;
○ why make them do things the “hard way” by denying them the tools and expertise that would increase their effectiveness?
● Forcing your team to work in isolation, without the benefit of the great work already done with big data, will send the team down a path of one-off custom solutions, manual processes, and tedious work that is not reproducible.
THIRD (cont.)

● That DIY approach results in frustration for the team and costly lost
opportunities for the business.
FINALLY
● Consider what you can do with what you already have by creating
repeatable, automated processes and standardized technologies.
● Rather than re-inventing the wheel and expending resources for each new
project or dataset, seize opportunities where you can:
○ Reuse existing infrastructure and tools
○ Reuse skillsets, expertise, and processes
○ Reuse previous projects’ components
FINALLY (cont.)
● Working on big data projects is initially complex, but when quality big data management principles are followed, that work can be reused again and again to the benefit of the team and the business.
● Taking steps to empower your big data staff isn’t just right for them as
employees, but it yields benefits for the company as well.
