DSA Presentation
Contents
1 Summary
2 Platform & Architecture
3 Approach
4 Operating Model
5 Additional Details
1 Summary | Business challenges
1. TELECOM BELL must improve network QoS to align with consumers' changing emphasis on mobile connectivity and data usage.
2. As IoT and 5G advance, customers easily switch providers, prompting TELECOM BELL to prioritize personalized engagement using customer data for customized messaging and services.
3. TELECOM BELL is subject to many regulations, including data privacy and security regulations, and needs effective ways to adhere to these.
4. Power of data: there is data-volume growth and explosion, requiring both focus and new capabilities.
5. Pressure to show profits is constant, and data and AI will be a critical enabler.
2 Summary | Technical Challenges
Today there are increased expectations and pressure on the Telecom organization to have a strong data & analytics strategy. Housekeeping is a particular pain point: maintenance of the in-house cluster is difficult through different portals and installations.
1. Telecom Bell wants to improve the Quality of Service (QoS) of their network and, to get there, start migrating the core applications to the cloud.
2. Databricks will bring industry-leading expertise and Databricks platform expertise to drive the transformation at speed.
3. Confluent will bring an event streaming platform built on Kafka and the necessary platform support.
4. Telecom Bell has a team of 10 engineers with expertise in Kafka and Spark.
5. Desired timeline: May 2024.
1 Platform & Architecture | Current Architecture
Limitations
• Data platform is not scalable for analytics and AI/ML
• Upfront capacity planning and cost
• Governance of the data on HDFS is a challenge
• Data sits in silos and is not easy to integrate/connect
• Lack of discoverability of data (no catalog)
• Housekeeping: maintenance of the in-house cluster is difficult through different portals and installations
• Advanced disaster recovery, durability and availability are hard to achieve
• Larger IT infrastructure staff required
2 Platform & Architecture | End State Architecture
Design the target state architecture for a scalable, secure and well-governed data platform (AI/ML self-serve, advanced engineering capabilities, including the necessary governance-on-lake capability).
Fundamental Principles
• Scalability
• Performance
• Industrialized processes governing the pipeline
• Distributed, fault-tolerant architecture
• Open file format for better interoperability between systems
• Security and reliability
• Data provenance and lineage
• ACID compliant
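The ACID-compliance principle above is what table formats like Delta Lake provide on top of open file formats, via a transaction log. As a toy illustration of the underlying idea only (not Delta's actual implementation), an atomic commit can be sketched in plain Python with a write-then-rename pattern, so a reader never observes a half-written table state:

```python
import json
import os
import tempfile

def atomic_commit(table_dir: str, version: int, files: list) -> None:
    """Commit a new table version atomically: write the manifest to a
    temp file, then rename it into place. Readers see either the old
    manifest or the new one, never a partial write."""
    os.makedirs(table_dir, exist_ok=True)
    manifest = {"version": version, "files": files}
    fd, tmp_path = tempfile.mkstemp(dir=table_dir)
    with os.fdopen(fd, "w") as f:
        json.dump(manifest, f)
    # os.replace is atomic on both POSIX and Windows
    os.replace(tmp_path, os.path.join(table_dir, f"{version:08d}.json"))

def latest_version(table_dir: str) -> dict:
    """Read the newest committed manifest."""
    commits = sorted(p for p in os.listdir(table_dir) if p.endswith(".json"))
    with open(os.path.join(table_dir, commits[-1])) as f:
        return json.load(f)
```

In the real platform this bookkeeping is handled by the Delta transaction log; the sketch only shows why an atomic commit step makes concurrent reads safe.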
3 Platform & Architecture | Current vs New
New:
1. Governance under the same roof
2. More performant and optimized Spark engine
4 Platform & Architecture | Artifacts
A World Class Data Platform! Key components of the data platform:
1 Approach | Our Tenets
E. Zero downtime
F. Log the journey at every step to look back & learn
G. Principle of least access privilege (PoLAP)
H. Agile methodology
1 Operating Model | Joint Delivery Approach
Executive Leadership: Databricks Leadership (1), Telecom Bell Leadership (1)
Program Management: Databricks Lead (1), Telecom Bell Lead (1)
Meeting Cadence:
• Bi-weekly Steering Committee meetings
• Weekly PMO meetings
• Daily delivery team meetings
Workstreams:
A. Application Team (Telecom Bell resources: 3)
B. Platform Team (Telecom Bell resources: 3)
C. Data Quality & Governance (Telecom Bell resources: 4)
D. Bringing it Together (Telecom Bell resources: 1)
2 Operating Model | Pod Structure
The pods combine 16 Databricks resources and 12 Telecom Bell resources. Roles include:
• Scrum Master (shared resource)
• Resident Solutions Architect (shared resource)
• Specialist Solutions Architect (Security)
• Delivery Solutions Architect
• Application Solutions Architect
• Cloud Architect (shared resource)
• Cloud DevOps Engineer
• Azure Platform Engineer
• Data Engineer
• Data Visualization Engineer
• Functional Domain Expert
• Customer Success Engineer and Customer Success Leader
• Platform Leadership and team leaders
3 Operating Model | Road Map
Program Kickoff, then:
1. Diagnostic of the current environment
2. End state architecture
3. Playbook: a repeatable guideline to migrate applications to the new architecture
4. Migration: 10%
5. Migration: 60%
6. Migration: 100%
Celebration: celebrate completion.

Along the way:
1. Human-centered change: focus on each individual team member's technical skills and capacity for change; reskill team members whose roles are changing.
2. Mindset change: adopt 'Data as a Product', a self-service platform, federated governance, and domain-specific ownership.
3. Platform: consistently communicate, remove roadblocks and eliminate friction; celebrate completion of quick wins to strengthen morale.

Measure progress against goals, deliverables and process.
3 Operating Model | Timeline
Application: talk to business team → define elements/sources/data → refactor the code → incorporate changes → test & modify → deploy → document & KT → handover.
Data Quality + Governance: assess current state & catalog critical data elements → prepare governance strategy (identify roles, define interaction model) → design target state DQ monitoring → assess current state data governance → design & deliver governance structure → implement target state DQ monitoring → handover.
Bring it together: assess skill and capability gaps within the organization → define pods and teams → create upskilling curriculum and set up training sessions → establish ways of working (Agile) → update roadmap and plan per evolving priorities → arrange handover of all areas, documentation, win celebrations.
Project management: continuously monitor, foresee and mitigate risks, and fetch leadership guidance.
1 Additional Details | Future Scope
Competitive Differentiation: risks and mitigations
• Data Loss Risk: industrialization via reconciliation, checkpointing, audit and monitoring; use of fault-tolerant ingestion/migration tools such as the Azure Data Factory Copy Activity.
• Data Corruption and Data Integrity Risk: data validation in which each record is compared in a bidirectional manner; each record in the old system is compared against the target system, and each record in the target system against the old system.
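The bidirectional comparison described above amounts to a reconciliation over record keys: rows missing from the target, rows present only in the target, and rows whose values differ are all surfaced. A sketch with keyed dictionaries standing in for the real extracts:

```python
def reconcile(source: dict, target: dict) -> dict:
    """Compare old (source) and new (target) systems in both directions:
    keys missing from the target, keys present only in the target, and
    keys present in both but with different values."""
    missing_in_target = sorted(k for k in source if k not in target)
    extra_in_target = sorted(k for k in target if k not in source)
    mismatched = sorted(
        k for k in source if k in target and source[k] != target[k]
    )
    return {
        "missing_in_target": missing_in_target,
        "extra_in_target": extra_in_target,
        "mismatched": mismatched,
    }
```

At telecom data volumes this comparison would run as a distributed join rather than in memory, but the three result buckets are the same.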
• Interference Risks (simultaneous use of the source application): align with the stakeholders of each source on how the bandwidth can be shared; the "Bring it together" team comes into play to address this.
• Schema Evolution (Changing Dimensions): the Delta file format's schema evolution feature, which depends on schema-on-read. Further, to make sure no incompatible schemas come in, a catalog and governance would be leveraged: Databricks Unity Catalog.
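The guardrail implied above (allow additive columns, reject incompatible type changes) can be sketched as a pre-ingestion check. Delta's schema evolution behaves along these lines, though the function below is an illustration only, not the platform's actual logic:

```python
def check_schema(current: dict, incoming: dict) -> dict:
    """Classify an incoming schema (column -> type) against the current
    one: new columns are additive and allowed, as with Delta schema
    evolution; a changed type on an existing column is incompatible
    and should be rejected before ingestion."""
    added = sorted(c for c in incoming if c not in current)
    incompatible = sorted(
        c for c in incoming if c in current and incoming[c] != current[c]
    )
    return {"added": added, "incompatible": incompatible,
            "ok": not incompatible}
```

Registering schemas in a catalog such as Unity Catalog lets this check run against the governed definition rather than whatever the last batch happened to contain.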
• Authorization Risk: MFA and identity federation, with access controls at row and column level via Delta Lake.
• Resource Availability & Competing Priorities: make sure employees are fully advised about participation in workshops and/or interviews; get the right people at the right time.
• Senior Leadership Buy-In and Delays in Decision Making: strong support from the leadership group, including areas not fully involved in the initial changes ("One Team, One Direction"); establish governance to provide clarity on accountabilities for decision making.
• Potential Impacts to Other Projects: strong support from senior leadership if there is a need to put a hold on existing projects; review the current state of ongoing projects to see how they impact the finance model; prioritize major changes and focus on the big obstacles upfront.
• Lack of People Adoption (Major Change): an agile and inspirational change management and communication structure; leverage the "Bring it together" team and roles like change management experts to steward people readiness and prepare for change.
• Design in Isolation (Enterprise Integration): work with scalable and flexible design principles in mind to ensure proper integration and alignment with the business; it is a partnership approach; gather key inputs to support cross-function process design decisions where applicable.
• Availability of Key Data Inputs and Information: simplify data requests to collect data and information at the appropriate level of detail; assign designated Databricks and Telecom Bell contacts to ensure a smooth and timely transition of data; a discovery phase to identify hidden environmental risks to foresee and mitigate.
3 Additional Details | Assumptions
Area: Assumption
1. Platform: The Telecom Bell on-premise platform is owned and managed by Telecom Bell, and Databricks will get the necessary support to extend the setup to provision the solution per the scope of this effort.
2. Data Security: Telecom Bell is responsible for the design, integration and operation of all client Identity and Access Management, Security Incident and Event Management, Vulnerability Scanning and Security Testing tooling and processes, as appropriate.
5. Access & Setup: Telecom Bell will provide system access to all source systems or applications required by the scope, and access to systems and environments (including DEV, SIT) within 5 business days of receipt of a request.
6. Access & Setup: Databricks personnel will not have access to unencrypted PII data. Telecom Bell will be responsible for encrypting any PII data prior to extraction into the Databricks platform.
7. Access & Setup: PII and GDPR data handling will be done by Telecom Bell per the existing practices in delivery; any additional arrangement is out of scope.
9. Project Management: Telecom Bell will provide relevant functional, technical and process documentation for the data platforms and systems required by the scope.
10. Project Management: Telecom Bell will nominate full-time business and technical SMEs aligned to this project per the agreed pod structure.
11. Project Management: Telecom Bell data owners/nominees will make every attempt to attend the Scrum meetings and ceremonies to present their progress on the issues assigned.
12. Project Management: Telecom Bell will make sure we get the required time and support from all stakeholders for the complete success of the project.
14. Data Build: The Databricks team will reuse and extend the existing data ingestion tooling and framework to support the ingestion activities into the platform. The project will carry out a data discovery exercise to assess local market data quality and readiness.
15. Data Build: The source system inventory has already been identified and is in place.
16. License: The Cloudera CDH on-premise license expired in March 2022; however, the required extended support has been obtained.
4 Additional Details | Questions
• Is there an onboarding guide for the consultants to get started in your environment?
• What are the roles and skills of the existing 10 engineers on the team?
• Other than Cloudera, what other paid subscriptions and packages are installed on the concerned architecture?
• Is there any major business contingency on this project plan? If so, what is the impact of a delayed delivery?
• What compliances and regulations does Telecom Bell need to follow for the concerned data?
• Does Telecom Bell already have an Azure account? If so, what level of enterprise support plan is subscribed?
• Does Telecom Bell already have a Confluent account? If so, what level of enterprise support plan is subscribed?
Design and drive clients' Data and AI journeys powered by cloud analytics expertise! Offering data-product-mindset-driven solutions to deliver platforms and beyond: self-service framework, rapid experimentation lab, democratized data, data products marketplace, multi-cloud solutions, data lake, data fabric, data mesh patterns with federated governance, domain-specific ownership, and more.

RELEVANT FUNCTIONAL AND INDUSTRY EXPERIENCE
Industry Focus: HealthCare, Retail, Market Research, Finance
Functional Expertise: Digital Transformation, Analytics and CDO Strategy, Open Source, Machine Learning, IoT, Data-Driven Re-invention

• Fortune 5 American healthcare company: Establish and manage DevOps, Data Engineering, and ML engineering teams in close collaboration with Data Scientists. Set up a self-service Data and ML platform on Azure cloud for a Retail enterprise, incorporating an experimentation framework, Model Training pipelines, and real-time inference using Azure AKS, Kubeflow, and Snowflake. Implement an Rx enterprise Data and ML platform on Azure cloud, enabling ETL pipelines with Databricks and Apache Airflow. Lead the development of large-scale projects, including legacy modernization, Rx personalization, and Retail personalization programs that impact millions of lives daily. Collaborate with technology partners, MSFT and NVIDIA, to present objectives and findings, and incorporate feedback for ML solutions with specialized NVIDIA GPUs. Architect and oversee the implementation of the Refrigerator IoT project on Azure, leveraging IoT Hub, Azure Analytics, and Databricks. Lead the development of SAP HANA to Spark integration. Manage the enhancement team in Data Engineering for pharmacy-related projects, ensuring critical business deliveries. Design data-driven solutions, including self-service analytics platforms, rapid experimentation labs, democratized data, multi-cloud solutions, data fabric, and data mesh patterns with federated governance and domain-specific ownership. Develop an ingestion framework for seamless data migration across projects and cloud storage services.

• Multinational American information, data & market measurement company: Build a retail store data aggregation engine (Retail Intelligence system) for 24 countries, initially using Hadoop MapReduce, later upgraded to Spark. Migrate on-premise batch processes to the cloud using Docker, Azure Batch Services, and Azure Shipyard for cost efficiency. Perform performance tuning on Apache Spark, cloud Hadoop clusters (HDI), and Databricks on Azure and Hadoop platforms.

CERTIFICATIONS
• Amazon Web Services Certified Data Analytics - Specialty
• Amazon Web Services Solutions Architect - Associate
• Cloudera Certified Developer for Apache Hadoop (CCDH)
Key themes: ACID compliant, time travel, event streaming, data as a product, exactly-once semantics, interoperability, data migration, end of support, lack of discoverability, maintenance.
1 Platform & Architecture | Artifacts
A World Class Data Platform! Key components of the data platform:
• Lake House
• MLOps
• Governance
• Databricks Marketplace
• Databricks Notebooks

Databricks Notebooks:
1. Share insights: quickly discover new insights with built-in interactive visualizations, or leverage libraries such as Matplotlib and ggplot. Export results and Notebooks in HTML or IPYNB format, or build and share dashboards that always stay up to date.
2. Work together: share Notebooks and work with peers across teams in multiple languages (R, Python, SQL and Scala) and libraries of your choice. Real-time coauthoring, commenting and automated versioning simplify collaboration while providing control.
3. Production at scale: schedule Notebooks to automatically run machine learning and data pipelines at scale. Create multistage pipelines using Databricks Workflows. Set up alerts and quickly access audit logs for easy monitoring and troubleshooting.