Exploring IBM watsonx.data
Hands-on Lab
Lab Guide by:
Kelly Schlamb
Principal, Learning Content Development | Data & AI
kschlamb@ca.ibm.com

Danny Arnold
Principal, Learning Content Development | Data & AI
darnold@us.ibm.com

Presenter:
Farah Auni Hisham
Technical Enablement Specialist | Data & AI
farah.hisham@ibm.com

Ecosystem Technical Enablement | Data & AI
watsonx.data
Hands-on Lab Agenda

Part 1 Accessing watsonx.data


1.1 Environment Setup
1.2 Infrastructure Components
1.3 Key User Interface

Part 1
Accessing watsonx.data

Public Service Announcement:

Because this is a guided exercise, please follow the exact naming conventions used here so that the workshop runs smoothly.

Using different folder names, catalog names, and so on might affect subsequent steps.

Troubleshooting or amending a query can sometimes take longer than restarting the hands-on lab steps.

Download the PDF from the Box folder for a seamless experience.

1.1 Environment Setup

1. Go to the TechZone reservation page, https://techzone.ibm.com/my/reservations, and click your reserved tile.
Alternatively, go to https://techzone.ibm.com/my/workshops/student/661c02131de802001e001249.
Wait until the status shows Ready.

2. Refer to the ‘Published services’ section to get the SSH command and the URLs for the Presto console, MinIO console, and watsonx.data UI.

3. Command & URL*:

SSH command:
• ssh -p <5 digits> watsonx@<server>.techzone-services.com
watsonx.data UI:
• https://<server>.techzone-services.com:<5 digits>
Presto console:
• http://<server>.techzone-services.com:<5 digits>
MinIO console:
• http://<server>.techzone-services.com:<5 digits>

• No active reservation? Follow the steps in Appendix 1 Lab Reservation.
• *Use your own commands and URLs from the ‘Published services’ section of your instance page.
• Copy your list of published services somewhere, such as Notepad, in case you get logged out of TechZone!
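
As a minimal sketch of how the values from ‘Published services’ slot into the commands and URLs listed in step 3, the shell snippet below assembles them from a host name and port numbers. The host name and the ports are hypothetical placeholders, and the function names ssh_command and service_url are made up for this illustration; substitute the values from your own reservation.

```shell
#!/bin/sh
# Build the SSH command for a given server and published SSH port.
ssh_command() {
    # $1 = server host name, $2 = published SSH port
    printf 'ssh -p %s watsonx@%s\n' "$2" "$1"
}

# Build a console URL from its scheme, host name, and published port.
service_url() {
    # $1 = scheme (http or https), $2 = host name, $3 = published port
    printf '%s://%s:%s\n' "$1" "$2" "$3"
}

# Hypothetical placeholder values, not taken from a real reservation.
HOST="example.techzone-services.com"

echo "SSH command:     $(ssh_command "$HOST" 11111)"
echo "watsonx.data UI: $(service_url https "$HOST" 22222)"
echo "Presto console:  $(service_url http "$HOST" 33333)"
echo "MinIO console:   $(service_url http "$HOST" 44444)"
```

Note that there is no space between the colon and the port number in the resulting URLs.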


4. Access the watsonx.data UI:

watsonx.data UI*:
• https://<server>.techzone-services.com:<5 digits>

5. You might receive a warning about a potential security risk. If so, click the Advanced button, then click the Accept the Risk and Continue button.

6. Enter the username and password:

Username: ibmlhadmin
Password: password

• *Use your own commands and URLs from the ‘Published services’ section of your instance page.


7. Open a command prompt, run the SSH command, and enter the password:

SSH command*:
• ssh -p <5 digits> watsonx@<server>.techzone-services.com
Are you sure you want to continue connecting?
• yes
Password [password input will be invisible]:
• watsonx.data

8. Switch to the root user and change directory to the watsonx.data product binaries:

Switch to root user:
• sudo su -
Change directory:
• cd /root/ibm-lh-dev/bin

• *Use your own commands and URLs from the ‘Published services’ section of your instance page.

1.2 Infrastructure Components

Infrastructure components of watsonx.data

[Diagram: core watsonx.data functionality (query engines; governance and metadata, covering the metadata store and access control management; data format; storage; infrastructure) alongside your existing ecosystem (data warehouse, data lake, ecosystem infrastructure)]

Query engines
• Run workloads against data in watsonx.data.
• Multiple engines are supported (this lab uses Presto as the engine).

Catalogs
• Manage table schemas and metadata for the data residing in watsonx.data.

Storage
• External buckets and databases can be registered and used in watsonx.data.


Clients can deploy watsonx.data in a number of different ways:

• As software-as-a-service (SaaS) on IBM Cloud and Amazon Web Services (AWS)

• As a cartridge with Cloud Pak for Data

• As standalone hybrid-cloud software that can be installed on Red Hat OpenShift (on-premises or in the cloud)

• As a simple, single-node Developer Edition installation

watsonx.data is currently available in a Standard Edition, with an Enterprise Edition planned for the future.

For the developer and partner community, IBM also offers an entry-level Developer Edition, which can be used to get
familiar with the watsonx.data console and environment. The Developer Edition has the same code base as the Standard
Edition, but some features are restricted, and it is not intended for production use.

This lab utilizes a pre-installed Developer Edition virtual machine (VM) environment that can be easily provisioned from
IBM Technology Zone (TechZone).


Infrastructure components of watsonx.data (Developer Edition)

Query engines
• presto-01: A Presto query engine used to interact with data in the data lakehouse.

Catalogs
• iceberg_data: An Iceberg catalog, residing within watsonx.data’s embedded Hive Metastore (HMS).
• hive_data: A Hive catalog, also residing within the embedded HMS.
• wxd_system_data: A Hive catalog associated with the wxd-system bucket.

Storage
• iceberg-bucket: A bucket in the embedded MinIO object store. The table data stored here is associated with the iceberg_data catalog.
• hive-bucket: A bucket in the embedded MinIO object store. The table data stored here is associated with the hive_data catalog.
• wxd-system: A bucket used to hold diagnostic data, such as query history and query event-related information, for the Presto engine.
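
The catalog-to-bucket associations listed for the Developer Edition can be summarized as a small lookup. The sketch below is only an illustration of that mapping; the function name bucket_for_catalog is hypothetical and is not part of watsonx.data.

```shell
#!/bin/sh
# Return the MinIO bucket associated with a Developer Edition catalog.
bucket_for_catalog() {
    case "$1" in
        iceberg_data)    echo "iceberg-bucket" ;;
        hive_data)       echo "hive-bucket" ;;
        wxd_system_data) echo "wxd-system" ;;
        *)               echo "unknown catalog: $1" >&2; return 1 ;;
    esac
}

# Print the mapping for the three pre-configured catalogs.
for catalog in iceberg_data hive_data wxd_system_data; do
    printf '%-16s -> %s\n' "$catalog" "$(bucket_for_catalog "$catalog")"
done
```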
1.3 Key User Interface

• Home Screen
• Infrastructure Manager
• Data Manager
• Query Workspace
• Query History
• Access Control


• In Infrastructure Manager, note how easily you can register external connections to watsonx.data using “Add component”.


• Remember that you can view the list of tables in the navigation pane in “Data Manager”.


1. Create new schema: _new_schema1.

2. Create table from file
• Table name: new_cars from file: https://ibm.ent.box.com/v/data-cars-json
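
Once the schema and table exist, a quick preview query is one way to confirm the load. The sketch below only prints a SQL statement to paste into the SQL worksheet. It assumes the schema lives in the iceberg_data catalog, which matches the data path used in the access policy later in this lab; the function name qualified_table and the LIMIT value are illustrative choices.

```shell
#!/bin/sh
# Build a fully qualified, double-quoted table name (catalog.schema.table).
qualified_table() {
    # $1 = catalog, $2 = schema, $3 = table
    printf '"%s"."%s"."%s"\n' "$1" "$2" "$3"
}

# Assumption: _new_schema1 was created in the iceberg_data catalog.
TABLE=$(qualified_table iceberg_data _new_schema1 new_cars)

# Print a preview query for the SQL worksheet.
printf 'SELECT * FROM %s LIMIT 10;\n' "$TABLE"
```

Quoting each identifier avoids surprises with names that start with an underscore.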


• You can quickly generate the path for selected infrastructure items.
• When you are instructed to copy and paste a SQL statement into the SQL worksheet, clear any statements you previously ran before running the new statement.


• Query History: the history is read from the system.runtime.queries table, which is cleared when Presto is restarted, so the query history is lost whenever the engine shuts down. To keep it, back up the history from system.runtime.queries.
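
One possible way to back up the history is to copy system.runtime.queries into a permanent table before the engine is restarted. The sketch below only prints such a statement for the SQL worksheet. The target name iceberg_data.history_backup.queries is hypothetical (the schema would need to exist first), and the function name history_backup_sql is made up for this example.

```shell
#!/bin/sh
# Build a CREATE TABLE ... AS SELECT statement that copies the current
# contents of system.runtime.queries into a permanent table.
history_backup_sql() {
    # $1 = target table in catalog.schema.table form
    printf 'CREATE TABLE %s AS SELECT * FROM system.runtime.queries;\n' "$1"
}

# Hypothetical target table; adjust the catalog, schema, and table name.
history_backup_sql iceberg_data.history_backup.queries
```

Because the statement takes a snapshot, it would need to be rerun (into a new table, or adapted to an INSERT) to capture queries that run later.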
1. Managing Infrastructure Access (add new users)

Switch to the root user:
• sudo su -
Run the command:
• /root/ibm-lh-dev/bin/user-mgmt add-user user1 User password1

• This is not the usual practice for adding users; it is just a quick way to set up test data. If you are preparing a demo for a client, you can do this step ahead of time. Organizations can integrate their identity management (IdM) for single sign-on (SSO), as with other IBM enterprise products, but this lab version does not cover it.
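
If several test users are needed, the same command can be generated in a loop. The sketch below only prints the commands, using the argument order shown in the step above (user name, the role User, password). The extra users user2 and user3 and the function name add_user_command are hypothetical; the lab itself only requires user1.

```shell
#!/bin/sh
# Path to the user management script used in this lab.
USER_MGMT=/root/ibm-lh-dev/bin/user-mgmt

# Print the add-user command for one user with the "User" role.
add_user_command() {
    # $1 = user name, $2 = password
    printf '%s add-user %s User %s\n' "$USER_MGMT" "$1" "$2"
}

# Hypothetical batch of test users; only user1 is part of the lab.
for n in 1 2 3; do
    add_user_command "user$n" "password$n"
done
```

Review the printed commands, then run them as root on the lab VM.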

2. Managing Policies Access (add new policies)

Policy name:
• “LimitSelectUser1”
Policy status after creation:
• Active
Add data:
• iceberg_data/_new_schema1/new_cars/*
Add rule:
• Rule type: Allow
• Actions: Select
• Users/group: Users
• Select user: user1

IBM watsonx.data®: an open, hybrid, and governed fit-for-purpose data store optimized to scale all data, analytics, and AI workloads

Appendix 1 Lab Reservation

The first step before starting the watsonx hands-on exercise is to reserve your lab environment.
For this workshop, there are two methods:

1) Reserve your own lab environment via IBM Technology Zone (commonly known as TechZone).
• Reserving your own lab via TechZone is the recommended option, because you can extend your reservation up to 2 times.
• You can also access the environment for at least 2 days after your reservation start time.
• To reserve your own environment, follow the watsonx.data Lab Reservation steps in this appendix.
• Learning how to reserve your own environment in TechZone is essential for demos and PoX work.
• If you already have an active reservation of the watsonx.data image in TechZone, you may skip this part and proceed to 1.1 Environment Setup.

2) Access the pre-reserved lab environment (available only for the duration of the workshop).
• This lab is available only for the duration of the workshop, and the instance will be deleted afterwards.
• If your own lab reservation failed, or you only want to access the lab during the workshop, proceed to Appendix 2 Access Pre-reserved Workshop.

watsonx.data Lab Reservation

1. You can reserve your own TechZone lab for self-practice or client demo purposes.
2. Go to the watsonx.data developer base image in TechZone using the link below:
• https://techzone.ibm.com/collection/ibm-watsonxdata-developer-base-image/environments
3. Go to the Environment tab.
4. Select IBM watsonx.data Development Lab (please do not select the POC version for this lab!).
5. Select Reserve.
6. Select Reserve now and fill in the reservation form.
• Purpose: Practice / Self-Education
• Purpose description: Practice watsonx lab
• Preferred Geography: itz-watsonx – AMERICAS…
• VPN Access: Disable
7. Select the checkbox to agree to the IBM TechZone terms and conditions and policies, then click Submit.


8. When your reservation is ready, you will receive an email notification. Click View My Reservation.
9. Click the reservation tile to see the published services.
10. Published services lists the details of your environment.

Appendix 2 Access Pre-reserved Workshop

1. Access the IBM watsonx.data Workshop via this link:
• https://techzone.ibm.com/my/workshops/student/661c02131de802001e001249

2. Log in using your IBM ID.
• An IBM ID is a prerequisite for accessing TechZone.

3. Enter the password below and click “Submit password/access code”.
• Password: watsonx

Appendix 3 Restart container – Only to troubleshoot

1. Open a command prompt, run the SSH command, and enter the password:

SSH command*:
• ssh -p <5 digits> watsonx@<server>.techzone-services.com

2. Switch to the root user and change directory to the watsonx.data product binaries:

Switch to root user:
• sudo su -
Change directory:
• cd /root/ibm-lh-dev/bin

3. Stop watsonx.data and wait until all components have stopped:

Stop container:
• ./stop

• *Use your own commands and URLs from the ‘Published services’ section of your instance page.


4. Start watsonx.data by running the following two commands:

Run mode:
• export LH_RUN_MODE=diag
Start container:
• ./start

5. It takes a few minutes for the various component containers to start. Check the status of watsonx.data:

Check container status:
• ./status --all
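
The stop, start, and status steps can be strung together in one small script. The sketch below is a hedged illustration rather than an official tool: it defaults to a dry run that only prints each command, and the names DRY_RUN, WXD_BIN, run, and restart_watsonx_data are hypothetical. It assumes you are already logged in as root on the lab VM.

```shell
#!/bin/sh
# With DRY_RUN=1 (the default) commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"
WXD_BIN="${WXD_BIN:-/root/ibm-lh-dev/bin}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Stop watsonx.data, restart it in diag run mode, then show the status.
restart_watsonx_data() {
    run cd "$WXD_BIN" || return 1
    run ./stop || return 1
    run export LH_RUN_MODE=diag
    run ./start || return 1
    run ./status --all
}

restart_watsonx_data
```

Set DRY_RUN=0 to execute the commands for real. The components take a few minutes to come up, so ./status --all may need to be repeated.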

Appendix 4 Removing the Db2 container’s password 90-day limit

If a Db2 connection fails with a ‘password expired’ error, it is because the Db2 container has a 90-day limit on the password.
Use the following method to remove the limit.

1. Open a command prompt, run the SSH command, and enter the password:

SSH command*:
• ssh -p <5 digits> watsonx@<server>.techzone-services.com

2. Switch to the root user and change directory to the watsonx.data product binaries:

Switch to root user:
• sudo su -
Change directory:
• cd /root/ibm-lh-dev/bin

3. Remove the password expiry for the Db2 instance user:

Run Docker command:
• docker exec db2server chage -I -1 -m 0 -M 99999 -E -1 db2inst1
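
To confirm that the change took effect, the password-aging settings can be listed with chage -l inside the same container. The sketch below only prints the two commands so they can be reviewed before running; the function names remove_limit_command and check_limit_command are hypothetical, while the container name and user come from the step above.

```shell
#!/bin/sh
DB2_CONTAINER=db2server
DB2_USER=db2inst1

# Command from the lab step: -I -1 disables password inactivity,
# -m 0 sets no minimum age, -M 99999 sets the maximum age to 99999 days,
# and -E -1 removes the account expiry date.
remove_limit_command() {
    printf 'docker exec %s chage -I -1 -m 0 -M 99999 -E -1 %s\n' "$DB2_CONTAINER" "$DB2_USER"
}

# Follow-up check: chage -l lists the current password-aging settings.
check_limit_command() {
    printf 'docker exec %s chage -l %s\n' "$DB2_CONTAINER" "$DB2_USER"
}

remove_limit_command
check_limit_command
```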
