Distributed TensorFlow
CSE545 - Spring 2020
Stony Brook University

Big Data Analytics, The Class
Goal: Generalizations
A model or summarization of the data.

Data Frameworks:
● Hadoop File System
● MapReduce
● Spark
● TensorFlow

Algorithms and Analyses:
● Similarity Search
● Hypothesis Testing
● Streaming
● Graph Analysis
● Recommendation Systems
● Deep Learning
Limitations of Spark

Spark is fast for being so flexible:
● Fast: RDDs in memory + lazy evaluation: optimized chain of operations.
● Flexible: Many transformations -- can contain any custom code.

However:
● Hadoop MapReduce can still be better for extreme IO: data that will not fit in memory across the cluster.
● Modern machine learning (esp. deep learning), a common big data task, requires heavy numeric computation.

IO Bound <---> Compute Bound spectrum*:
● MapReduce (large files: TBs or PBs)
● Spark (1s of TBs, 100s of GBs)
● TensorFlow (many numeric computations)

* This is the subjective approximation of the instructor as of February 2020. A lot of factors are at play.
Learning Objectives

● Understand TensorFlow as a data workflow system.
  ○ Know the key components of TensorFlow.
  ○ Understand the key concepts of distributed TensorFlow.
● Execute a basic distributed TensorFlow program.
● Establish a foundation to distribute deep learning models:
  ○ Convolutional Neural Networks
  ○ Recurrent Neural Networks (or LSTM, GRU)
What is TensorFlow?

A workflow system catered to numerical computation.

One view: Like Spark, but uses tensors instead of RDDs.

A tensor is a multi-dimensional matrix (i.stack.imgur.com):
● 2-d tensor: just a matrix
● 1-d tensor: a vector
● 0-d tensor: a constant / scalar

Note: Linguistic ambiguity: Dimensions of a Tensor =/= Dimensions of a Matrix.

Examples > 2-d:
● Image definitions in terms of RGB per pixel: Image[row][column][rgb]
● Subject, Verb, Object representation of language: Counts[verb][subject][object]
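As a quick illustration (a sketch added here, not from the slides; it assumes the TF 1.x-style API used later in this deck), tensors of each rank in code -- only the shapes matter:

import tensorflow as tf

scalar = tf.constant(3.0)                       # 0-d tensor: shape ()
vector = tf.constant([1.0, 2.0, 3.0])           # 1-d tensor: shape (3,)
matrix = tf.constant([[1.0, 2.0], [3.0, 4.0]])  # 2-d tensor: shape (2, 2)
image  = tf.zeros([480, 640, 3])                # 3-d tensor: Image[row][column][rgb]

print(scalar.shape, vector.shape, matrix.shape, image.shape)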
What is TensorFlow?

Technically, tensors are less abstract than RDDs, which could hold tensors as well as many other data structures (dictionaries/HashMaps, trees, etc.).

Then, why TensorFlow?

Efficient, high-level, built-in linear algebra and machine learning optimization operations (i.e. transformations), which enable complex models, like deep learning.

(Bakshi, 2016, "What is Deep Learning? Getting Started With Deep Learning")
(Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., ... & Ghemawat, S. (2016). TensorFlow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467.)
TensorFlow

Operations on tensors are often conceptualized as graphs.

A simpler example: c = mm(A, B), i.e.
c = tensorflow.matmul(a, b)
with nodes a and b feeding the matmul node c.

Another example:
d = b + c
e = c + 2
a = d * e

(Adventures in Machine Learning. Python TensorFlow Tutorial, 2017)
(Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., ... & Ghemawat, S. (2016). TensorFlow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467.)
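A minimal sketch of building and running both example graphs, assuming the TF 1.x session API (in TF 2.x these calls live under tf.compat.v1):

import tensorflow as tf

# c = mm(A, B)
a = tf.constant([[1.0, 2.0], [3.0, 4.0]], name="a")
b = tf.constant([[5.0, 6.0], [7.0, 8.0]], name="b")
c = tf.matmul(a, b)

# d = b + c, e = c + 2, a = d * e  (fresh names to avoid clobbering a, b, c above)
b2 = tf.constant(2.0)
c2 = tf.constant(1.0)
d = tf.add(b2, c2)
e = tf.add(c2, 2.0)
a2 = tf.multiply(d, e)

with tf.Session() as sess:   # nothing runs until the session executes the graph
    print(sess.run(c))       # [[19. 22.] [43. 50.]]
    print(sess.run(a2))      # (2 + 1) * (1 + 2) = 9.0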
Ingredients of a TensorFlow Program

tensors* (*technically, operations that work with tensors):
● variables - persistent, mutable tensors: tf.Variable(initial_value, name)
● constants - constant tensors: tf.constant(value, type, name)
● placeholders - fed from data: tf.placeholder(type, shape, name)

operations:
● an abstract computation (e.g. matrix multiply, add)
● executed by device kernels

graph:
● the dataflow of operations to be run

session:
● defines the environment in which operations run (like a Spark context)
● places operations on devices
● stores the values of variables (when not distributed)
● carries out execution: eval() or run()

devices:
● the specific devices (CPUs or GPUs) on which to run the session
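A small sketch of how these ingredients fit together (TF 1.x API; the specific values are made up):

import tensorflow as tf

graph = tf.Graph()
with graph.as_default(), tf.device("/cpu:0"):                 # graph + device
    W = tf.Variable([[1.0, -1.0]], name="W")                  # variable: persistent, mutable
    b = tf.constant(0.5, name="b")                            # constant
    x = tf.placeholder(tf.float32, shape=(2, 1), name="x")    # placeholder: fed from data
    y = tf.matmul(W, x) + b                                   # operations: matmul, add
    init = tf.global_variables_initializer()

with tf.Session(graph=graph) as sess:                         # session: execution environment
    sess.run(init)
    print(sess.run(y, feed_dict={x: [[2.0], [3.0]]}))         # [[-0.5]]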
Distributed TensorFlow

Typical use-case (Supervised Machine Learning):
Determine the weights, W, of a function, f, such that |ε| is minimized:

f(X|W) = Ŷ
Y = Ŷ + ε
ε = Ŷ - Y

X is a matrix of N training examples with features X1, X2, X3, ..., Xm; Y holds the corresponding targets Y(1), ..., Y(N).

f, given weights w1, w2, ..., wp (typically p >= m), is typically very complex!

W is determined through gradient descent: back-propagating error across the network that defines f, minimizing ε over the N training examples.
Weights Derived from Gradients

TensorFlow has a built-in ability to derive gradients given a cost function. (rasbt)

Linear Regression: trying to find the "betas", β, that minimize the squared error:

J(β) = Σᵢ (yᵢ - Xᵢβ)²

Xβ is a matrix multiply. Thus, in matrix form:

J(β) = (y - Xβ)ᵀ(y - Xβ)

In the standard linear equation, y = mx + b: if we add a column of 1s, mx + b is just matmul(m, x).

How to update (for gradient descent)?

β := β - η ∇β J(β)    (η is the "learning rate")
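A hedged sketch (TF 1.x API, toy data made up here) of exactly this: TensorFlow derives ∇J(β) from the cost, and we apply β := β - η∇J(β) ourselves:

import numpy as np
import tensorflow as tf

# Toy data: y = 3x + 2 (illustrative only); add a column of 1s for the intercept.
X_data = np.random.rand(100, 1).astype(np.float32)
y_data = 3.0 * X_data + 2.0
X_b = np.hstack([X_data, np.ones((100, 1), dtype=np.float32)])

X = tf.placeholder(tf.float32, shape=(None, 2))
y = tf.placeholder(tf.float32, shape=(None, 1))
beta = tf.Variable(tf.zeros([2, 1]), name="beta")        # [slope, intercept]

y_hat = tf.matmul(X, beta)                               # mx + b as one matmul
cost = tf.reduce_mean(tf.square(y_hat - y))              # squared-error cost
grad = tf.gradients(cost, [beta])[0]                     # TF derives d(cost)/d(beta)
eta = 0.1                                                # "learning rate"
train_step = tf.assign(beta, beta - eta * grad)          # beta := beta - eta * gradient

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(2000):
        sess.run(train_step, feed_dict={X: X_b, y: y_data})
    print(sess.run(beta))                                # approaches [[3.], [2.]]

Note that tf.train.GradientDescentOptimizer(eta).minimize(cost) wraps the gradient and update steps into a single op.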


Weights Derived from Gradients

Ridge Regression (L2 penalized linear regression, penalty weight λ):

J(β) = (y - Xβ)ᵀ(y - Xβ) + λβᵀβ

1. Matrix solution: β̂ = (XᵀX + λI)⁻¹ Xᵀy
2. Gradient descent solution: iteratively update β with the gradient of J. (Mirrors many parameter optimization problems.)

TensorFlow has a built-in ability to derive gradients given a cost function.
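Building on the sketch above (same assumptions), ridge only changes the cost; the gradient is still derived automatically. The value of λ is made up, and in practice the intercept column is usually left unpenalized:

lam = 0.1
ridge_cost = tf.reduce_mean(tf.square(y_hat - y)) + lam * tf.reduce_sum(tf.square(beta))
ridge_step = tf.train.GradientDescentOptimizer(learning_rate=0.1).minimize(ridge_cost)
# Running ridge_step in a session shrinks beta toward 0 relative to plain least squares.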


Options for Distributing ML

1. Distribute copies of the entire dataset
   a. Train over all of it with different hyperparameters
   b. Train different folds per worker node
   Pro: Easy; good for compute-bound workloads. Con: Requires the data to fit in each worker's memory.
   (Done often in practice. Not talked about much because it's mostly as easy as it sounds.)

2. Distribute the data -- "Data Parallelism"
   a. Each node finds parameters for a subset of the data
   b. Needs a mechanism for updating parameters:
      i. Centralized parameter server
      ii. Distributed All-Reduce
   Pro: Flexible to all situations. Con: Optimizing for a subset is suboptimal.
   (Preferred method for big data or very complex models, i.e. models with many internal parameters.)

3. Distribute the model or individual operations (e.g. matrix multiply) -- "Model Parallelism"
   Pro: Parameters can be localized. Con: High communication cost for transferring intermediate data.
Model Parallelism: multiple devices on multiple machines; tensors are transferred between devices (e.g. Machine A: CPU:0, CPU:1; Machine B: GPU:0). [figure]

Data Parallelism: the same graph runs on every device (CPU:0, CPU:1, GPU:0) or worker (worker:0, worker:1, worker:2), each over a different subset of the data. [figure]
Distributing Data

Split the N training examples, (X, y), into batches: rows 0 to batch_size-1, ..., rows N-batch_size to N.

Each worker learns parameters (i.e. weights) for its batch -- 𝛳batch0, 𝛳batch1, 𝛳batch2, ... -- given a graph with a cost function and optimizer.

Combine the parameters, then update the parameters of each node and repeat.
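A toy sketch of this loop in plain NumPy (illustrative only; the data, batch size, and the simple parameter-averaging "combine" step are assumptions, not the slides'):

import numpy as np

def fit_batch(X_batch, y_batch, beta, eta=0.1, steps=10):
    # One "worker": a few gradient-descent steps on its own batch (least squares).
    for _ in range(steps):
        grad = 2.0 / len(X_batch) * X_batch.T @ (X_batch @ beta - y_batch)
        beta = beta - eta * grad
    return beta

N, batch_size = 1000, 100
X = np.random.rand(N, 3)
y = X @ np.array([[1.0], [2.0], [3.0]])          # toy targets
beta = np.zeros((3, 1))                          # shared starting parameters

for _ in range(50):                              # update params of each node and repeat
    batch_betas = [fit_batch(X[i:i + batch_size], y[i:i + batch_size], beta)
                   for i in range(0, N, batch_size)]      # one batch per "worker"
    beta = np.mean(batch_betas, axis=0)          # combine parameters (simple average)

print(beta.ravel())                              # approaches [1, 2, 3]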
Gradient Descent for Linear Regression

● Batch Gradient Descent: all N examples at a time
● Stochastic Gradient Descent: one example at a time
● Mini-batch Gradient Descent: k examples at a time

(Geron, 2017)
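A sketch of the mini-batch variant with the TF 1.x feed_dict pattern (the toy data, batch size, and epoch count are made up); k = 1 gives stochastic GD, k = N gives batch GD:

import numpy as np
import tensorflow as tf

X_data = np.random.rand(200, 2).astype(np.float32)
y_data = X_data @ np.array([[2.0], [-1.0]], dtype=np.float32)   # toy targets

X = tf.placeholder(tf.float32, shape=(None, 2))
y = tf.placeholder(tf.float32, shape=(None, 1))
w = tf.Variable(tf.zeros([2, 1]))
cost = tf.reduce_mean(tf.square(tf.matmul(X, w) - y))
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(cost)

k = 32                                           # mini-batch size: k examples at a time
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(100):
        idx = np.random.permutation(len(X_data))               # reshuffle every epoch
        for start in range(0, len(X_data), k):
            batch = idx[start:start + k]
            sess.run(train_op, feed_dict={X: X_data[batch], y: y_data[batch]})
    print(sess.run(w))                           # approaches [[2.], [-1.]]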
Distributed TensorFlow

(Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., ... & Kudlur, M. (2016, November). TensorFlow: A system for large-scale machine learning. In OSDI (Vol. 16, pp. 265-283).)
Distributed TensorFlow

Distributed:
● Locally: across processors (CPUs, GPUs, TPUs)
● Across a cluster: multiple machines, each with multiple processors

Parallelisms (discussed previously):
● Data Parallelism: all nodes doing the same thing on different subsets of the data
● Graph/Model Parallelism: different portions of the model on different devices

Model Updates:
● Asynchronous Parameter Server
● Synchronous AllReduce (doesn't work with Model Parallelism)
Local Distribution

Multiple devices on a single machine (CPU:0, CPU:1, GPU:0), possibly shared by multiple programs (Program 1, Program 2). [figure]

Multiple devices on multiple machines (Machine A: CPU:0, CPU:1; Machine B: GPU:0). [figure]
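A minimal sketch of local device placement (TF 1.x). "/gpu:0" assumes a GPU is present; allow_soft_placement falls back to CPU if it isn't:

import tensorflow as tf

with tf.device("/cpu:0"):
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    b = tf.constant([[1.0, 0.0], [0.0, 1.0]])

with tf.device("/gpu:0"):            # pin this op's kernel to the first GPU
    c = tf.matmul(a, b)

config = tf.ConfigProto(allow_soft_placement=True,    # use CPU if no GPU exists
                        log_device_placement=True)    # log where each op actually ran
with tf.Session(config=config) as sess:
    print(sess.run(c))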
Asynchronous Parameter Server

Jobs: a "ps" job (task 0) and a "worker" job (task 0, task 1). Each task runs a TF Server (master + worker) on its machine's devices (Machine A: CPU:0, CPU:1; Machine B: CPU:0, GPU:0). [figure]

Parameter server: its job is just to maintain the values of the variables being optimized.

Workers: do all the numerical "work" and send updates to the parameter server.

(Geron, 2017: HOML, p. 324)
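A sketch of how such a cluster could be declared with the TF 1.x distributed runtime; the host:port addresses are placeholders, not real machines:

import tensorflow as tf

# One "ps" task and two "worker" tasks (hypothetical addresses).
cluster = tf.train.ClusterSpec({
    "ps":     ["machineA:2221"],
    "worker": ["machineA:2222", "machineB:2222"],
})

# Each process starts the server for its own job/task; e.g. worker task 0:
server = tf.train.Server(cluster, job_name="worker", task_index=0)

# replica_device_setter puts variables on the ps task and ops on this worker.
with tf.device(tf.train.replica_device_setter(
        cluster=cluster, worker_device="/job:worker/task:0")):
    w = tf.Variable(tf.zeros([10, 1]))        # lives on /job:ps/task:0
    # ... build the cost and optimizer here; each worker updates w asynchronously ...

# with tf.Session(server.target) as sess: ...   # attach a session to this server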
Synchronous All Reduce

Only "worker" tasks: workers do the computation, send parameter updates to the other workers, and store parameter updates from the other workers. Requires low-latency communication. [figure]

(Geron, 2017: HOML, p. 324)
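For comparison, a hedged sketch of synchronous all-reduce via tf.distribute (Keras-style API, TF >= 1.14 / 2.x, rather than the session API used elsewhere in these slides):

import tensorflow as tf

# MirroredStrategy keeps a model replica on each local GPU and all-reduces
# gradients every step (synchronous data parallelism on one machine).
strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(3,))])
    model.compile(optimizer="sgd", loss="mse")

# model.fit(X, y, batch_size=64)  # each batch is split across the replicas
# MultiWorkerMirroredStrategy extends the same idea across machines.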
Distributed TF: Full Pipeline

Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., ... & Kudlur, M. (2016, November). TensorFlow:
A System for Large-Scale Machine Learning. In OSDI (Vol. 16, pp. 265-283).
Summary

● TF is a workflow system, where records are always tensors
  ○ operations are applied to tensors (as either variables, constants, or placeholders)
● Optimized for numerical / linear algebra work
  ○ automatically derives gradients
  ○ custom kernels for given devices
● "Easily" distributes
  ○ within a single machine (locally: many devices)
  ○ across a cluster (many machines and devices)
  ○ jobs broken up as parameter servers / workers make coordination of data efficient
