Prescriptive Analytics with DOcplex and pandas


• What is Prescriptive Analytics?
• Why Python for Prescriptive Analytics?
• DOcplex: What is it?
• Using DOcplex for modelling an Optimization problem
• Using pandas for improved modelling capabilities

What is Prescriptive Analytics?
 Also known as: Decision Optimization
[Figure: analytics questions, from "What will happen?" to "How can we make it happen?"]

 Prescriptive analytics is about:
 recommending actions,
 based on desired outcomes,
 taking into account:


• specific scenarios,
• limited resources and
• knowledge of past and current events.
 This insight can help organizations make better decisions and have greater
control of business outcomes.
The Science of Better Decisions

Optimization helps businesses:
• create the best possible plans
• explore alternatives and understand trade-offs
• respond to changes in business operations

Example questions and trade-offs:
• How to best allocate aircraft and crews?
• What to build, where and when?
• Inventory cost vs. customer satisfaction
• Risk vs. potential reward
• Cost vs. carbon

How does Optimization work?

What is an optimization model?
An optimization model is composed of:
• Decision variables
• Constraints
• An objective function

Solving a model means finding an assignment to decision variables that:
• minimizes or maximizes the objective function,
• subject to meeting all constraints

A Mathematical Programming (MP) model: [figure]

A Constraint Programming (CP) model:
• Based on higher-level constructs: discrete or interval variables, and a rich set of logical, arithmetic or (non-linear) functional constraints over variables
• Dedicated to combinatorial / scheduling problems
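
To make the three ingredients concrete, here is a minimal sketch that solves a toy model by brute-force enumeration. The products, profits and capacity are invented for illustration, and real solvers such as CPLEX do not enumerate assignments. The point is only to show decision variables, a constraint and an objective function side by side.

```python
from itertools import product

# Hypothetical toy data: profit per unit and machine hours per unit.
profit = {"desk": 30, "chair": 20}
hours = {"desk": 3, "chair": 1}
capacity = 12      # available machine hours
max_units = 5      # upper bound on each decision variable


def is_feasible(plan):
    """Constraint: total machine hours must not exceed capacity."""
    return sum(hours[p] * qty for p, qty in plan.items()) <= capacity


def objective(plan):
    """Objective function: total profit of a production plan."""
    return sum(profit[p] * qty for p, qty in plan.items())


# Decision variables: one integer quantity in 0..max_units per product.
candidates = (dict(zip(profit, quantities))
              for quantities in product(range(max_units + 1), repeat=len(profit)))

# "Solving" here means keeping the feasible assignment with the best objective.
best_plan = max((c for c in candidates if is_feasible(c)), key=objective)
print(best_plan, objective(best_plan))
```

Enumeration only works for tiny models, which is why the rest of the deck relies on a dedicated solver.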

Modelling languages for Prescriptive Analytics
 Specialized modelling languages: AMPL, GAMS, OPL…
 They enable concise formulations close to mathematical language, with intensive use of matrix representations…

Input data definition

Decision variables: how much to produce for each product

$Production_p \ge 0$

Objective: maximize profit

$\max \sum_p Profit_p \times Production_p$

Constraints: demand for components cannot exceed stock

$\forall c,\ \sum_p Demand_{p,c} \times Production_p \le Stock_c$

Why Python for Prescriptive Analytics?
 Take advantage of Python's expressiveness (generators, aggregators, operator overloading, tuples…).

 Python's capabilities make it a viable alternative to specialized modelling languages:

 A single language to create the constraints AND handle the workflow.
 Standard libraries with abstract constructs to manipulate vectors, matrices, relational data models…
 Ecosystem, ease of use, proven robustness, data ingestion.

 Workflow and mathematical description are part of the language, with no memory management.

Why Python for Prescriptive Analytics?
 Core Python libraries for scientific computing

 Notebooks: a great technology for prototyping optimization models interactively

 Leverage Big Data tools, such as Apache Spark.

DOcplex: What is it? How to get it?

• Easily formulate your optimization models and solve them with either the IBM Decision Optimization on Cloud solve service or a local CPLEX solver, with no code change.

• Free solve capabilities make it easy to discover this new API: our free cloud trial and our new free CPLEX Optimization Studio Community Edition (COS CE). An email address is all you need to get access to either.

• Available through the standard Python pip install. If you go full cloud, there is nothing else to download and no need to contact anyone at IBM.

• Search for "docplex" in your browser to find the docplex PyPI repository and documentation.

Comprehensive documentation and resources
 All documentation and resources are available online

 Educational: examples / cookbooks for all levels of expertise, from discovering IBM Decision Optimization technologies… to reference manuals for APIs

 Social: community / forums

DOcplex for optimization modelling (MP)
Import the DOcplex MP package:
    from docplex.mp.model import Model

Create the container for your model:
    mdl = Model('Warehouse')

Define decision variables (individually or as collections, discrete or continuous):
    x = mdl.continuous_var(name='totDmd')
    supply_vars = mdl.binary_var_matrix(warehouses, stores, 'supply')

Define constraints over variables:
    mdl.add_constraint(supply_vars[w, s] <= …)
    mdl.add_constraint(mdl.sum(supply_vars[w, s] for s in stores) <= w.capacity)

Define the objective:
    mdl.minimize(total_opening_cost + total_supply_cost)

Solve the model using local CPLEX or on the cloud:
    mdl.solve()
    mdl.solve(url=SVC_URL, key=SVC_KEY)
DOcplex and Notebooks for Optimization
• Installing DOcplex and configuring your credentials
• Easy to download and parse JSON
• Visualizing the input data


Slicing and Aggregate constructs
 Two important constructs to describe complex problems in a compact form:
 Slicing filters: select a subset of items in a multi-dimensional collection
 Aggregate:
• used in combination with slicing,
• build the actual mathematical expression
OPL:
    forall ( l in leg_ids, we in weeks )
        leg_teu[l][we] == sum (tv in trans_vars : tv.l.leg_id == l && w[] == we)
            trans[tv] * size[tv.eqc];

Python (DOcplex):
    for l in leg_ids:
        for we in weeks:
            mdl.add_constraint(
                leg_teu[(l, we)] == mdl.sum(trans[(tv.leg_id, tv.mot, tv.eqc)] * size[tv.eqc]
                                            for tv in trans_vars_list
                                            if tv.leg_id == l and w[] == we))
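
The two constructs can be tried out without a solver. The sketch below uses plain numbers in place of decision variables, so the built-in `sum` stands in for `mdl.sum`. The tuple fields, the `size` values and the data rows are assumptions for illustration only.

```python
from collections import namedtuple

# Hypothetical transport records; `quantity` stands in for a decision variable.
Transport = namedtuple("Transport", ["leg_id", "mot", "eqc", "week", "quantity"])
trans = [
    Transport("CDC-BOR", "Truck", "DRY-20", 23, 10),
    Transport("CDC-BOR", "Train", "HIGH-40", 23, 4),
    Transport("CHE-MAR", "Train", "HIGH-40", 23, 5),
    Transport("CDC-BOR", "Truck", "DRY-20", 24, 7),
]
size = {"DRY-20": 1, "HIGH-40": 2}   # assumed TEU per equipment class
leg_ids = ["CDC-BOR", "CHE-MAR"]
weeks = [23, 24]

leg_teu = {
    (l, we): sum(t.quantity * size[t.eqc]             # aggregate
                 for t in trans
                 if t.leg_id == l and t.week == we)   # slicing filter
    for l in leg_ids
    for we in weeks
}
print(leg_teu)
```

Note that every (leg, week) pair scans the whole `trans` list, which is the cost the following slides set out to reduce.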

Performance considerations
 Runtime model generation should be as efficient as possible:
 it may be invoked thousands of times when running in production
 large models may involve millions of variables and constraints
 a “naïve” translation of slicing/aggregate constructs into Python can be very inefficient when nested loops are involved
 Use pandas for handling slicing on large collections
“pandas is an open source library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.”
 DOcplex can benefit from the following pandas features:
 Data organized in multi-indexed tables
 Efficient merge operations between tables
 Efficient indexing, filtering and grouping operations on tables
Performance considerations
 Data Frame (trans_df):
    trans       eqc      leg_id   mot    date      week
    @trans_01   DRY-20   CDC-BOR  Truck  10/06/16  23
    @trans_02   HIGH-40  CHE-MAR  Train  10/06/16  23
    …           …        …        …      …         …
 “naïve” slicing:
    with SimpleTimer("TEU EQUATIONS-3", print_details=False):
        for l in leg_ids:
            for we in weeks:
                mdl.add_constraint(
                    leg_teu[(l, we)] == mdl.sum(t.trans * size[t.eqc] for t in trans_df_list
                                                if t.leg_id == l and w[] == we))
    --> Elapsed time: 5875 ms
 Slicing with pandas:
    with SimpleTimer("TEU EQUATIONS-3", print_details=False):
        trans_df['week'] = trans_df.apply(lambda row: w[], axis=1)
        for l in leg_ids:
            for we in weeks:
                slice_df = trans_df.loc[(trans_df.leg_id == l) & (trans_df.week == we)]
                mdl.add_constraint(
                    leg_teu[(l, we)] == mdl.sum(t.trans * size[t.eqc]
                                                for t in slice_df.itertuples()))
Performance considerations
 Issue with this formulation:
    with SimpleTimer("TEU EQUATIONS-3", print_details=False):
        trans_df['week'] = trans_df.apply(lambda row: w[], axis=1)
        for l in leg_ids:
            for we in weeks:
                slice_df = trans_df.loc[(trans_df.leg_id == l) & (trans_df.week == we)]
                mdl.add_constraint(
                    leg_teu[(l, we)] == mdl.sum(t.trans * size[t.eqc]
                                                for t in slice_df.itertuples()))

 Slicing is computed inside the nested loops

 The cost of creating a pandas Data Frame is incurred at each iteration

 Much better strategy:

 Prepare the results of all slicing filters before entering the nested loops
 This can be done thanks to pandas’ groupby and aggregate operations
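
A minimal sketch of this strategy, assuming a small hypothetical table with numeric quantities standing in for decision variables: the table is split once with `groupby`, and the nested loops then do a dictionary lookup instead of filtering the whole Data Frame each time.

```python
import pandas as pd

# Hypothetical data; `quantity` stands in for a decision variable.
trans_df = pd.DataFrame({
    "leg_id": ["CDC-BOR", "CDC-BOR", "CHE-MAR", "CDC-BOR"],
    "eqc": ["DRY-20", "HIGH-40", "HIGH-40", "DRY-20"],
    "week": [23, 23, 23, 24],
    "quantity": [10, 4, 5, 7],
})
size = {"DRY-20": 1, "HIGH-40": 2}   # assumed TEU per equipment class
leg_ids = ["CDC-BOR", "CHE-MAR"]
weeks = [23, 24]

# One pass over the table: one sub-frame per (leg_id, week) pair that occurs.
slices = {key: group for key, group in trans_df.groupby(["leg_id", "week"])}

leg_teu = {}
for l in leg_ids:
    for we in weeks:
        slice_df = slices.get((l, we))   # lookup only, no filtering here
        if slice_df is None:             # pair without any transport row
            leg_teu[(l, we)] = 0
        else:
            leg_teu[(l, we)] = sum(t.quantity * size[t.eqc]
                                   for t in slice_df.itertuples())
print(leg_teu)
```

Pairs that never occur in the table have no group, so they need explicit handling. The formulation on the next slide covers them with a cross product and a left merge instead.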

Performance considerations
 Prepare all results of slicing beforehand:
    with SimpleTimer("TEU EQUATIONS-3", print_details=False):
        trans_df['week'] = trans_df.apply(lambda row: w[], axis=1)
        trans_df['result'] = trans_df.apply(lambda row: row.trans * size[row.eqc], axis=1)

        legWeeksMultiIndex = pd.MultiIndex.from_product([leg_ids, weeks], names=["leg_id", "week"])
        legWeeksMultiIndex_df = pd.DataFrame(legWeeksMultiIndex.values.tolist(),
                                             columns=["leg_id", "week"])
        trans_full_df = legWeeksMultiIndex_df.merge(trans_df, how='left').fillna(0)

        trans_sum_grpby = trans_full_df[['leg_id', 'week', 'result']].groupby(['leg_id', 'week']).\
            aggregate(lambda x: mdl.sum(x.tolist()))

        for l in leg_ids:
            for we in weeks:
                mdl.add_constraint(leg_teu[(l, we)] == trans_sum_grpby.result[l, we])
    --> Elapsed time: 2323 ms

 Based on two pandas operations:

 groupby: split dataset into groups
 aggregate: perform a computation on the grouped data
© 2016 IBM Corporation
Performance considerations
 Re-writing using helper methods for generic patterns:
    with SimpleTimer("TEU EQUATIONS-3", print_details=False):
        trans_df['week'] = trans_df.apply(lambda row: w[], axis=1)
        trans_df['result'] = trans_df.apply(lambda row: row.trans * size[row.eqc], axis=1)

        trans_sum_grpby = for_cross_prod_sum_by([leg_ids, weeks], trans_df,
                                                ['leg_id', 'week'], 'result')

        for l in leg_ids:
            for we in weeks:
                mdl.add_constraint(leg_teu[(l, we)] == trans_sum_grpby.result[l, we])

 To be compared with the initial “naïve” slicing formulation:

    with SimpleTimer("TEU EQUATIONS-3", print_details=False):
        for l in leg_ids:
            for we in weeks:
                mdl.add_constraint(
                    leg_teu[(l, we)] == mdl.sum(t.trans * size[t.eqc] for t in trans_df_list
                                                if t.leg_id == l and w[] == we))
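
The slides do not show the body of `for_cross_prod_sum_by`. The following is a hypothetical re-implementation, written only from the way it is called above and from the cross product / left merge / groupby steps of the previous slide. The `sum_fn` parameter is an addition of this sketch, so that `mdl.sum` could be passed when the column holds DOcplex expressions. The demo uses plain numbers.

```python
import pandas as pd


def for_cross_prod_sum_by(key_lists, df, key_columns, value_column, sum_fn=sum):
    """Sum `value_column` for every combination of keys, including absent ones.

    Hypothetical sketch: the original helper is not shown in the slides.
    """
    # All key combinations, as a Data Frame with one column per key.
    index = pd.MultiIndex.from_product(key_lists, names=key_columns)
    cross_df = index.to_frame(index=False)
    # Left merge keeps combinations without any matching row (value is NaN).
    full_df = cross_df.merge(df[key_columns + [value_column]],
                             how="left", on=key_columns)
    full_df[value_column] = full_df[value_column].fillna(0)
    # One aggregated value per key combination.
    return full_df.groupby(key_columns)[[value_column]].aggregate(
        lambda values: sum_fn(values.tolist()))


# Hypothetical numeric data standing in for DOcplex expressions.
trans_df = pd.DataFrame({
    "leg_id": ["CDC-BOR", "CDC-BOR", "CHE-MAR", "CDC-BOR"],
    "week": [23, 23, 23, 24],
    "result": [10, 8, 10, 7],
})
leg_ids = ["CDC-BOR", "CHE-MAR"]
weeks = [23, 24]

trans_sum_grpby = for_cross_prod_sum_by([leg_ids, weeks], trans_df,
                                        ["leg_id", "week"], "result")
print(trans_sum_grpby)
```

The returned table is indexed by (leg_id, week), so `trans_sum_grpby["result"][l, we]` can be used inside the constraint loop as on the slide.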

 Performance vs readability trade-off

 Python is one of the most relevant tools to easily turn an idea into working code
when dealing with data-wrangling problems, and then visualize their results.

 The exact same code that has been written and tested in a notebook for loading
data, modelling an optimization problem, solving it… can readily be integrated
and executed in a deployed Python environment.

 DOcplex objective: facilitate the diffusion and use of optimization technologies

 DOcplex + pandas: an alternative to specialized modelling languages

 On-going effort for defining “best practices” and patterns to:

 address performance issues

 facilitate the formulation of models that are readable and maintainable

Thank you!


