
Prescriptive Analytics with DOcplex and pandas


Hugues JUILLE

© 2016 IBM Corporation


Agenda
• What is Prescriptive Analytics?
• Why Python for Prescriptive Analytics?
• DOcplex: What is it?
• Using DOcplex for modelling an Optimization problem
• Using pandas for improved modelling capabilities



What is Prescriptive Analytics?
• Also known as: Decision Optimization
• Prescriptive analytics is about:
  • recommending actions,
  • based on desired outcomes,
  • taking into account:
    • specific scenarios,
    • limited resources and
    • knowledge of past and current events.
• This insight can help organizations make better decisions and have greater control of business outcomes.

[Figure: value plotted against difficulty for Descriptive Analytics ("What happened?"), Predictive Analytics ("What will happen?") and Prescriptive Analytics ("How can we make it happen?").]
The Science of Better Decisions

Optimization helps businesses:
• create the best possible plans
• explore alternatives and understand trade-offs
• respond to changes in business operations

Example decisions and trade-offs:
• How to best allocate aircraft and crews?
• What to build, where and when?
• Inventory cost vs. customer satisfaction
• Risk vs. potential reward
• Cost vs. carbon emissions
How does Optimization work?



What is an optimization model?
An optimization model is composed of:
• Decision variables
• Constraints
• An objective function

Solving a model means finding an assignment to decision variables that:
• minimizes or maximizes the objective function,
• subject to meeting all constraints

Two kinds of model are distinguished:
• A Mathematical Programming model
• A Constraint Programming (CP) model:
  • Based on higher-level constructs:
    • Discrete or interval variables
    • Rich set of logical, arithmetic or (non-linear) functional constraints over variables
  • Dedicated to combinatorial / scheduling problems
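To make the three ingredients concrete, here is a minimal sketch in plain Python. The data (two products, one limited resource) are invented for illustration, and the brute-force enumeration only works because the toy problem is tiny; a real solver such as CPLEX does not proceed this way.

```python
from itertools import product

# Hypothetical toy data, invented for illustration only.
PROFIT = {"desk": 12, "chair": 7}        # profit per unit produced
WOOD_PER_UNIT = {"desk": 4, "chair": 2}  # wood consumed per unit
WOOD_STOCK = 20                          # limited resource
MAX_UNITS = 10                           # upper bound on each decision variable


def objective(plan):
    """Objective function: total profit of a production plan."""
    return sum(PROFIT[p] * qty for p, qty in plan.items())


def is_feasible(plan):
    """Constraint: total wood consumption cannot exceed the stock."""
    return sum(WOOD_PER_UNIT[p] * qty for p, qty in plan.items()) <= WOOD_STOCK


def solve_by_enumeration():
    """Try every assignment of the integer decision variables, keep the best feasible one."""
    products = list(PROFIT)
    best_plan, best_value = None, float("-inf")
    for quantities in product(range(MAX_UNITS + 1), repeat=len(products)):
        plan = dict(zip(products, quantities))
        if is_feasible(plan) and objective(plan) > best_value:
            best_plan, best_value = plan, objective(plan)
    return best_plan, best_value


best_plan, best_value = solve_by_enumeration()
print(best_plan, best_value)
```

The decision variables are the quantities in `plan`, the constraint is `is_feasible`, and the objective is `objective`; "solving" is the search for the best feasible assignment.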




Modelling languages for Prescriptive Analytics
• Examples: AMPL, GAMS, OPL…
• They enable concise formulations close to mathematical language, with intensive use of matrix representations…

Example formulation:
• Input data definition
• Decision variables: how much to produce for each product
  $\mathit{Production}_p \ge 0$
• Objective: maximize profit
  $\max \sum_{p} \mathit{Profit}_p \times \mathit{Production}_p$
• Constraints: demand for components cannot exceed stock
  $\forall c,\ \sum_{p} \mathit{Demand}_{p,c} \times \mathit{Production}_p \le \mathit{Stock}_c$
Why Python for Prescriptive Analytics?
• Take advantage of Python's expressiveness (generators, aggregators, operator overloading, tuples…).
• Python's capabilities make it a viable alternative to specialized modelling languages:
  • A single language to create the constraints AND run the workflow.
  • Standard libraries with abstract constructs to manipulate vectors, matrices, relational data models…
  • Ecosystem, ease of use, proven robustness, data ingestion
• Workflow and mathematical description are part of the language; no memory management
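As an illustration of why operator overloading and generators matter for modelling, the toy sketch below lets `2 * x + 3 * y <= 10` build a constraint object instead of evaluating to a boolean. The classes `LinExpr` and `Constraint` and the function `var` are hypothetical names written for this example; they are not DOcplex's implementation.

```python
class LinExpr:
    """Toy linear expression: {variable name: coefficient} plus a constant."""

    def __init__(self, coefs=None, constant=0.0):
        self.coefs = dict(coefs or {})
        self.constant = constant

    def __add__(self, other):
        # Accept plain numbers as well as expressions.
        if not isinstance(other, LinExpr):
            other = LinExpr(constant=other)
        coefs = dict(self.coefs)
        for name, coef in other.coefs.items():
            coefs[name] = coefs.get(name, 0.0) + coef
        return LinExpr(coefs, self.constant + other.constant)

    __radd__ = __add__  # lets the built-in sum() start from 0

    def __mul__(self, factor):
        # Only multiplication by a number keeps the expression linear.
        return LinExpr({n: c * factor for n, c in self.coefs.items()},
                       self.constant * factor)

    __rmul__ = __mul__

    def __le__(self, rhs):
        # The comparison builds a constraint object instead of a boolean.
        return Constraint(self, rhs)

    def value(self, assignment):
        """Evaluate the expression for given variable values."""
        return self.constant + sum(c * assignment[n] for n, c in self.coefs.items())


class Constraint:
    """Toy 'expr <= rhs' constraint, where rhs is a number."""

    def __init__(self, expr, rhs):
        self.expr, self.rhs = expr, rhs

    def is_satisfied(self, assignment):
        return self.expr.value(assignment) <= self.rhs


def var(name):
    """Create a decision variable as an expression with coefficient 1."""
    return LinExpr({name: 1.0})


x, y = var("x"), var("y")
ct = 2 * x + 3 * y <= 10                          # operator overloading
total = sum(c * v for c, v in [(2, x), (3, y)])   # generator + aggregator
print(ct.expr.coefs)
print(ct.is_satisfied({"x": 2, "y": 2}), ct.is_satisfied({"x": 3, "y": 2}))
```

The same pattern, at a much larger scale, is what allows a Python modelling API to read close to the mathematical formulation.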


Why Python for Prescriptive Analytics?
• Core Python libraries for scientific computing
• Notebooks: a great technology for prototyping optimization models interactively
• Leverage Big Data tools, such as Apache Spark.




DOcplex: What is it? How to get it?

• Easily formulate your optimization models and solve them with the IBM Decision Optimization on the Cloud solve service or a local CPLEX solver (with no code change).

• Free solve capabilities for discovering this new API are available through our free cloud trial and the new, free CPLEX Optimization Studio Community Edition (aka COS CE): an e-mail address is all you need to get access to either.

• Available through the standard Python pip install, with no need to download anything else or contact anyone at IBM if you go fully cloud.

• Search for docplex in your browser to find the docplex PyPI page or documentation.
Comprehensive documentation and resources
• All documentation and resources are available online
• Educational: examples and cookbooks for all levels of expertise, from discovering IBM Decision Optimization technologies to reference manuals for the APIs
• Social: community and forums


DOcplex for optimization modelling (MP)

    # Import the DOcplex MP package
    from docplex.mp.model import Model

    # Create the container for your model
    mdl = Model('Warehouse')

    # Define decision variables
    # (individually or as collections, discrete or continuous)
    x = mdl.continuous_var(name='totDmd')
    supply_vars = mdl.binary_var_matrix(warehouses, stores, 'supply')

    # Define constraints over variables
    mdl.add_constraint(supply_vars[w, s] <= open_vars[w])
    mdl.add_constraint(mdl.sum(supply_vars[w, s] for s in stores) <= w.capacity)

    # Define the objective
    mdl.minimize(total_opening_cost + total_supply_cost)

    # Solve the model using local CPLEX
    mdl.solve()
    # or on the cloud
    mdl.solve(url=SVC_URL, key=SVC_KEY)


DOcplex and Notebooks for Optimization
[Series of notebook screenshots; the recoverable captions are:]
• Installing DOcplex and configuring your credentials
• Easy to download and parse JSON
• Visualizing the input data




Slicing and Aggregate constructs
• Two important constructs to describe complex problems in a compact form:
  • Slicing filters: select a subset of items in a multi-dimensional collection
  • Aggregate:
    • used in combination with slicing,
    • builds the actual mathematical expression

OPL:

    forall (l in leg_ids, we in weeks)
      leg_teu[l][we] == sum (tv in trans_vars : tv.l.leg_id == l && w[tv.date] == we)
                          trans[tv] * size[tv.eqc];

DOcplex:

    for l in leg_ids:
        for we in weeks:
            mdl.add_constraint(
                leg_teu[(l, we)] == mdl.sum(
                    trans[(tv.leg_id, tv.mot, tv.date, tv.eqc)] * size[tv.eqc]
                    for tv in trans_vars_list
                    if tv.leg_id == l and w[tv.date] == we))
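The two constructs can be tried without a solver. In the sketch below, plain numbers (`qty`) stand in for the decision variables `trans[...]`, and the records, sizes and week numbers are invented for illustration: the `if` clause of the generator is the slicing filter, and `sum` is the aggregate.

```python
from collections import namedtuple

Trans = namedtuple("Trans", ["leg_id", "mot", "date", "eqc", "qty"])

# Hypothetical records; 'qty' stands in for the decision variable trans[...].
trans_list = [
    Trans("CDC-BOR", "Truck", "10/06/16", "DRY-20", 3),
    Trans("CDC-BOR", "Train", "10/06/16", "HIGH-40", 1),
    Trans("CHE-MAR", "Train", "10/06/16", "HIGH-40", 2),
    Trans("CDC-BOR", "Truck", "17/06/16", "DRY-20", 5),
]
size = {"DRY-20": 1, "HIGH-40": 2}     # size per equipment class (assumed values)
w = {"10/06/16": 23, "17/06/16": 24}   # date -> week number


def leg_week_teu(leg_id, week):
    """Aggregate (sum) over the slice of records matching one leg and one week."""
    return sum(t.qty * size[t.eqc]
               for t in trans_list
               if t.leg_id == leg_id and w[t.date] == week)  # slicing filter


print(leg_week_teu("CDC-BOR", 23))
print(leg_week_teu("CHE-MAR", 24))
```

Note that every call scans the whole list, which is the cost the performance slides address.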


Performance considerations
• Runtime model generation should be as efficient as possible:
  • it may be invoked thousands of times when running in production
  • large models may involve millions of variables and constraints
  • a “naïve” translation of slicing/aggregate into Python can be very inefficient when nested loops are involved
• Use pandas for handling slicing on large collections
  “pandas is an open source library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language”
• DOcplex can benefit from the following pandas features:
  • Data organized in multi-indexed tables
  • Efficient merge operations between tables
  • Efficient indexing, filtering and grouping operations on tables
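The three pandas features listed above can be sketched on a small table. This assumes pandas is installed; the rows, the `teu` column name and the lookup tables `size_df` and `week_df` are invented for illustration.

```python
import pandas as pd

# Hypothetical data, invented for illustration only.
trans_df = pd.DataFrame({
    "trans": ["@trans_01", "@trans_02", "@trans_03"],
    "eqc": ["DRY-20", "HIGH-40", "DRY-20"],
    "leg_id": ["CDC-BOR", "CHE-MAR", "CDC-BOR"],
    "mot": ["Truck", "Train", "Truck"],
    "date": ["10/06/16", "10/06/16", "17/06/16"],
})
size_df = pd.DataFrame({"eqc": ["DRY-20", "HIGH-40"], "teu": [1, 2]})
week_df = pd.DataFrame({"date": ["10/06/16", "17/06/16"], "week": [23, 24]})

# Merge: attach the size and week columns by joining on shared key columns.
full_df = trans_df.merge(size_df, on="eqc").merge(week_df, on="date")

# Filtering: a boolean mask selects a slice of the table.
truck_df = full_df[full_df.mot == "Truck"]

# Multi-indexed table: rows addressed by (leg_id, week).
indexed = full_df.set_index(["leg_id", "week"]).sort_index()
cdc_bor_df = indexed.xs("CDC-BOR", level="leg_id")

# Grouping: number of rows per (leg_id, week) pair.
rows_per_leg_week = full_df.groupby(["leg_id", "week"]).size()

print(full_df)
print(rows_per_leg_week)
```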
Performance considerations
• Data Frame trans_df:

    trans      eqc      leg_id   mot    date      week
    @trans_01  DRY-20   CDC-BOR  Truck  10/06/16  23
    @trans_02  HIGH-40  CHE-MAR  Train  10/06/16  23
    …          …        …        …      …         …

• “Naïve” slicing:

    with SimpleTimer("TEU EQUATIONS-3", print_details=False):
        for l in leg_ids:
            for we in weeks:
                mdl.add_constraint(
                    leg_teu[(l, we)] == mdl.sum(t.trans * size[t.eqc] for t in trans_df_list
                                                if t.leg_id == l and w[t.date] == we))
    --> Elapsed time: 5875 ms

• Slicing with pandas:

    with SimpleTimer("TEU EQUATIONS-3", print_details=False):
        trans_df['week'] = trans_df.apply(lambda row: w[row.date], axis=1)
        for l in leg_ids:
            for we in weeks:
                slice_df = trans_df.loc[(trans_df.leg_id == l) & (trans_df.week == we)]
                mdl.add_constraint(
                    leg_teu[(l, we)] == mdl.sum(t.trans * size[t.eqc]
                                                for t in slice_df.itertuples()))
    --> Elapsed time: 4681 ms
Performance considerations
• Issue with this formulation:

    with SimpleTimer("TEU EQUATIONS-3", print_details=False):
        trans_df['week'] = trans_df.apply(lambda row: w[row.date], axis=1)
        for l in leg_ids:
            for we in weeks:
                slice_df = trans_df.loc[(trans_df.leg_id == l) & (trans_df.week == we)]
                mdl.add_constraint(
                    leg_teu[(l, we)] == mdl.sum(t.trans * size[t.eqc]
                                                for t in slice_df.itertuples()))

  • Slicing is calculated inside the nested loops
  • The cost of creating a pandas Data Frame is incurred at each iteration

• Much better strategy:
  • Prepare the results of all slicing filters before entering the nested loops
  • This can be done thanks to pandas’ groupby and aggregate operations


Performance considerations
• Prepare all results of slicing beforehand:

    with SimpleTimer("TEU EQUATIONS-3", print_details=False):
        trans_df['week'] = trans_df.apply(lambda row: w[row.date], axis=1)
        trans_df['result'] = trans_df.apply(lambda row: row.trans * size[row.eqc], axis=1)

        legWeeksMultiIndex = pd.MultiIndex.from_product([leg_ids, weeks], names=["leg_id", "week"])
        legWeeksMultiIndex_df = pd.DataFrame(legWeeksMultiIndex.values.tolist(),
                                             columns=["leg_id", "week"])
        trans_full_df = legWeeksMultiIndex_df.merge(trans_df, how='left').fillna(0)

        trans_sum_grpby = trans_full_df[['leg_id', 'week', 'result']].groupby(['leg_id', 'week']).\
            aggregate(lambda x: mdl.sum(x.tolist()))

        for l in leg_ids:
            for we in weeks:
                mdl.add_constraint(leg_teu[(l, we)] == trans_sum_grpby.result[l, we])
    --> Elapsed time: 2323 ms

• Based on two pandas operations:
  • groupby: split the dataset into groups
  • aggregate: perform a computation on the grouped data
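The groupby/aggregate strategy can be checked on plain numbers. The sketch below assumes pandas is installed and uses invented data in which the `result` column stands in for `trans * size[eqc]`. It compares per-iteration slicing with a single groupby pass, and uses `reindex` with `fill_value=0` as a simpler stand-in for the merge and `fillna(0)` step of the slide.

```python
import pandas as pd

# Hypothetical data; 'result' stands in for trans * size[eqc].
trans_df = pd.DataFrame({
    "leg_id": ["CDC-BOR", "CDC-BOR", "CHE-MAR", "CDC-BOR"],
    "week": [23, 23, 23, 24],
    "result": [3, 2, 4, 5],
})
leg_ids = ["CDC-BOR", "CHE-MAR"]
weeks = [23, 24]

# Slicing inside the loops: one filtered DataFrame per (leg, week) pair.
naive = {}
for l in leg_ids:
    for we in weeks:
        slice_df = trans_df.loc[(trans_df.leg_id == l) & (trans_df.week == we)]
        naive[l, we] = int(slice_df.result.sum())

# Slicing prepared beforehand: one groupby pass computes every group sum.
grouped = trans_df.groupby(["leg_id", "week"])["result"].aggregate("sum")

# Pairs with no rows are absent from the groupby result; reindex over the
# full cross product so that they appear with a sum of 0.
full_index = pd.MultiIndex.from_product([leg_ids, weeks], names=["leg_id", "week"])
grouped = grouped.reindex(full_index, fill_value=0)

fast = {(l, we): int(grouped.loc[(l, we)]) for l in leg_ids for we in weeks}
print(fast)
```

With DOcplex expressions instead of numbers, the string `"sum"` would not apply, which is why the slide aggregates with a lambda calling `mdl.sum`.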
Performance considerations
• Re-writing using helper methods for generic patterns:

    with SimpleTimer("TEU EQUATIONS-3", print_details=False):
        trans_df['week'] = trans_df.apply(lambda row: w[row.date], axis=1)
        trans_df['result'] = trans_df.apply(lambda row: row.trans * size[row.eqc], axis=1)

        trans_sum_grpby = for_cross_prod_sum_by([leg_ids, weeks], trans_df,
                                                ['leg_id', 'week'], 'result')

        for l in leg_ids:
            for we in weeks:
                mdl.add_constraint(leg_teu[(l, we)] == trans_sum_grpby.result[l, we])

• To be compared with the initial “naïve” slicing formulation:

    with SimpleTimer("TEU EQUATIONS-3", print_details=False):
        for l in leg_ids:
            for we in weeks:
                mdl.add_constraint(
                    leg_teu[(l, we)] == mdl.sum(t.trans * size[t.eqc] for t in trans_df_list
                                                if t.leg_id == l and w[t.date] == we))

• Performance vs. readability trade-off
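The slides do not show the body of `for_cross_prod_sum_by`. The sketch below is one possible implementation, reconstructed from the groupby code on the previous slide and assuming pandas is installed; the `sum_fn` parameter is an addition made here so that the example runs on plain numbers (passing `mdl.sum` would be the assumed way to aggregate DOcplex expressions).

```python
import pandas as pd


def for_cross_prod_sum_by(key_lists, df, key_columns, value_column, sum_fn=sum):
    """Hypothetical helper: sum `value_column` for every combination of keys.

    key_lists    -- one list of key values per key column (cross product is taken)
    df           -- table holding the key columns and the value column
    key_columns  -- names of the key columns, in the same order as key_lists
    value_column -- name of the column to aggregate
    sum_fn       -- aggregator applied to the list of values of each group
    """
    # All key combinations, including those with no matching row in df.
    index = pd.MultiIndex.from_product(key_lists, names=key_columns)
    keys_df = pd.DataFrame(index.values.tolist(), columns=key_columns)

    # A left merge keeps every combination; missing values become 0.
    full_df = keys_df.merge(df, how="left", on=key_columns)
    full_df[value_column] = full_df[value_column].fillna(0)

    # One groupby pass prepares the result of every slice.
    return (full_df[key_columns + [value_column]]
            .groupby(key_columns)
            .aggregate(lambda x: sum_fn(x.tolist())))


# Hypothetical data; 'result' stands in for trans * size[eqc].
trans_df = pd.DataFrame({
    "leg_id": ["CDC-BOR", "CDC-BOR", "CHE-MAR", "CDC-BOR"],
    "week": [23, 23, 23, 24],
    "result": [3, 2, 4, 5],
})
leg_ids = ["CDC-BOR", "CHE-MAR"]
weeks = [23, 24]

sums = for_cross_prod_sum_by([leg_ids, weeks], trans_df, ["leg_id", "week"], "result")
for l in leg_ids:
    for we in weeks:
        print(l, we, sums["result"].loc[(l, we)])
```

Hiding the merge and groupby plumbing behind one call is what brings the formulation back close to the readability of the naive version.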


Conclusion
• Python is one of the most relevant tools for quickly turning an idea into working code when dealing with data-wrangling problems, and then visualizing the results.
• The exact same code that has been written and tested in a notebook for loading data, modelling an optimization problem, solving it… can readily be integrated and executed in a deployed Python environment.
• DOcplex objective: facilitate the diffusion and use of optimization technologies
• DOcplex + pandas: an alternative to specialized modelling languages
• Ongoing effort to define “best practices” and patterns to:
  • address performance issues
  • facilitate model formulations that are readable and maintainable


Thank you!

Questions/Answers



