WQP GRN Onboarding Document - How To Use Our Platform
Research Problems
Typical Workflow
Research problems are typically onboarded and addressed in the following fashion:
5. Researchers develop models using only the researcher data, testing against
the researcher test inputs. When they feel the models are ready, they can
submit them for production, where the models are fitted using MTE data and
tested against the MTE inputs.
6. Research leads would evaluate submitted models and promote a model into
production if it has sufficient predictive power and meets the other required
criteria (latency, etc.).
7. An ensemble model would be created using some or all of the production
models. This ensemble model is refreshed periodically with new production
models.
8. Internal or client deliverables would be constructed from the ensemble
model output.
Note: to use the wqpt commands on Windows, you need Cygwin to have an
emulated Linux command prompt. On older images (<v1.3) the default Cygwin path
is not on the D: drive, so make sure you are working on drive D: (run pwd and look
for /cygdrive/d in the output; if not, run cd D:). It is suggested to create a work
folder (mkdir work) and go to that directory (cd work).
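The navigation steps above can be sketched as an illustrative Cygwin session (drive layout depends on your image):

```shell
pwd                  # look for /cygdrive/d in the output
# if not on drive D:, switch to it first:  cd D:
mkdir -p work        # suggested work folder (-p: no error if it already exists)
cd work
```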
Run the Cygwin terminal and check whether Anaconda is properly configured.
To do this, run the following command:
which python
The expected output should show the executable from the Anaconda folder. If it
does not point to Anaconda's Python executable, contact the support engineer for help.
The wqpt utility should be installed already. To configure it, run the following
command:
where <token> should be replaced with your authentication token (the link to
your auth token is distributed personally through Slack).
If needed, you can update the wqpt package by running the wqpt-update
command. This document assumes wqpt version 2.0.0+2887 or later.
You can check your version by running wqpt --version.
Platform Primer
Creating Models
You can set up a model skeleton for a given problem locally using:
NOTE: If you are trying out the platform, please use demand-multivariate as the
playbook, e.g.:
wqpt create demand-pred-test01 --playbook=demand-multivariate --lang=py
Example:
.wqpt/Playbook.yml
Example:
Alphafile.yml
The Alphafile.yml file contains metadata for the model. Currently, it has three main
sections:
- runtimeVersion which indicates the platform runtime version this model was built
with
- description should contain an introduction to your alpha: the methods used and
the main steps of the algorithm. Fill it in before submitting your alpha.
- alpha which contains information about the model.
● name: This is the name of the model, which is used to identify the model in the catalog.
● playbook: This is the name of the playbook (i.e. corresponding to a research
problem) in the catalog that this model addresses.
● version: This is the version of the model. If model code is refreshed, this should be
incremented.
● runtime: This is the runtime used by the model, and is dependent on the
implementation language specified during model skeleton creation.
● entrypoint: This is the entry point into the model, i.e. the root file containing the
model, and is typically alpha.py for Python models or alpha.R for R models.
Example:
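A hypothetical Alphafile.yml matching the sections above (all field values are illustrative assumptions, not platform defaults):

```yaml
runtimeVersion: 1                  # platform runtime version (illustrative value)
description: >
  Describe the methods used and the main steps of the algorithm here
  before submitting your alpha.
alpha:
  name: demand-pred-test01         # identifies the model in the catalog
  playbook: demand-multivariate    # research problem this model addresses
  version: 1                       # increment whenever model code is refreshed
  runtime: python                  # illustrative identifier; depends on the language
  entrypoint: alpha.py             # alpha.R for R models
```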
alpha.py or alpha.R
The alpha.py (or alpha.R) file is the entry point into your model if you are using
Python (or R) as the implementation language. Your model should implement an
interface with three public functions:
Note that the model should persist its state between calls. It is also
recommended to implement predict_batch rather than predict: this reduces
network overhead and leverages the ability of most standard ML packages to make
vectorized predictions, which is much faster than predicting one row at a time.
The platform will preferentially use predict_batch over predict if both
functions are implemented.
Examples:
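A minimal hypothetical sketch of such an entry point. The predict and predict_batch names appear above; the fit name, all signatures, and the use of a least-squares model via numpy are assumptions for illustration only:

```python
# Hypothetical alpha.py sketch; the exact platform interface may differ.
import numpy as np

_model = None  # module-level state persists between calls


def fit(X, y):
    """Fit the model (name/signature assumed for illustration)."""
    global _model
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Simple least-squares fit with an intercept, as a stand-in for a real model
    design = np.c_[X, np.ones(len(X))]
    _model, *_ = np.linalg.lstsq(design, y, rcond=None)


def predict_batch(X):
    """Vectorized prediction; preferred over predict() by the platform."""
    X = np.asarray(X, dtype=float)
    return np.c_[X, np.ones(len(X))] @ _model


def predict(x):
    """Single-row fallback; used only if predict_batch is not implemented."""
    return predict_batch([x])[0]
```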
environment.yml
The environment.yml file lists the package dependencies of your model. There
should be one entry per line, and each entry should be the name of a package from
Conda, PyPI (Python model) or CRAN (R model).
Example:
● Python model
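A hypothetical environment.yml for a Python model, with one package name per line as described above (package choices are illustrative):

```
numpy
pandas
scikit-learn
```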
The custom.py file contains code to define and register custom statistical modules
that compute metrics aside from the pre-defined ones. To use a registered stats
module, it has to be added to the statistics section of the .wqpt/Playbook.yml
file. Note that any additional statistical metrics you add to the
.wqpt/Playbook.yml file will only be generated in local tests, not in
remote ones.
Example:
To add a new custom statistical module, you will need to add or edit the custom.py
file to register the module and edit the .wqpt/Playbook.yml file to add the
registered module to the list of statistical metrics to compute when running
$ wqpt stats
The following file shows how you could define a new custom statistic metric. In
this case, a new statistical module is defined to compute the Root Mean Squared
Logarithmic Error, which we register as RMSLE.
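A sketch of what such a custom.py could look like. The RMSLE computation itself is the standard definition; the register_statistic call is a placeholder name, not the platform's actual registration hook:

```python
# Hypothetical custom.py sketch for a custom statistical module.
import numpy as np


def rmsle(y_true, y_pred):
    """Root Mean Squared Logarithmic Error (standard definition)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2)))


# Placeholder registration call; consult the platform docs for the real hook.
# register_statistic("RMSLE", rmsle)
```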
$ wqpt test
Once your model code is ready, you can run a local test using the CLI commands in
the model’s base directory:
$ wqpt test
When your alpha is ready, in the directory containing your alpha, execute the
following:
$ wqpt upload -r
Remember that to submit any remote runs you will need to be connected to the VPN.
Remote test runs using the researcher data set without local
checks
Append -f if some checks (e.g. requirements) fail but the run should still proceed on the server:
$ wqpt upload -r -f
When your alpha is ready, in the directory containing your alpha, execute the
following:
$ wqpt upload
Remote test runs using the reserved MTE data set without
local checks
Append -f if some checks (e.g. requirements) fail but the run should still proceed on the server:
$ wqpt upload -f
Append -m with a short note describing the changes to your alpha since the last
submission. Think of it as a changelog message that helps you identify
versions.
Test feedback
Model performance is displayed in the console when running local tests, but you can query
the results again any time by executing the following command in the directory containing
your alpha:
$ wqpt stats
The capture function returns a JSON-serializable object. If capture is not defined
within the alpha, the runtime will substitute a function that returns None.
The restore function accepts a hashable object and returns True if the state is
restored and False otherwise. If restore is not defined within the alpha, the
runtime will substitute a function that returns False.
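The capture/restore contract above can be sketched as follows. The function names come from this document, but the state layout and snapshot shape are illustrative assumptions:

```python
# Hypothetical capture/restore sketch; real alphas persist their own model state.
_state = {"coeffs": [1.0, 0.5]}  # example module-level model state


def capture():
    """Return a JSON-serializable snapshot of the model state."""
    return {"coeffs": list(_state["coeffs"])}


def restore(snapshot):
    """Rebuild state from a snapshot; return True on success, False otherwise."""
    if not isinstance(snapshot, dict) or "coeffs" not in snapshot:
        return False
    _state["coeffs"] = list(snapshot["coeffs"])
    return True
```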
Introductory Tasks
Try building a model for one of the established problems
on the platform
The goal of this task is to help establish your own research workflow and
familiarize yourself with the platform, especially the wqpt command line tool.
There are a number of problems that can be tackled, but it is recommended to try
the following one as this problem is easy to understand and well-documented:
● Demand prediction
Overview
We aim to create demand predictions at various levels of aggregation (from
UPC (Universal Product Code) level to subcategory level) in the beverage category
from the given dataset.
Performance metric: MAPE (Mean Absolute Percentage Error)
Success criteria: MAPE < 30%
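For reference, the MAPE metric used as the success criterion can be computed as follows (standard definition; the function name is illustrative):

```python
# Mean Absolute Percentage Error, expressed as a percentage.
import numpy as np


def mape(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)
```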
Research question
What is the expected demand of known UPCs as a function of distribution, price & promo
variables?
Research problem
Given a date, predict the SPPD based on given input variables:
Platform
Problem name: demand-multivariate
Scope files: available in the files folder after creation of the model in the wqpt platform
Input variables (levers):
● Time: TimePeriodEndDate
● Product features (UPC level): Upc
● Financials: base_price, discount_perc, AvgPctAcv
● Promo variables: AvgPctAcvAnyDisplay, AvgPctAcvAnyFeature, AvgPctAcvFeatureAndDisplay, AvgPctAcvTPR
Output variable (predicted): SPPD
Calculated fields
4. promo_price: promoted price (TPR or discount). Formula: [Dollars, Promo] / [Units, Promo]
5. promo_price_yago: year-ago promoted price (TPR or discount). Formula: [Dollars, Promo, Yago] / [Units, Promo, Yago]
Terminology
price & promo variables:
● BasePrice
● DiscountPct
● AvgPctACV
● AvgPctACVAnyDisplay
● AvgPctACVAnyFeature
● AvgPctACVFeatureAndDisplay
● AvgPctACVTPR