Tim McNamara - A Look at NuPIC - A Self-Learning AI Engine
Tim McNamara
Disclaimers:
More concretely:
import csv
import datetime

from nupic.frameworks.opf.modelfactory import ModelFactory

import model_params

# Build the model from the parameter dictionary shown below and tell it
# which field to predict.
model = ModelFactory.create(model_params.MODEL_PARAMS)
model.enableInference({'predictedField': 'consumption'})

# _DATA_PATH points at the input CSV (defined elsewhere in the example).
reader = csv.reader(open(_DATA_PATH))
headers = reader.next()  # NuPIC runs on Python 2
for i, record in enumerate(reader, start=1):
    modelInput = dict(zip(headers, record))
    modelInput["consumption"] = float(modelInput["consumption"])
    modelInput["timestamp"] = datetime.datetime.strptime(
        modelInput["timestamp"], "%m/%d/%y %H:%M")
    result = model.run(modelInput)

Each call to model.run() returns a ModelResult:
ModelResult(
    inferences={
        'multiStepPredictions': {
            1: {
                5.2825868514199987: 0.69999516634971859,
                10.699999999999999: 0.07601257054965195,
                22.100000000000001: 0.055294648127235196,
                22.899999999999999: 0.052690624183750749,
            },
            5: {
                38.188079999999999: 0.2275438176777452,
                47.359999999999992: 0.19538808382423584,
                37.399999999999999: 0.12597931862094047,
                45.399999999999999: 0.099123261272031596,
                37.089999999999996: 0.082913215936932752,
                39.280000000000001: 0.077935781935515161,
                43.629999999999995: 0.076405289164189288
            }
        },
        'multiStepBestPredictions': {
            1: 5.2825868514199987,
            5: 38.188079999999999
        }
    }
    ...
)
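
The inferences dictionary maps each prediction horizon (1 and 5 steps ahead
here) to candidate values and their likelihoods. A minimal sketch of reading
it back out, using only the structure visible above:

# Best single guess, one step ahead.
best = result.inferences['multiStepBestPredictions'][1]
print "1 step ahead: %.2f" % best

# Full distribution of candidates for the same horizon, most likely first.
candidates = result.inferences['multiStepPredictions'][1]
for value, likelihood in sorted(candidates.items(),
                                key=lambda kv: kv[1], reverse=True):
    print "  %.2f with likelihood %.3f" % (value, likelihood)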
MODEL_PARAMS = {
    # Type of model that the rest of these parameters apply to.
    'model': "CLA",

    'predictAheadTime': None,

    # Model parameter dictionary.
    'modelParams': {
        # The type of inference that this model will perform.
        'inferenceType': 'TemporalMultiStep',

        'sensorParams': {
            # Sensor diagnostic output verbosity control;
            # if > 0: sensor region will print out on screen what it's sensing
            # at each step; 0: silent; >=1: some info; >=2: more info;
            # >=3: even more info (see compute() in py/regions/RecordSensor.py)
            'verbosity': 0,

            # Example:
            #   dsEncoderSchema = [
            #       DeferredDictLookup('__field_name_encoder'),
            #   ],
            #
            # (value generated from DS_ENCODER_SCHEMA)
            'encoders': {
                'consumption': {
                    'clipInput': True,
                    'fieldname': u'consumption',
                    'n': 100,
                    'name': u'consumption',
                    'type': 'AdaptiveScalarEncoder',
                    'w': 21
                },
                'timestamp_dayOfWeek': {
                    'dayOfWeek': (21, 1),
                    'fieldname': u'timestamp',
                    'name': u'timestamp_dayOfWeek',
                    'type': 'DateEncoder'
                },
                'timestamp_timeOfDay': {
                    'fieldname': u'timestamp',
                    'name': u'timestamp_timeOfDay',
                    'timeOfDay': (21, 1),
                    'type': 'DateEncoder'
                },
                'timestamp_weekend': {
                    'fieldname': u'timestamp',
                    'name': u'timestamp_weekend',
                    'type': 'DateEncoder',
                    'weekend': 21
                }
            },
        },

        'spParams': {
            # SP diagnostic output verbosity control;
            # 0: silent; >=1: some info; >=2: more info;
            'spVerbosity': 0,
            'globalInhibition': 1,
            # Number of cell columns in the cortical region (same number for
            # SP and TP)
            # (see also tpNCellsPerCol)
            'columnCount': 2048,
            'inputWidth': 0,
            # SP inhibition control (absolute value);
            # Maximum number of active columns in the SP region's output (when
            # there are more, the weaker ones are suppressed)
            'numActivePerInhArea': 40,
            'seed': 1956,
            # coincInputPoolPct
            # What percent of the columns's receptive field is available
            # for potential synapses. At initialization time, we will
            # choose coincInputPoolPct * (2*coincInputRadius+1)^2
            'coincInputPoolPct': 0.5,
            # The default connected threshold. Any synapse whose
            # permanence value is above the connected threshold is
            # a "connected synapse", meaning it can contribute to the
            # cell's firing. Typical value is 0.10. Cells whose activity
            # level before inhibition falls below minDutyCycleBeforeInh
            # will have their own internal synPermConnectedCell
            # threshold set below this default value.
            # (This concept applies to both SP and TP and so 'cells'
            # is correct here as opposed to 'columns')
            'synPermConnected': 0.1,
            'synPermActiveInc': 0.1,
            'synPermInactiveDec': 0.01,
        },

        # Controls whether TP is enabled or disabled;
        # TP is necessary for making temporal predictions, such as predicting
        # the next inputs. Without TP, the model is only capable of
        # reconstructing missing sensor inputs (via SP).
        'tpEnable': True,

        'tpParams': {
            # TP diagnostic output verbosity control;
            # 0: silent; [1..6]: increasing levels of verbosity
            # (see verbosity in nta/trunk/py/nupic/research/TP.py and TP10X*.py)
            'verbosity': 0,
            'inputWidth': 2048,
            'seed': 1960,
            # Permanence Decrement
            # If set to None, will automatically default to tpPermanenceInc
            # value.
            'permanenceDec': 0.1,
            'globalDecay': 0.0,
            'maxAge': 0,
            'outputType': 'normal',
            # "Pay Attention Mode" length. This tells the TP how many new
            # elements to append to the end of a learned sequence at a time.
            # Smaller values are better for datasets with short sequences,
            # higher values are better for datasets with long sequences.
            'pamLength': 1,
        },

        'clParams': {
            'regionName': 'CLAClassifierRegion',
            # Classifier diagnostic output verbosity control;
            # 0: silent; [1..6]: increasing levels of verbosity
            'clVerbosity': 0,
            'trainSPNetOnlyIfRequested': False,
        },
    }
}
The interesting parameters fall into four groups:

sensorParams (input encoding)
spParams (spatial pooling)
tpParams (temporal pooling)
clParams (CLA classifier)
Zooming in on the encoders, which live under sensorParams:

'sensorParams': {
    ...
    'encoders': {
        'consumption': {
            'clipInput': True,
            'fieldname': u'consumption',
            'n': 100,
            'name': u'consumption',
            'type': 'AdaptiveScalarEncoder',
            'w': 21
        },
        'timestamp_dayOfWeek': {
            'dayOfWeek': (21, 1),
            'fieldname': u'timestamp',
            'name': u'timestamp_dayOfWeek',
            'type': 'DateEncoder'
        },
        'timestamp_timeOfDay': {
            'fieldname': u'timestamp',
            'name': u'timestamp_timeOfDay',
            'timeOfDay': (21, 1),
            'type': 'DateEncoder'
        },
        'timestamp_weekend': {
            'fieldname': u'timestamp',
            'name': u'timestamp_weekend',
            'type': 'DateEncoder',
            'weekend': 21
        }
    },
    ...
}
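
To get a feel for what an encoder does, here is a minimal sketch that drives
NuPIC's ScalarEncoder directly. The n=100/w=21 sizing mirrors the consumption
encoder above; the minval/maxval range is an assumption made up for
illustration.

from nupic.encoders import ScalarEncoder

# 100 output bits, 21 of them active for any given input value.
# minval/maxval here are illustrative, not taken from the model.
enc = ScalarEncoder(n=100, w=21, minval=0.0, maxval=100.0)

# Nearby scalars yield heavily overlapping bit patterns, which is
# exactly the property the spatial pooler exploits.
print enc.encode(5.3)
print enc.encode(5.5)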
As you can see, there are lots of knobs.
NuPIC's treat: swarming.

Swarming automates the tuning: it runs a swarm of candidate models over a
sample of your data, searches the parameter space, and keeps the combination
that predicts best. You drive it with a description like this:
{
  "includedFields": [
    { "fieldName": "timestamp", "fieldType": "datetime" },
    { "fieldName": "consumption", "fieldType": "float" }
  ],
  "streamDef": {
    "info": "test",
    "version": 1,
    "streams": [
      {
        "info": "hotGym.csv",
        "source": "file://extra/hotgym/hotgym.csv",
        "columns": [ "*" ],
        "last_record": 100
      }
    ],
    "aggregation": {
      "years": 0, "months": 0, "weeks": 0, "days": 0,
      "hours": 1, "minutes": 0, "seconds": 0,
      "microseconds": 0, "milliseconds": 0,
      "fields": [
        [ "consumption", "sum" ],
        [ "gym", "first" ],
        [ "timestamp", "first" ]
      ]
    }
  },
  "inferenceType": "MultiStep",
  "inferenceArgs": {
    "predictionSteps": [ 1 ],
    "predictedField": "consumption"
  },
  "iterationCount": -1,
  "swarmSize": "medium"
}
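
Kicking off a swarm from Python looks roughly like this. A minimal sketch,
assuming the description above is held in a dict named SWARM_DESCRIPTION and
that your NuPIC version ships permutations_runner (the module the official
hot gym tutorial uses); option names may vary between releases.

from nupic.swarming import permutations_runner

# SWARM_DESCRIPTION is the dict shown above. The swarm returns the
# best-performing model parameters, ready for ModelFactory.create().
best_model_params = permutations_runner.runWithConfig(
    SWARM_DESCRIPTION,
    {"maxWorkers": 4, "overwrite": True})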
GNS Science GeoNet
- comprehensive, open data on NZ earthquakes
- accessible via an easy, flexible, unauthenticated HTTP API
  (see the sketch below)
- includes ~50 variables per quake
- includes "felt reports"
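
A sketch of pulling recent quakes over that API. The endpoint, query
parameter, and response fields here are assumptions based on GeoNet's public
GeoJSON feed; check the GeoNet documentation for the current shapes.

import requests

# Hypothetical illustration: quakes felt at Modified Mercalli intensity 3+.
response = requests.get("http://api.geonet.org.nz/quake",
                        params={"MMI": 3})
for quake in response.json()["features"]:
    props = quake["properties"]
    print props["magnitude"], props["time"]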
Can we predict the likely human impact of a quake
based purely on sensor data?
...
Sadly, I don't know yet.
Problems
- swarming takes a lot of time
- predictedField is singular: one model predicts one field
  (see the workaround sketch below)
- wanted to predict likely counts of felt reports at each
  Modified Mercalli intensity from 0-10
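
Since predictedField takes a single field, one workaround is to run one model
per intensity band. A sketch using only the APIs shown earlier; the
felt_mmi_* field names are invented for illustration:

# One model per Modified Mercalli band (hypothetical field names).
models = {}
for mmi in range(11):
    field = "felt_mmi_%d" % mmi
    m = ModelFactory.create(model_params.MODEL_PARAMS)
    m.enableInference({'predictedField': field})
    models[field] = m

# Feed each incoming quake record to all eleven models.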
NIWA CliFlo
- comprehensive, open(ish) data on NZ weather
- accessible via an easy(ish) HTTP API
- real-time(ish)
Can we predict the likelihood of a severe weather event
based on these input streams?
...
I believe so, but...
Problems
• regional council flood-level data is harder
  to access than I had anticipated
• some licence uncertainty around CliFlo reuse
Consider these applications a work in progress!