
Article

In the format provided by the authors and unedited.


SUPPLEMENTARY INFORMATION
https://doi.org/10.1038/s41586-019-1157-8

All-optical spiking neurosynaptic networks with self-learning capabilities
J. Feldmann¹, N. Youngblood², C. D. Wright³, H. Bhaskaran² & W. H. P. Pernice¹*

¹Institute of Physics, University of Münster, Münster, Germany. ²Department of Materials, University of Oxford, Oxford, UK. ³Department of Engineering, University of Exeter, Exeter, UK.
*e-mail: wolfram.pernice@uni-muenster.de

Nature | www.nature.com/nature
Supplementary Materials for

All-optical spiking neurosynaptic networks with self-learning capabilities
J. Feldmann¹, N. Youngblood², H. Bhaskaran², C.D. Wright³ and W.H.P. Pernice¹,*

¹ Institute of Physics, University of Muenster, Heisenbergstr. 11, 48149 Muenster, Germany
² Department of Materials, University of Oxford, Parks Road, OX1 3PH Oxford, UK
³ Department of Engineering, University of Exeter, Exeter, EX4 QF, UK
* Correspondence to: wolfram.pernice@uni-muenster.de

Table of Contents

1. Measurement Setup
2. Operation principle of the fabricated neural networks
3. Phase-change material characterization
4. Weighting the inputs
5. On-chip wavelength multiplexer based on ring resonators
6. Characteristics of the neuronal PCM-cell
7. Directional couplers
8. Reprogramming the weights of a neuron
9. Power and energy considerations
10. Simulations – Unsupervised learning
a. Learning a single pattern
b. Learning more than one pattern
c. Digit recognition and correlation detection
11. Simulation of a simple hidden-layer network

1. Measurement setup

The experimental setup used to characterize the on-chip devices used for the single neuron
experiments (Figures 1-3 of the main paper) is shown in Fig. S1 and comprises the pattern
generation (blue lines) and read-out of the individual weights (red lines). The device circuit to
be read out is shown in yellow on the top, including the input/output waveguide ports. GST
elements are indicated by the red squares.

Fig. S1. Experimental setup for single neuron experiments. Input pulses are prepared
using computer-controlled CW-Lasers and electro-optical modulators (EOM). Via a
de-multiplexer the light is guided to the input couplers of the on-chip device.
Continuous wave probe light is used to read out the phase-change weights. Four laser
sources are used to set the weights, while two additional laser sources are needed for
readout of the weights and the reset operation.

The setup is operated at telecommunication wavelengths using tunable continuous wave (CW)
laser sources and electro-optical modulators (EOMs) for generating desired pulse sequences.
The individual laser sources for pump and probe pulses are spectrally aligned to the resonance
wavelengths of the ring resonator based on-chip multiplexer. Suitable pulse sequences are
imprinted onto the continuous wave light signals via pulse generators as described in the
Methods section of the main text. The transmitted signals are recorded with photodetectors
D0-D4. The weights can be set using the same optical path as used for the input patterns by
increasing the pulse power with the EDFA (erbium-doped fiber amplifier). Optical
circulators are used to route the light going onto and coming from the chip.

To return the neuronal PCM on the ring resonator (left-hand side) to its initial state after
switching, the same laser as for probing the output of the neuron is applied. In case of the self-
learning experiments (Figure 3 of the main text) the output pulse from the activation ring
resonator is amplified off-chip and sent to the optical circulator inputs labeled with F1-F4. By
keeping the optical path of the amplified output pulses as short as possible, they can overlap
with the input pattern and induce the weight update. The energies are chosen such that a
single input pulse decreases the weight slightly (partial crystallization), whereas two
overlapping pulses amorphize the phase-change material.

Note that the polarization controllers are needed because the EOMs as well as the grating
couplers are polarization dependent.

The expanded experimental setup used for the single layer network experiments (Fig. 5) is
depicted in Fig. S2. Input patterns are generated through several lasers on different
wavelengths. The off-chip multiplexer is used as the collector (see Fig. 4). After generating
the input pulse using an EOM and a pulse generator the pattern is coupled to the on-chip
device through the left grating coupler (“In”) and distributed to the individual neurons on-chip
via the upper row of ring resonators.

Setting the weights is accomplished by splitting the light from an additional tunable laser into
four parts with a 1x4 fiber splitter. Each of these parts is only guided to one of the neurons
(ports “W1”-“W4”) which can be selected by turning on and off the variable optical
attenuators (VOA). Setting the weights of the left-most neuron is done by turning on the three
attenuators on the right and sending the switching pulse only to the second left grating coupler
(“W1”). Using a directional coupler, the light is guided to the ring resonators of the neuron.
By tuning the wavelength of the weight-setting laser, all synapses can be set individually.

Fig. S2. Expanded experimental setup for testing the neural network. The setup consists
of three main parts: generating the input patterns, setting the weights (synapses) and
probing and biasing the output unit. (Note that the top part in contrast to Figure 5 of
the main article shows the actually fabricated device design. Figure 5a in the main
text was rearranged to give a more intuitive view of the signal flow.)

The third part (red) of the setup is used for generating the output pulses. Light is sent to the
ring resonators of the activation units of each neuron (ports “P1”-“P4”) and the output is
measured in the same way as for the single neurons (detectors at ports “O1”-“O4”). The same
input couplers can be used for biasing the phase-change material on the ring crossing to
reduce the input energy needed to obtain an output pulse. For this purpose, the probe laser is
amplified (“bias input” path) using the same amplifier as for the weight setting. By increasing
the bias power, the phase-change materials can also be returned to their initial state after each
readout via the ports “P1”-“P4”. When generating the output pulses, the
EOM and EDFA are turned off to ensure that the probe pulse does not switch the activation
unit.

2. Operation principle of the fabricated neural networks

An optical micrograph of the device used for the four-neuron experiment in Fig. 5 is shown in
Fig. S3. It consists of the distributor (upper row of rings), the synapses (15 per neuron
between the ring resonators), the summing multiplexer (second row of rings), and the
activation unit (large ring resonators).

Fig. S3. Optical images of fabricated devices. a) Optical micrograph of a fabricated device
with 4 neurons and 60 synapses. b) Magnification of one of the neurons. c)
Activation unit with PCM on the crossing.

All input patterns are sent to the left-most grating coupler (“In” in Fig. S3 b) and equally
distributed to the four neurons (see section S5). After demultiplexing and weighting the inputs
they are summed using a multiplexer and sent to the activation unit comprised of the large
ring resonator with a PCM on the crossing (Fig. S3 c). Its state can be observed by a
transmission measurement between ports “P1” and “O1”.

For setting the weights an additional port is added to each neuron (port “W1” for neuron 1),
which is attached to the waveguide carrying the weighted sum by a directional coupler. The
splitting ratio is set to be 80:20 with 80% of the input sent to the activation unit. Through this
additional port, all synapses can be set individually choosing the corresponding resonance
wavelength.

3. Phase-change material characterization

The phase-change material used in this work is GST, which is sputter deposited as described in
the Methods section. The real (n) and imaginary (k) parts of the refractive index of the material
were measured via ellipsometry and are shown in Fig. S4.

Fig. S4. Refractive index of the sputter deposited GST obtained via ellipsometry.

The experiments were performed at wavelengths around 1550 nm, where especially the
imaginary part of the refractive index (and therefore the absorption) changes by an order of
magnitude between the crystalline and amorphous phase.

4. Weighting the inputs

In Fig. 3 of the main paper, a typical learning curve of a neuron is shown where a synapse
input is completely depressed after nine epochs. By changing the pulse energies of input and
feedback pulses, the amount of depression per epoch can be adjusted. Fig. S5 shows two
different energy settings applied to the same phase-change synapse. Setting A is equal to the
settings used in the main paper and therefore approximately the same number of pulses is
needed for recrystallisation. In Setting B the pulse energies were adjusted to give a more
gradual weight update that takes about 65 epochs until the synapse is completely
depressed.

In this work 3 µm long cells of the phase-change material GST were used as weights, yielding
a contrast of up to 51 % in the examples shown. However, when operating the neuron in a
supervised mode instead of unsupervised weight update using overlapping pulses, an even
higher contrast could be achieved with the same phase-change cell, as the pulse sequence used
to set the synaptic strength could be specifically designed for this purpose. Increasing the
lateral scale of the PCM-cell also leads to higher contrast.

Fig. S5. Unsupervised weight update with different energy settings. Adjusting the pulse
energies between settings A and B leads to different weight depression speeds.

5. On-chip wavelength multiplexer based on ring resonators

Because the input spikes of the artificial neuron are sent on different wavelengths, a
multiplexer is needed to combine the light pulses onto one waveguide and guide it to the on-
chip neuron soma. In order to do so we use ring resonators in an add-drop configuration (see
Fig. 1c of the main paper). A first constraint to the design is that the resonances of the
individual rings for each input should not overlap. Otherwise the light could first couple to the
bus waveguide, but then again couple out to the next ring and would not reach the neuronal
phase-change material (PCM)-ring. To adjust the resonance wavelength, the ring radii are
slightly varied resulting in different resonance conditions for each input. Figure S6a shows the
transmission spectrum of a multiplexer with fifteen inputs as used for the experiments
reported in the main paper. It can be seen that for each input a resonance in the desired
wavelength regime around 1550 nm can be found that has almost no overlap with the others.
The resonances are spectrally separated by about 800 pm, corresponding to the channel
spacing of the off-chip multiplexer. Further improving the photonic circuitry, in particular
reducing the waveguide losses, would narrow the resonance peaks and allow an even denser
channel spacing. Note that the ring radii were chosen to be around 30 µm leading to
a free spectral range of 6.25 nm, meaning that in the operated wavelength range from 1545
nm – 1555 nm each ring has at least two resonances.

Fig. S6. On-chip multiplexer. a) Transmission spectra (through port) of the ring resonators
used for multiplexing. For each input ring resonator a wavelength can be found
showing only a small overlap with the other ring resonances. b) Calibration of the
radius variation between individual ring resonators to fine tune the resonance
spacing.

As the fifteen resonators of a neuron need to have different resonant wavelengths, the radii
have to be tuned accordingly. From the resonance condition $2\pi r = n \lambda / n_g$ with the radius $r$,
a positive integer $n$, the resonant wavelength $\lambda$ and the group index $n_g$, the difference in the
ring radius can be derived as $\Delta r = r_1 \Delta\lambda / \lambda_1$, with the radius of the first ring being $r_1$, the
resonance shift $\Delta\lambda$ and the resonance wavelength of the first ring $\lambda_1$. To account for
fabrication errors, the corresponding radii were calculated for several spacings and the resulting
actual spacing after fabrication was obtained in a transmission measurement (Fig. S6b). From the
linear relation, the radius variation between the individual rings can be calculated for a certain
wavelength spacing.
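
As a rough numerical check of the relation above, the following Python sketch evaluates $\Delta r = r_1 \Delta\lambda / \lambda_1$ for the nominal values quoted in this section (ring radius of about 30 µm, resonance spacing of about 800 pm, wavelength of 1550 nm). It also estimates the group index implied by the quoted free spectral range, assuming the standard ring-resonator relation FSR = λ²/(n_g · 2πr), which is not stated in the text. All variable names are illustrative.

```python
import math

# Nominal values quoted in the text (not fabrication data)
r1 = 30e-6               # radius of the first ring in m
lambda1 = 1550e-9        # resonance wavelength of the first ring in m
delta_lambda = 800e-12   # desired resonance spacing in m
fsr = 6.25e-9            # quoted free spectral range in m

# Radius difference between neighbouring rings: delta_r = r1 * delta_lambda / lambda1
delta_r = r1 * delta_lambda / lambda1
delta_r_nm = delta_r * 1e9

# Assumption (not from the text): FSR = lambda^2 / (n_g * 2 * pi * r),
# solved here for the group index n_g
group_index = lambda1**2 / (2 * math.pi * r1 * fsr)

print(f"radius step between neighbouring rings: {delta_r_nm:.1f} nm")
print(f"group index implied by the quoted FSR: {group_index:.2f}")
```

With these nominal numbers the radius step per 800 pm of resonance spacing comes out at roughly 15 nm. The measured calibration in Fig. S6b, not this estimate, determines the radii actually used.
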
A second constraint for the single neuron experiments (Figures 1-3 of main paper) is to
maximize the transmission from the input port to the bus waveguide. This efficiency can be
tuned by adjusting the gaps between input waveguide and ring as well as between ring and
drop waveguide. As given by theory for a certain propagation loss factor $A$ in the ring, a
configuration of transmission coefficients $t_1$ and $t_2$ can be found where the drop port
transmission is maximal: $A = |t_1|/|t_2|$. The transmission coefficients describe in this case how
much light is transmitted from the input to the through port and from the add to the drop port,
respectively.
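
The condition above can be illustrated with a short Python sketch. It assumes the standard textbook expressions for an add-drop ring on resonance (they are not given in this document), with real-valued transmission coefficients `t1`, `t2` and loss factor `A`. The numerical values of `t2` and `A` are hypothetical and chosen only for illustration.

```python
def drop_transmission(t1, t2, a):
    """Drop-port power transmission on resonance (textbook add-drop ring model)."""
    return (1 - t1**2) * (1 - t2**2) * a / (1 - t1 * t2 * a) ** 2


def through_transmission(t1, t2, a):
    """Through-port power transmission on resonance (same model)."""
    return (t2 * a - t1) ** 2 / (1 - t1 * t2 * a) ** 2


# Hypothetical example values, not taken from the experiment
t2 = 0.95   # transmission coefficient at the drop-side coupler
A = 0.99    # loss factor of the ring

# Sweep t1 on a fine grid and pick the value with the highest drop transmission
candidates = [i / 10000 for i in range(8000, 10000)]
best_t1 = max(candidates, key=lambda t1: drop_transmission(t1, t2, A))

print(f"t1 maximizing the drop port: {best_t1:.4f}")
print(f"A * t2:                      {A * t2:.4f}")
print(f"through-port transmission:   {through_transmission(best_t1, t2, A):.2e}")
```

In this model the sweep recovers $t_1 = A\,t_2$, the condition quoted above, and the through-port transmission vanishes at the same point. For a loss factor close to 1 this approaches the symmetric case $t_1 = t_2$.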

Figure S7a shows an example where the input gap is kept constant and the output gap is
varied. It can be concluded that a symmetric configuration (input gap = drop port gap) leads
to the best results; the same is observed for the other input gaps tested. Because the waveguide
losses in silicon nitride are quite low, the loss factor A is almost equal to 1 which also points
to an almost symmetric coupling. Therefore, the symmetric configuration was chosen for all
these experiments.

Fig. S7. Drop port transmission for different add-drop configurations. a) Drop port
transmission as a function of the drop port gap at a fixed input gap. The maximal
transmission is achieved for a symmetric configuration. b) Drop port transmission
for different input gaps in a symmetric configuration.

The maximum drop port transmission as a function of the gap is depicted in Fig. S7b. The
highest transmission of about 75% is reached with a gap of 200 nm. However, for the
multiplexers used in this work a gap of 100 nm was used, yielding an even higher drop port
transmission.

For the multi-neuron experiments, a third constraint arises because the coupling efficiency of
the resonators has to be tuned to equally distribute the input power to all neurons of a layer.
This is achieved by adjusting the configuration of the input and drop port gaps according to
Fig. S8.

Fig. S8. Loss and splitting ratio of an add-drop resonator for different configurations.
The contour lines represent the splitting ratio between through and drop port, with
0.0 meaning that all light is transmitted to the through port and 1.0 meaning full
transmission to the drop port. The colors depict the loss.

Neuron       Desired        Gap A   Gap B   Measured   Loss   Resulting transmission
position i   coupling c_i   (nm)    (nm)    coupling   (%)    to neuron

1            0.25           400     100     0.165       9.3   0.15 ⋅ P_in
2            0.33           350     100     0.228      10.9   0.15 ⋅ P_in
3            0.5            350     150     0.373      15.2   0.16 ⋅ P_in
4            1.0            200     150     0.823      29.0   0.16 ⋅ P_in

Table 1. Resonator configurations used in the experiment to achieve equal distribution of the
input pulses to all neurons, with P_in being the input power.

As explained in the main text, the splitting ratio $c$ (coupling efficiency) can be calculated from
$c_i = 1/(N + 1 - i)$, with the neuron count of the layer $N$ and the position of the neuron $i$
($i \in \{1, \dots, N\}$). For the experiments with four neurons this led to the resonator configurations
given in Table 1. It can be seen that the resulting fraction of the input pattern transmitted to
each neuron is 15%-16%, yielding equal distribution of power. The difference between the
desired coupling and the measured coupling results from on-chip losses, which have to be
included when calculating the power distribution.
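
The distribution rule can be checked with a small Python sketch. It first applies the ideal couplings $c_i = 1/(N + 1 - i)$ to a lossless cascade, then repeats the calculation with the measured couplings and losses from Table 1. The cascade model (each resonator first removes its loss fraction, then splits the remainder between drop and through port) is an assumption made here; it is not spelled out in the text, although it reproduces the last column of Table 1. Function names are illustrative.

```python
from fractions import Fraction


def ideal_couplings(n_neurons):
    """Ideal splitting ratios c_i = 1 / (N + 1 - i) for i = 1..N."""
    return [Fraction(1, n_neurons + 1 - i) for i in range(1, n_neurons + 1)]


def delivered_fractions(couplings, losses):
    """Fraction of the input power reaching each neuron along the bus.

    Assumed model: at each resonator the loss fraction is removed first,
    then the remaining light is split into drop (c) and through (1 - c).
    """
    remaining = 1  # fraction of P_in still in the bus waveguide
    delivered = []
    for c, loss in zip(couplings, losses):
        after_loss = remaining * (1 - loss)
        delivered.append(after_loss * c)
        remaining = after_loss * (1 - c)
    return delivered


# Lossless case with ideal couplings: every neuron receives the same share
ideal = delivered_fractions(ideal_couplings(4), [0, 0, 0, 0])

# Measured couplings and losses (as fractions) from Table 1
measured = delivered_fractions([0.165, 0.228, 0.373, 0.823],
                               [0.093, 0.109, 0.152, 0.290])

print([str(f) for f in ideal])
print([round(f, 2) for f in measured])
```

In the lossless case each of the four neurons receives exactly one quarter of the input. With the measured values the shares round to 0.15, 0.15, 0.16 and 0.16 of the input power.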

6. Characteristics of the neuronal PCM-cell

The last part of the neuron is the neuronal PCM-cell. Here, the decision whether a spike is
generated is made based on the weighted and summed light. In our work we use a phase-change
cell deposited on top of a ring resonator (Fig. 2b in the main text) to obtain an
activation function as shown in Fig. 2d. The phase-change cell can further be switched
wavelength-independently through an extra waveguide (the output waveguide of the
multiplexer). The basic characteristics of such a device in different states are shown in Fig.
S9.

Without a phase-change cell on top, the ring resonator exhibits Q-factors of up to 50,000,
showing that the waveguide crossing inside the ring introduces only a small optical loss,
which is negligible compared to the loss added by a phase-change cell in its amorphous
(as deposited) or crystalline state. As the absorption of light in the phase-change cell
increases from the amorphous to the crystalline state, the Q factor drops
accordingly.

The plot of the extinction ratio as a function of the gap shown in Fig. S9 b is used to
determine the gap which leads to the highest contrast between the firing and non-firing states of
the activation unit. As the crystalline state is the initial state of the unit, the best contrast can
be achieved when operating the resonator at the gap with the highest extinction ratio, as small
changes in the propagation losses have the biggest effect on the overall transmission here.
This is due to a change in the coupling regime of the resonator. At the point of highest
extinction ratio, the ring is critically coupled to the feeding waveguide and the transmission is
minimal. Amorphizing the phase-change material in the ring then shifts the critical coupling
regime towards larger gaps (amorphous curve in Fig. S9 b), yielding higher
transmission. For the neurons used in this work we chose gaps in the range of 50 nm to 100
nm.

Fig. S9. Characteristics of the neuronal PCM-cell in its different states. a) Quality factor
of the ring resonator as a function of the gap between ring and waveguide. As
expected the quality factor decreases with increasing loss per round trip when the
PCM cell is switched from amorphous to crystalline. The waveguide crossing
introduced in the ring resonator still enables quality factors up to 50,000. b)
Extinction ratio plotted versus gap size. For the crystalline phase-change material,
the highest extinction ratio reaches up to 30 dB; critical coupling is therefore
found at a gap of around 70 nm.

7. Directional couplers

As described in section S2, directional couplers were used to add configurability of the
synapses. The directional couplers allow for programming the synapses with an additional
grating coupler port in the reverse direction (coupler port #2 in Fig. S3). The coupler design
used is shown in the inset in Fig. S10 and consists of a feeding waveguide with an adjacent
coupling waveguide, separated by a fixed coupling gap of 500 nm. By varying the coupler
length l, the splitting ratio between the output ports 1 and 2 can be adjusted. As most of the
light from the neuron should be transmitted to the activation unit, a splitting ratio of 80:20
was chosen, corresponding to a coupler length of 24 µm. The insertion loss of these couplers was
approximately 0.07 dB.

Fig. S10. Splitting ratio of the directional couplers as a function of the coupler length.
Error bars are derived from the standard deviation over five devices per coupling
length.

8. Reprogramming the weights of a neuron

Fig. S11 shows the capability of the phase-change synapses to adapt to different patterns. First
the fifteen synapses of the neuron were trained to recognize the pattern “A” (Fig. S11 a);
the corresponding weights were potentiated by approximately 60% compared
to the initial transmission of the synapse. Contrast values are indicated in blue and red colors
in each pixel. The blue curve in Fig. S11 b represents the transmission spectrum of the
weighting unit of the neuron after programming. Each peak corresponds to a ring resonator
and therefore a pixel of the input pattern. Compared to the initial spectrum (red line) it can be
seen that the transmission at the five wavelengths corresponding to the pixels of the pattern
“A” has been increased.

In a second step the weights were reset, also using optical pulses. The resulting transmission
spectrum (orange line) shows that the initial state is almost completely
restored. The same neuron was then programmed to a different pattern (the number “3”);
the result is shown in Fig. S11 c. The neuron successfully re-learned a different pattern,
showing that the PCM-neurons are reconfigurable and can be reused for different tasks after
training them once.

Fig. S11. Reprogramming of weights. a) Synapses programmed to recognize the letter “A”.
b) Transmission spectra of the weighting unit in its initial state with crystalline PCM-
cells, after programming the letter “A” and after a reset. c) Reprogramming of the
same neuron to the number “3”.

9. Power and energy considerations

A detailed overview of the energies involved in the operation of the photonic neural network
described in the main text is provided in this section. Because the phase-change materials
employed in the photonic neural network are non-volatile, no continuous power supply is
needed during operation to maintain their state. Therefore, energy is consumed only for
switching the activation unit (and returning it to its initial state) and for programming the
weights.

A switching event in the activation unit is triggered by the optical input pattern sent to a
desired neuron. Corresponding to the experiment shown in Fig. 5, each input pattern consists
of five optical pulses. In our experiments, the pulse energy inside the input waveguide (for
one out of the five pulses) was approximately 700 pJ and the propagation through the device
is examined in the following. The input energy is equally distributed to all four neurons by the
on-chip distributor. Using the measured parameters of the distributor as shown in Table 1 in
section 5 (9.3% loss and a measured coupling of 0.165 for the first resonator), the energy at the
PCM synapse (after the first ring resonator) is 700 pJ ⋅ 0.907 ⋅
0.165 ≈ 105 pJ. Assuming an amorphous PCM-cell and therefore neglecting the optical
losses due to absorption, the energy per pulse guided to the activation unit after the second
resonator (including 30% insertion loss in a symmetric configuration with a gap of 150 nm) is
105 pJ ⋅ 0.7 ⋅ 0.8 ≈ 60 pJ. The factor of 0.8 arises from the directional coupler used for
setting the weights which was set to an 80:20 splitting ratio as outlined in section 7. Because
five pulses were used per pattern, this leads to an activation energy of approximately 5 ⋅
60 pJ = 300 pJ, considering amorphous weights when the neuron is trained to this pattern.

In order to estimate how much energy is needed to induce a switching event in a 3 µm PCM
cell, the energies needed for setting the weights can be considered. After splitting the input
pulse (with a length of 200 ns) to the four neurons and taking into account coupling losses on
chip, the pulse energy in the waveguide is approximately 4.7 nJ. Because 80% of the light is
lost in the directional coupler (for setting the weights the 20 % port is used) and 30% at the
add-drop resonator (same resonator with a symmetric configuration and 150 nm gaps as
before), the energy arriving at the PCM-cell is 4.7 nJ ⋅ 0.2 ⋅ 0.7 ≈ 660 pJ. This energy level
induced a contrast of around 60% in the synapse starting from the fully crystalline state.

By comparing the 660 pJ required for weight adjustment to the 60 pJ per pulse in the input
pattern, it is apparent that the input pulses to the neuron do not have enough energy to switch
the PCM-synapse. On the other hand, the total pulse energy of the input pattern arriving at the
activation PCM on the ring (300 pJ) is not enough to induce amorphization, raising the need
for biasing the output ring resonator with a similar energy.

Returning the activation unit to its initial state was achieved in an unoptimized scheme using
several pulses (≈ 5) with decreasing energy between approximately 300 pJ and 100 pJ
leading to a total pulse energy for this process of around 1.0 nJ.

The operation of a complete cycle of a neuron therefore consists of the pulse energies in the
input pattern and the re-initialization process (5 ⋅ 105 pJ + 1000 pJ ≈ 1.5 nJ). For the neural
networks with four neurons presented in this work this leads to an energy consumption of
5 ⋅ 700 pJ + 4 ⋅ 1000 pJ = 7.5 nJ per cycle. The difference between the energies of a single
neuron and the network arises from losses in the distributor and therefore depends on the
specific network structure.
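
The energy budget of this section can be retraced with a few lines of Python. The sketch below only restates the arithmetic given in the text, using the quoted pulse energies, the Table 1 values for the first resonator, the 30% resonator insertion loss and the 80:20 coupler. Constant names are illustrative. Because the text rounds intermediate results (105 pJ, 60 pJ), the unrounded values printed here differ slightly from the quoted ones.

```python
# Numbers quoted in this section; all energies in pJ
E_INPUT_PULSE = 700.0          # one input pulse in the input waveguide
PULSES_PER_PATTERN = 5
N_NEURONS = 4
DISTRIBUTOR_LOSS = 0.093       # Table 1, neuron 1
DISTRIBUTOR_COUPLING = 0.165   # Table 1, neuron 1 (measured coupling)
RESONATOR_TRANSMISSION = 0.7   # 30% insertion loss, symmetric 150 nm gaps
COUPLER_TO_ACTIVATION = 0.8    # 80% port of the directional coupler
COUPLER_TO_SYNAPSES = 0.2      # 20% port, used for setting the weights
E_WEIGHT_PULSE = 4700.0        # weight-setting pulse in the waveguide
E_RESET = 1000.0               # re-initialization of one activation unit

# Input path: distributor -> synapse -> second resonator -> coupler -> activation unit
e_synapse = E_INPUT_PULSE * (1 - DISTRIBUTOR_LOSS) * DISTRIBUTOR_COUPLING
e_activation_pulse = e_synapse * RESONATOR_TRANSMISSION * COUPLER_TO_ACTIVATION
e_activation_pattern = PULSES_PER_PATTERN * e_activation_pulse

# Weight-setting path: coupler (20% port) -> add-drop resonator -> PCM cell
e_weight_setting = E_WEIGHT_PULSE * COUPLER_TO_SYNAPSES * RESONATOR_TRANSMISSION

# Energy per complete cycle: single neuron and four-neuron network
e_cycle_neuron = PULSES_PER_PATTERN * e_synapse + E_RESET
e_cycle_network = PULSES_PER_PATTERN * E_INPUT_PULSE + N_NEURONS * E_RESET

print(f"energy at synapse:            {e_synapse:.0f} pJ")
print(f"energy per pulse at soma:     {e_activation_pulse:.0f} pJ")
print(f"energy per pattern at soma:   {e_activation_pattern:.0f} pJ")
print(f"energy at PCM (weight set):   {e_weight_setting:.0f} pJ")
print(f"cycle energy, single neuron:  {e_cycle_neuron / 1000:.1f} nJ")
print(f"cycle energy, network:        {e_cycle_network / 1000:.1f} nJ")
```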

The pulse energies given above are approximated values because they vary depending on the
overlap of the ring resonances of the multiplexers and variations in the experimental setup. An
approximate error in energy can be deduced from the error bars given for the change of the
output amplitude in the experiment shown in Fig. 5 which are up to 12% of the contrast value.
As these variations are mainly caused by varying pulse energies an error of 10% - 15% can
also be derived for the pulse energies.

We note that the pulse energies used in these experiments can be reduced substantially by
using shorter pulses. In our case 200 ns pulses were used because of restrictions in the
experimental setup. By employing short pulses with a width of 1 ps we have already shown in
previous works that the switching energies can be reduced to 20 pJ, thus improving the
overall efficiency by a factor of 30. Further optimization of the photonic structures, especially
reducing losses in the multiplexers, similarly would lead to a better performance in terms of
energy consumption.

10. Simulations – Unsupervised learning

Since the experimental demonstrations in this work were restricted to neurons with up to 15
inputs due to practical limitations in the optical setup, simulations were performed to analyze
the neuron implementation with more inputs and to further investigate the learning method.
The simulated neurons consist of 4096 pre-synaptic input neurons connected via PCM-
synapses to one post-synaptic neuron in order to recognize 64x64 pixel images. Each pixel is
mapped to one input waveguide.

Fig. S12. Experimental data of the activation unit and the linear fits used for
simulations.

As in the experiment, the weight depression from the fully amorphous to the crystalline state
is divided into ten steps. By contrast, amorphizing the phase-change cell takes place in a single
step. The activation function used is extracted from the experimental data shown in Fig. S12
and resembles the rectified linear unit function. The red and blue lines indicate the fit curves
that were then used in the simulations.

a. Learning a single pattern

In a first step, the neuron is trained to recognize a single pattern, similar to the experimental
demonstration. Prior to training, the neuron has to be initialized. In the simulations both
fully amorphous starting weights and randomly initialized weights are used. The neuron is
then randomly shown a 64x64 pixel image (see inset of Fig. S13) or a noise pattern (with a
ratio of 50:50) and the output is observed. The value of each pixel (black or white) of the
pattern is sent to one of the 4096 input neurons of the network.

Fig. S13. Learning a single pattern with one neuron and 4096 inputs. a) The WWU pattern
or noise is shown to the neuron at random. For amorphous starting weights as
well as randomly initialized weights the neuron has adapted to the pattern after 18
epochs. b) and c) show the neuron output resulting from the WWU pattern and a
random pattern after each epoch for amorphous (b) and random starting weights (c).

Figs. S13b-c depict the training progress and Fig. S13a shows the weight development over
18 epochs. For both initialization routines the weights of the neuron adapt to the pattern until
it finally fires only when the learned pattern is shown.
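
The learning scheme can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the simulation code used for Fig. S13: weights are stored as integer levels from 0 (crystalline) to 10 (amorphous), the weights change only when the neuron fires, synapses whose input pulse overlaps the feedback pulse are amorphized in a single step, and all other synapses are depressed by one of the ten steps. A hard threshold stands in for the fitted activation function, and the 8-pixel pattern and the threshold value are hypothetical.

```python
STEPS = 10  # weight levels: 0 = fully crystalline, 10 = fully amorphous


def fires(levels, pattern, threshold):
    """Hard-threshold stand-in for the activation unit (assumption)."""
    weighted_sum = sum(level / STEPS * x for level, x in zip(levels, pattern))
    return weighted_sum >= threshold


def update(levels, pattern):
    """Weight update after an output spike (assumed rule).

    Active inputs overlap with the feedback pulse and are amorphized in one
    step; inactive inputs are depressed by one of the ten steps.
    """
    return [STEPS if x else max(0, level - 1) for level, x in zip(levels, pattern)]


def train(levels, patterns, threshold):
    """Show the patterns one after another and update the weights on each spike."""
    for pattern in patterns:
        if fires(levels, pattern, threshold):
            levels = update(levels, pattern)
    return levels


# Hypothetical 8-pixel pattern instead of the 64x64 images used in the text
pattern = [1, 0, 1, 1, 0, 0, 1, 0]
other = [0, 1, 0, 0, 1, 1, 0, 1]

levels = [STEPS] * len(pattern)          # fully amorphous starting weights
levels = train(levels, [pattern] * 10, threshold=3.0)

print("weights after training:", [level / STEPS for level in levels])
print("fires for learned pattern:", fires(levels, pattern, 3.0))
print("fires for other pattern:  ", fires(levels, other, 3.0))
```

After ten presentations the weights of the inactive pixels have been depressed step by step to zero, so the trained neuron responds to the learned pattern but not to the complementary one.
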
b. Learning more than one pattern

Figure S14 shows the results for a fully connected network consisting of 4096 input layer
neurons and two output neurons. The aim is to train neuron one to recognize the ‘WWU’
pattern and neuron two to recognize the ‘smiley’ pattern. To this end, the network was shown
pattern one (‘WWU’) and pattern two (‘smiley’) in 10% of the cases each, and a random noise
pattern in the remaining 80%. The noise is used to depress weights that were wrongly learned.

Fig. S14. Unsupervised learning of two patterns. a) The first example illustrates correct
learning of the two patterns. Each neuron is specialized to one of the patterns (smiley
or WWU). In b) the same network was initialized with other starting weights and the
sequence of patterns shown to the network was changed. In this training run, neuron
two adapted to both patterns.

The first run exhibits the correct results, where the two output neurons have learned the two
different patterns and neuron one only fires when the first pattern is shown while neuron two
only fires when the second pattern is shown.

The second run illustrates a well-known problem of unsupervised learning techniques called
co-specialization. In this case both neurons adapted to the ‘WWU’ pattern, as the outcome of
the learning routine depends on the starting conditions and on the sequence in which the different
patterns are shown. To prevent the neurons from learning the same patterns, inhibitory
connections between the neurons can be introduced, implementing a ‘winner-takes-all’ rule.
In this case only the neuron with the highest output is allowed to update its weights.
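
A minimal Python sketch of such a ‘winner-takes-all’ step is given below. It reuses the assumed update rule from the single-neuron case (integer weight levels 0 to 10, amorphization of active inputs in one step, one depression step for inactive inputs) and a hard firing threshold. The weights, pattern and threshold are hypothetical values for illustration only.

```python
STEPS = 10  # weight levels: 0 = fully crystalline, 10 = fully amorphous


def output(levels, pattern):
    """Weighted sum of the inputs seen by one output neuron."""
    return sum(level / STEPS * x for level, x in zip(levels, pattern))


def update(levels, pattern):
    """Assumed rule: amorphize active inputs, depress inactive ones by one step."""
    return [STEPS if x else max(0, level - 1) for level, x in zip(levels, pattern)]


def winner_takes_all_step(network, pattern, threshold):
    """Only the neuron with the highest output may update its weights."""
    outputs = [output(levels, pattern) for levels in network]
    winner = max(range(len(network)), key=outputs.__getitem__)
    if outputs[winner] < threshold:
        return network  # no neuron fires, so no weights change
    return [update(levels, pattern) if i == winner else levels
            for i, levels in enumerate(network)]


# Two hypothetical neurons with four synapses each
network = [[10, 10, 5, 5],
           [5, 5, 10, 10]]
pattern = [1, 1, 0, 0]

new_network = winner_takes_all_step(network, pattern, threshold=1.5)
print(new_network)
```

Here neuron one has the higher output for the pattern shown and is the only one to update; the weights of neuron two stay untouched, which keeps it free to specialize on a different pattern.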

c. Digit recognition and correlation detection

In a further simulation, the inhibitory connections were implemented and a fully connected
network consisting of 4096 input and ten output neurons was simulated in order to train the
network to recognize the digits 0-9. Starting from fully amorphous weights, after 100 epochs,
each neuron is specialized to one of the digits and, because of the ‘winner-takes-all’ rule, none
of the digits is learned twice (see Fig. S15a).

One interesting consequence of using only positive weights, as dictated by implementing each
weight with a single phase-change cell, is depicted in Fig. S15b-c.

Fig. S15. Unsupervised digit recognition. a) Implementing the ‘winner-takes-all’ rule, a
network containing ten output neurons was trained to the digits 0-9. b) and c) show
the outputs of neuron #1 and #3 responding to the different patterns during the
learning period. The height of the output is a measure for the similarity between the
patterns.

The output of a single neuron when seeing the individual digits depends on the similarity
between the inputs. In case of the neuron that learned to recognize ‘5’ (see Fig. S15b) the next
highest outputs are generated for the digits ‘6’ and ‘8’ as the overlap between these images is
the biggest. On the other hand, looking at the neuron that specialized to ‘1’ (Fig. S15c) the
outputs for all the other digits are negligible, as the image ‘1’ shares only a few pixels with
the other digits. This way the relative heights of the neuron outputs for different patterns can
be used to detect correlations and similarities between them.

11. Simulation of a simple hidden-layer network

In order to test the neural network structure proposed in the main text, simulations with a
hidden layer were carried out. The network consists of five input neurons, three hidden layer
neurons and two output neurons (Fig. S16 a). The network is built using the scaling
architecture described in the main text in Fig. 4. The network is used to detect whether the language
of a given input text is English or German. Input data is taken from [1].

Fig. S16. Simulation of language detection with a hidden layer. a) The network used for
language detection consists of five input neurons for the five vowels, three hidden
layer neurons and two output neurons. Next to the schematic, the corresponding
photonic circuit diagram is shown. b) Accuracy of the network in identifying the
language of a certain input text as a function of the number of words.

Supplementary references

[1] D. Goldhahn, T. Eckart & U. Quasthoff: Building Large Monolingual Dictionaries at the
Leipzig Corpora Collection: From 100 to 200 Languages.
In: Proceedings of the 8th International Language Resources and Evaluation (LREC'12),
2012
