Non-Boolean Computing With Nanomagnets For Computer Vision Applications


ARTICLES

PUBLISHED ONLINE: 26 OCTOBER 2015 | DOI: 10.1038/NNANO.2015.245

Non-Boolean computing with nanomagnets for computer vision applications
Sanjukta Bhanja1*, D. K. Karunaratne1†, Ravi Panchumarthy2†, Srinath Rajaram1 and Sudeep Sarkar2

The field of nanomagnetism has recently attracted tremendous attention as it can potentially deliver low-power, high-speed
and dense non-volatile memories. It is now possible to engineer the size, shape, spacing, orientation and
composition of sub-100 nm magnetic structures. This has spurred the exploration of nanomagnets for unconventional
computing paradigms. Here, we harness the energy-minimization nature of nanomagnetic systems to solve the quadratic
optimization problems that arise in computer vision applications, which are computationally expensive. By exploiting the
magnetization states of nanomagnetic disks as state representations of a vortex and single domain, we develop a
magnetic Hamiltonian and implement it in a magnetic system that can identify the salient features of a given image with
more than 85% true positive rate. These results show the potential of this alternative computing method to develop a
magnetic coprocessor that might solve complex problems in fewer clock cycles than traditional processors.

Nanomagnetic logic architectures compute information by minimizing their total magnetization energy. This energy-minimizing nature of magnetic systems can be directly harnessed to solve the quadratic optimization problems that arise in many computer vision applications, including motion segmentation1, correspondence2, figure–ground segmentation3, clustering4, grouping5, subgraph matching6 and digital graph matching7. In this Article we focus on one such vision problem: ‘perceptual organization’. Solutions to the automated recognition of objects from an image typically involve three steps: (1) feature extraction, (2) perceptual organization and (3) object matching8–11 (Fig. 1). Although there are many hardware solutions to speed up the first step, the perceptual organization and object matching steps are still solved by software and conventional computation. We propose a non-Boolean computation targeting these stages, built around nanomagnets. In particular, we concentrate on the perceptual organization problem. When a collection of nanomagnetic disks are driven to an excited state and relaxed, they tend to couple magnetically with one another to minimize the total magnetic energy of the system. We harness this energy minimization to solve a quadratic objective function that arises in perceptual organization. The advantage of this method is that it computes the solution in a direct manner, regardless of the problem size (image features), unlike in a Boolean logic-based computation where the number of iterations increases with the problem size.

In recent years, magnetic nanostructures have been widely used and researched for memory applications12–14, but only non-volatility through remanence is harnessed. The magnetic interaction between neighbouring magnets has also been exploited for traditional Boolean computing15–21. There has been other significant research in the field of non-Boolean computing using nanomagnets. For example, Behin-Aein and colleagues have proposed all-spin logic in devices for processing and storing information22–24, and Csaba and Porod have described non-Boolean computing logic using spin–torque oscillators25.

Imagine a magnetic coprocessor with a two-dimensional array of nanomagnetic disks that are integrated with read/write/clock circuitry (Fig. 1d). A ‘compiler’ for this coprocessor would (1) process an image, (2) identify all the edge segments and (3) map each edge segment to a unique nanomagnetic disk in the coprocessor. The unselected nanomagnetic disks in the array would be driven into a non-computing state. This mechanism isolates a cluster of nanomagnetic disks that represent edge segments in an edge image. Finally, the ‘selected’ nanomagnetic disks in the array would be clocked into a computing state and allowed to relax to an energy minimum. The salient edge segments (or the solution of the entire perceptual organization problem) are obtained by reading the magnetization states of all the nanomagnetic disks in a parallel readout fashion. Here, using a system of nanomagnetic disks that represent the edges, we experimentally demonstrate this computing method (Fig. 4). To achieve this, we undertook the following tasks: (1) we exploited two different magnetization states of a nanomagnetic disk as a state representation, where a circular nanomagnetic disk in a critical dimension settles into a vortex state when weakly coupled, or settles into a single domain state when strongly coupled; (2) we developed a magnetic Hamiltonian (equation (4)); (3) we fabricated a magnetic system corresponding to the image in Fig. 1a to identify salient edge segments (Fig. 1e); and (4) we experimentally demonstrated the mechanism of energy minimum computing for a particular problem. However, we envision that, in the future, and with a reconfigurable grid, we could select/deselect the computing/non-computing nanomagnets to solve different problems on one fabricated layout.

Perceptual organization in computer vision
Perceptual organization is a critical step in object recognition. An example is presented in Fig. 1a,b,e,f, which shows the three stages involved in object recognition. The first stage involves the detection of the boundaries between the contrast differences in an image (Fig. 1b). The second stage involves identifying salient edge segments. The main objective of this Article is to identify the salient edge segments that belong to objects (salient group)26. For this task, each pair of edge segments is associated with a pairwise affinity value capturing its saliency (Supplementary Fig. 7d).

In this Article we use an affinity function (defined in ref. 5) that captures the pairwise affinity a_ij of the ith and jth edge segment in an

1 Department of Electrical Engineering, University of South Florida, Tampa, Florida 33620, USA. 2 Department of Computer Science & Engineering, University of South Florida, Tampa, Florida 33620, USA. †These authors contributed equally to this work. *e-mail: sanjukta.bhanja@gmail.com

NATURE NANOTECHNOLOGY | VOL 11 | FEBRUARY 2016 | www.nature.com/naturenanotechnology 177

© 2016 Macmillan Publishers Limited. All rights reserved



[Figure 1 panel labels: feature extraction (edges); perceptual organization (quadratic optimization); object matching. c, CMOS coprocessor (ALU): edge image, simulated annealing, salient edge segments. d, Magnetic coprocessor: magnetic-field-coupled computing.]

Figure 1 | Stages in object recognition. a, Grey-scale satellite image of an urban area. b, Edge image with extracted edge segments from a. c, Traditional approach to solving a quadratic optimization process using a CMOS-based arithmetic and logic unit (ALU). d, Proposed method to solve the quadratic optimization process using a magnetic coprocessor. e, Salient edge segments identified from b using the method in c. f, Objects identified using salient edges obtained from e.

edge image. This affinity function captures the saliency between edge segments that are parallel, perpendicular, connected and overlapped, as expressed in

$$a_{ij} = \sqrt{l_i l_j}\,\exp\!\left(\frac{-o_{ij}}{\max(l_i, l_j)}\right)\exp\!\left(\frac{-d_{\min}}{\max(l_i, l_j)}\right)\cos^2(2\psi_{ij}) \qquad (1)$$

where l_i and l_j are the lengths of the ith and jth edge segments, respectively. ψ_ij, o_ij and d_min are the angle, overlap and minimum distance between the ith and jth edge segments, respectively.

Once all the pairwise affinity values are calculated, the next task in the second stage is to find the minimum number of edge segments that maximize the total affinity value. These edge segments are grouped as salient through a quadratic optimization process, and the objective function is expressed as

$$\sum_{i=1}^{N}\sum_{j=i+1}^{N} a_{ij}\,x_i x_j + \lambda\sum_{i=1}^{N} x_i + \kappa \qquad (2)$$

where a_ij is the pairwise affinity value between the ith and jth edge segments. x_i takes a value of either 0 or 1, where ‘1’ represents the salient edge segments. N is the total number of edge segments. λ takes the value of –1. κ is the number of edge segments in the salient group. Figure 1e shows the group of salient edge segments for Fig. 1b, computed using the objective function in equation (2).

The final stage of object recognition involves the matching between a model database and the salient groups8–11. Figure 1f shows the objects identified in Fig. 1a. The critical task in these three stages is quadratic optimization, which is an unconstrained optimization problem. A traditional Boolean logic and arithmetic-based computer uses algorithms27–30 to solve the quadratic optimization process. Typically, these methods involve multiple iterations with many arithmetic operations, which are computationally intensive for logic-based computing platforms. Sarkar and colleagues were the first to propose a novel computation method to solve the quadratic optimization problem in a direct manner using quantum-dot cellular automata (QCA)31, which is the precursor of magnetic systems. This was later extended to magnetic cellular automata32,33. Here, we have formulated and demonstrated a magnetic system to solve such complex perceptual organization problems with a single input–output cycle, in contrast to the orders of magnitude more clock cycles that would be required for a Boolean logic-based computation27.

Design of the magnetic system
From the experiments, we observed that the strongly coupled nanomagnetic disks tend to have a single domain state and the weakly coupled nanomagnetic disks tend to have a vortex state. When the magnets are closer to each other, the coupling energy between the nanomagnetic disks will keep the nanomagnetic disks in the single domain state, and if they are far apart, the exchange energy, anisotropy energy and demagnetization energy are dominant and the nanomagnetic disk will go to its vortex state. This mechanism allows us to obtain a qualitative value for the magnetic coupling strengths among the nanomagnetic disks. We exploit this phenomenon to identify strongly coupled nanomagnetic disks, which correspond to salient edge segments.

To associate the magnetization state space with the saliency of an edge segment, we mapped the magnetization state to a variable S, the magnitude of which is either 0 or 1, through a magnetization state abstraction model (Fig. 2). Here, a vortex state is represented by ‘0’ and a single domain state by ‘1’. The magnitude of the magnetization state |S| is approximated numerically by a step function based on the internal energies of the nanomagnetic disks obtained from simulation experiments, governed by the Landau–Lifshitz–Gilbert (LLG) equation. A detailed derivation of the magnetization state abstraction model is provided in Supplementary Section 3.1 and a schematic of the levels of abstraction is shown in Fig. 2b–d. The magnetization state variable magnitude |S| is a step function of the base-ten logarithm of |D| and can be expressed as

$$|S_i| = \begin{cases} 0, & \log_{10}|D_i| < \eta \\ 1, & \log_{10}|D_i| \geq \eta \end{cases} \qquad (3)$$

where |D_i| is the magnitude of the vector pointing from the vortex core to the disk centre of the ith nanomagnetic disk and η = –5.1 (Fig. 2).

We next developed a magnetic Hamiltonian (equation (4)) in quadratic form that matches the objective function of the vision problem (equation (2)). A detailed derivation of the magnetic Hamiltonian is provided in Supplementary Section 3. The magnetic Hamiltonian developed here is expressed as follows:

$$\sum_{i=1}^{N}\sum_{j=i+1}^{N} \gamma\, e^{-\sigma r_{ij}}\, S_i \cdot S_j + \beta\sum_{i=1}^{N} |S_i| + N\omega \qquad (4)$$

where r_ij is the centre-to-centre distance between the ith and jth nanomagnetic disks. S_i and S_j are the state values for the ith and jth nanomagnetic disks, respectively. γ, σ, β and ω are parameters found using a numerical approximation method.

We exploited the correspondence between the magnetic Hamiltonian and the objective function of the vision problem to



[Figure 2 panel labels: a, LLG legend with axes x, y, z; virtual vortex representation with LLG micromagnetic simulation, showing |D| ≈ ∞, |S| = 1 and |D| = 0, |S| = 0. b, Vortex state and single domain state. d, |S| = 0 (|D| = 0) and |S| = 1 (|D| ≈ ∞).]

Figure 2 | Schematic of the magnetization state abstraction model. a, LLG magnetization spin configuration representation of the virtual vortex model. The
virtual vortex core lies at the centre of the plane. b, First level of abstraction. LLG simulations of the multiple magnetization states of a nanomagnetic disk.
c, Second level of abstraction. A virtual vortex model, where the magnetization is represented by vector D from the virtual vortex core to the disk centre. We
have developed a function that is dependent on vector D (Supplementary Section 2) to represent the magnetic spins in the first level of abstraction. d, Third
level of abstraction. A magnetization state model, where the magnetization state is represented by variable S. A vortex state is represented by ‘0’ and a
single domain state by a unit vector whose direction captures the direction of the single domain arrangement.
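
The third level of abstraction can be sketched in a few lines of code. The following is a minimal illustration, assuming the step function of equation (3) with the threshold η = –5.1 quoted in the text. The function name, the default argument and the handling of |D| = 0 (where the logarithm is undefined) are illustrative choices, not taken from the Article.

```python
import math

ETA = -5.1  # threshold on log10|D| quoted in the Article


def state_magnitude(d_magnitude: float, eta: float = ETA) -> int:
    """Return |S| for one disk: 0 for a vortex state, 1 for a single domain state.

    d_magnitude is |D|, the length of the vector from the virtual vortex core
    to the disk centre, in whatever units the abstraction model uses.
    """
    if d_magnitude <= 0.0:
        # Assumption: |D| = 0 (core at the disk centre) is treated as a vortex,
        # because log10(0) is undefined.
        return 0
    return 1 if math.log10(d_magnitude) >= eta else 0


# A core essentially at the centre maps to 0; a core far from the centre maps to 1.
states = [state_magnitude(d) for d in (0.0, 1e-6, 1e-5, 1.0)]
```

Only the magnitude |S| is modelled here; the direction of a single domain state would need a separate unit vector.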

Table 1 | Performance evaluation metrics of the magnetic coprocessor.

Figure | Average disk thickness (nm) | Average disk diameter (nm) | True-positive (%) | False-positive (%)
4f     | 10.8                        | 103.5                      | 83                | 39
4g     | 11.3                        | 112.2                      | 67                | 34
4h     | 11.6                        | 129                        | 83                | 39
4i     | 11.7                        | 137.4                      | 86                | 35
4j     | 10.7                        | 145                        | 83                | 31

The minimum edge-to-edge spacing between nanomagnets was 20 nm.
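
As a quick arithmetic check on Table 1, the short sketch below counts the systems with a true-positive rate above 80% and averages the two rates. The values are transcribed from the table. The variable names and the choice of summary statistics are illustrative, not part of the Article.

```python
# Rows: (figure, thickness nm, diameter nm, true-positive %, false-positive %)
TABLE_1 = [
    ("4f", 10.8, 103.5, 83, 39),
    ("4g", 11.3, 112.2, 67, 34),
    ("4h", 11.6, 129.0, 83, 39),
    ("4i", 11.7, 137.4, 86, 35),
    ("4j", 10.7, 145.0, 83, 31),
]

# Systems whose true-positive rate exceeds 80%.
above_80 = [row[0] for row in TABLE_1 if row[3] > 80]

# System with the lowest true-positive rate.
lowest = min(TABLE_1, key=lambda row: row[3])[0]

# Unweighted means over the five fabricated systems.
mean_tp = sum(row[3] for row in TABLE_1) / len(TABLE_1)
mean_fp = sum(row[4] for row in TABLE_1) / len(TABLE_1)
```

Four of the five systems clear the 80% mark, and the system in Fig. 4g is the outlier.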

spatially arrange the nanomagnetic disks. The correspondence between the objective function in equation (2) and the magnetic Hamiltonian in equation (4) can be observed. The magnetization states (S_i, S_j) and centre-to-centre distance (r_ij) in the magnetic Hamiltonian correspond to the saliency (x_i, x_j) and pairwise affinity (a_ij) in the objective function, respectively.

The centre-to-centre distance between the nanomagnetic disks is based on the pairwise affinity matrix (Supplementary Fig. 7d) and the distance map matrix (Supplementary Fig. 7e). The centre-to-centre distances between all pairs of nanomagnetic disks are calculated such that r_ij = 1/log(a_ij). The computed r_ij values are reconstructed into the form of an adjacency matrix. To spatially arrange the nanomagnetic disks, a statistical information visualization method known as multidimensional scaling (MDS) is used34. MDS uses the adjacency matrix as an input and provides a set of two-dimensional coordinates to locate the disk centres of the


Figure 3 | Steps involved in determining salient edges using the nanomagnetic coprocessor. Step 1: edge detection, affinity matrix calculation,
multidimensional scaling and mapping of features (edge segments) to nanomagnets. Step 2: activating computing nanomagnets and deactivating non-
computing nanomagnets. Each computing nanomagnet represents a feature (edge segment). Step 3: magnetic computing and relaxation. Identification of the
computing magnet’s magnetization state. Red represents a single domain state and yellow represents vortex states. Step 4: determination of the salient
features (edge segments) by back-tracing the mapping of the single domain computing nanomagnets with features (edge segments).
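
Part of Step 1 can be sketched as follows: the pairwise affinity of equation (1), followed by the mapping r_ij = 1/log(a_ij) from affinity to centre-to-centre distance. This is a minimal illustration under stated assumptions. The prefactor is taken as the square root of l_i·l_j, the logarithm is assumed to be natural, and affinities at or below 1 are mapped to an infinite distance (no coupling). None of these choices is confirmed by the text, and the function names are hypothetical. The MDS layout step that would consume these distances is not shown.

```python
import math


def affinity(l_i, l_j, overlap, d_min, psi):
    """Pairwise affinity of two edge segments, following equation (1).

    l_i, l_j: segment lengths; overlap: o_ij; d_min: minimum distance;
    psi: angle between the segments in radians.
    """
    longest = max(l_i, l_j)
    return (
        math.sqrt(l_i * l_j)           # assumed square-root prefactor
        * math.exp(-overlap / longest)
        * math.exp(-d_min / longest)
        * math.cos(2.0 * psi) ** 2     # peaks for parallel and perpendicular pairs
    )


def centre_distance(a_ij):
    """Map an affinity to a centre-to-centre distance, r_ij = 1 / log(a_ij).

    Assumption: natural logarithm; affinities <= 1 would give a non-positive
    or undefined distance, so they are treated as uncoupled (infinite distance).
    """
    if a_ij <= 1.0:
        return math.inf
    return 1.0 / math.log(a_ij)


# Two parallel, touching segments of length 100 (arbitrary units).
a_parallel = affinity(100.0, 100.0, overlap=0.0, d_min=0.0, psi=0.0)
r_parallel = centre_distance(a_parallel)
```

Higher affinity gives a smaller distance, so salient pairs end up closely spaced and therefore strongly coupled.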


[Figure 4 panels a–t. Scale bars, 1 µm. Colour key in the state maps: single domain, vortex.]

Figure 4 | Perceptual organization results for an image, obtained using fabricated nanomagnetic systems. a–e, SEM images of nanomagnetic systems.
f–j, MFM images of the nanomagnetic systems. k–o, Magnetization state identification (yellow represents the vortex state and red the single domain state).
p–t, Salient edge segments identified by the fabricated magnetic system.
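
The energy that a relaxed system settles towards can be illustrated by evaluating the magnetic Hamiltonian of equation (4) for a given layout and set of states. This is a sketch only. The Article states that γ, σ, β and ω were found by numerical approximation but gives no values here, so the numbers below are hypothetical placeholders chosen to make the arithmetic easy to follow. States are modelled as two-dimensional vectors: (0, 0) for a vortex and a unit vector for a single domain state.

```python
import math


def hamiltonian(centres, states, gamma, sigma, beta, omega):
    """Evaluate equation (4) for disk centres [(x, y), ...] and states [(sx, sy), ...]."""
    n = len(centres)
    pair_term = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            r_ij = math.dist(centres[i], centres[j])  # centre-to-centre distance
            dot = states[i][0] * states[j][0] + states[i][1] * states[j][1]
            pair_term += gamma * math.exp(-sigma * r_ij) * dot
    self_term = beta * sum(math.hypot(sx, sy) for sx, sy in states)
    return pair_term + self_term + n * omega


# Hypothetical parameters and a two-disk layout with unit spacing.
GAMMA, SIGMA, BETA, OMEGA = -1.0, 1.0, 0.5, 0.0
centres = [(0.0, 0.0), (1.0, 0.0)]

h_vortex = hamiltonian(centres, [(0.0, 0.0), (0.0, 0.0)], GAMMA, SIGMA, BETA, OMEGA)
h_aligned = hamiltonian(centres, [(1.0, 0.0), (1.0, 0.0)], GAMMA, SIGMA, BETA, OMEGA)
```

Because the pair term decays as exp(–σ r_ij), only closely spaced disks contribute appreciably. This mirrors the observation that strongly coupled disks are the ones found in the single domain state.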

nanomagnetic disks. Supplementary Fig. 7 presents a schematic of the process involved to generate the two-dimensional layout of the nanomagnetic disks.

To validate experimentally the non-Boolean computing method proposed in this Article, we fabricated magnetic systems based on the layout in Supplementary Fig. 7f. This magnetic system was fabricated on a silicon wafer using standard electron-beam lithography, electron-beam evaporation and a lift-off process. A detailed description of the fabrication procedure is provided in the Methods. Figure 3 depicts the steps involved in determining the salient edges using the nanomagnetic coprocessor. The compiler maps each edge segment to a single nanomagnet. After computation, the final

[Figure 5 panel labels: a, FM2/NM/FM1 layer stack; selected and deselected magnets; select lines, BL, write circuitry, connecting node and Vdd for instances i and j. b, Cell with 110 nm diameter and 50 nm spacing; NiFe, MgO and Co/Pd layers.]

Figure 5 | Hardware schematics of envisioned magnetic coprocessor. a, A uniform two-dimensional STT-MRAM-based reconfigurable array with
underlying circuitry. Only selected magnets (magnets in the single domain state) corresponding to an objective function participate in the computation after
clocking. The deselected magnets will stay in a precessional state (non-computing state). Examples of two instances i and j are shown. b, Schematic of
spin-torque-driven reconfigurable array of nanomagnets (STRAN), showing cell dimensions, material and spacing parameters.
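
The readout margin of such a cell can be checked with one line of arithmetic. The sketch below assumes the resistances reported in this Article for the cell in Fig. 5b (1,650 Ω for the vortex state and 1,900 Ω for the minimum single domain state) and the common convention TMR = (R_high – R_low)/R_low. That convention is an assumption, because the text does not spell out its definition. The function and constant names are illustrative.

```python
def tmr_percent(r_low: float, r_high: float) -> float:
    """Tunnel magnetoresistance in percent, relative to the low-resistance state."""
    return 100.0 * (r_high - r_low) / r_low


R_VORTEX = 1650.0             # ohms, vortex state
R_SINGLE_DOMAIN_MIN = 1900.0  # ohms, minimum single domain state

minimum_tmr = tmr_percent(R_VORTEX, R_SINGLE_DOMAIN_MIN)
```

Under these assumptions the result is a little over 15%, in line with the minimum TMR quoted for the MTJs.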

magnetization states of all the computing nanomagnets are identified. As each computing nanomagnet represents an edge, the single domain nanomagnets are back-traced to their corresponding edges and are identified as salient edge segments.

The magnetic systems were characterized with a combination of a scanning electron microscope (SEM) and an atomic force microscope (AFM) to identify the defect-free magnetic systems (Fig. 4) and to obtain the average diameter and average thickness of the nanomagnetic disks in each magnetic system (Table 1). Samples were fabricated with thicknesses between 6 nm and 19 nm. For the magnetic system with nanomagnetic disks with a thickness of 19 nm their magnetization was observed to be only in the vortex state, and for disks with a thickness of 6 nm it was found to be only in the single domain state. These observations agree with the simulation results in ref. 35.

The magnetic systems were stimulated with an external magnetic field as a clocking mechanism (driving the nanomagnets to the hard axis) then relaxed for computation to take place. The external magnetic field was applied in an out-of-plane direction with a magnitude of 0.08 T for a duration of 0.5 s. Subsequently, the magnetic system was characterized with a magnetic force microscope (MFM). The MFM images of the magnetic systems are presented in Fig. 4f–j. The MFM images were analysed to identify the magnetization states of the magnetic systems. The single domain states are marked with red dots and the vortex states with yellow dots in the magnetization state layouts shown in Fig. 4k–o. The single domain nanomagnets were then grouped and the corresponding edge segments were selected as the most salient. Figure 4p–t shows the salient edge segments computed by the respective magnetic systems in Fig. 4f–j.

Table 1 presents the performance evaluation metrics of the edge images computed using the magnetic systems in Fig. 4 as well as the traditional logic-based computed edge images obtained according to Fig. 1c. The true-positive percentages represent the number of salient edge segments that are correctly identified. The false-positive percentages represent the non-salient edge segments identified as salient edges. It is evident from Table 1 that the variation in thickness and diameter within critical dimensions did not affect the final output. This evidence supports the consistency of the magnetic system.

Of the five magnetic systems, four were able to identify more than 80% of the salient edge segments in the image. We believe that the low true-positive percentage in the second magnetic system (Fig. 4g) is a consequence of the system reaching an energy minimum that is not close to its global energy minimum. This is due to unevenness on the surface of the nanomagnetic disks. If the nanomagnetic disks have rough surfaces or bumps, they tend to resist coupling with the neighbouring nanomagnetic disks. The performance of the magnetic system can be improved by using more circular nanomagnetic disks with smoother surfaces.

Supplementary Figure 9 shows the trends in running time for the magnetic computing and an IBM ILOG CPLEX optimizer (CPLEX)30 operating with sparse affinity matrices with nodes with an average of four neighbours and eight neighbours. CPLEX could only converge to feasible solutions for 63 of 101 images with four neighbour sparse affinity matrices, with an average running time of 184 s, and could converge to feasible solutions for 59 of 101 images with eight neighbour sparse affinity matrices, with an average running time of 356 s. Thus, magnetic computing, on average, is 1,528 times faster than CPLEX with four neighbour sparse affinity matrices, and 468 times faster than CPLEX with eight neighbour sparse affinity matrices. For the grey-scale satellite image shown in Fig. 1a, CPLEX with a four neighbour sparse affinity matrix had a true detection rate of 73% and was 92 times slower than magnetic computing. For the same image, CPLEX with eight neighbour sparse affinity matrices had a poor true detection rate of 25%, while also being 175 times slower than magnetic computing. Another interesting trend apparent in Supplementary Fig. 9 is that the CPLEX running time is quadratic with respect to the problem size, whereas the magnetic computing running time is linear with the problem size.

Practical implementation and generality
Our ongoing efforts are focused on designing an N×N programmable grid, where we can select/deselect computing nanomagnets.


This mechanism will allow us to map any desired layout, thereby maximizing the potential of the magnetic coprocessor to solve many instances of a problem.

Although there are a few techniques to manipulate the magnetization of a nanomagnet, including strain-induced and spin-transfer torque (STT)-induced, our proposed system can be generalized with a two-dimensional STT magnetic random access memory (STT-MRAM) reconfigurable array, as shown in Fig. 5a. Our reconfigurable array includes STT-MRAMs that can be programmed (select/deselect) using STT currents. The array consists of magnetic tunnelling junction (MTJ) cells with diameters of 110 nm, spaced 50 nm apart, with an out-of-plane (tilted) polarizer as its reference layer (FM2) and an in-plane free layer (FM1) structure as shown in Fig. 5a. The STT strengths have the ability to induce persistent oscillations in the free layer and can take these cells to a non-computing state. This is the mechanism we envisage will deselect the cells from the array. This has been shown to be possible through LLG simulations, and details are provided in Supplementary Section 5.

We envisage that, due to the fast growth in STT-MRAMs, a CMOS interface of the readout is feasible, but this needs further exploration. Analytical values of the resistances of magnetic cells with diameters of 110 nm, spaced 50 nm apart, with tilted magnetic anisotropy (TMA) as its reference layer (FM2) and an in-plane free layer (FM1) structure (Co/Pd 40 Å/MgO 35 Å/NiFe 100 Å), as shown in Fig. 5b, were computed using LLG36. We found that the vortex state resistance was 1,650 Ω and the minimum single domain state resistance (the free layer in the +x direction and the tilted reference layer at 45° between +x and +z) was 1,900 Ω. Note that for all other in-plane angles of the single domain state, the resistance will be higher than 1,900 Ω. Hence, a tunnel magnetoresistance (TMR) of 15% is the lowest we expect from the MTJs.

Our group has been working on a differential non-destructive scheme to design novel variability tolerant read37,38 for rectangular nanomagnets with a 45° tilted reference layer, yielding a TMR of 30%. We were able to achieve a reasonable sense margin by using a differential decision and sense circuit, which was adequate to distinguish logic ‘1’ and logic ‘0’. The circuit design was validated using a 22 nm CMOS technology simulation37,38. In the current magnetic coprocessor design, the minimum TMR between single domain and vortex states is ∼15%. Given the rapid progress of technology in this space, we are confident that this TMR, with an additional amplifier stage and other sensing innovation, will prove sufficient to yield an adequate sense margin. Other novel techniques can also be used, such as giant magnetoresistance sensors and optical sensors39–43.

Conclusion
This Article is a proof of concept of the viability of this magnetic system for directly solving routinely occurring quadratic optimization problems by harnessing the energy-minimization nature of nanomagnets. The magnetic coprocessor introduced here can be implemented as a two-dimensional array of circular MTJs with a transistor for the disk selection and read mechanism. With recent advances in nanolithography technologies and the material sciences, the fabrication of such a dense two-dimensional array is now a reality. With this computing mechanism we have simplified a complex vision computation problem to a single input/output clock cycle, in contrast to conventional Boolean logic circuits, which require orders-of-magnitude more clock cycles.

Methods
Methods and any associated references are available in the online version of the paper.

Received 12 May 2014; accepted 16 September 2015; published online 26 October 2015

References
1. Park, J., Zha, H. & Kasturi, R. in Computer Vision—ECCV 2004 (eds Pajdla, T. & Matas, J.) 390–401 (Springer, 2004).
2. Maciel, J. & Costeira, J. A global solution to sparse correspondence problems. IEEE Trans. Pattern Anal. Mach. Intell. 25, 187–199 (2003).
3. Ng, A. Y., Jordan, M. I. & Weiss, Y. On spectral clustering: Analysis and an algorithm. Adv. Neural Inf. Proc. Syst. 2, 849–856 (2002).
4. Shi, J. & Malik, J. Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 22, 888–905 (2000).
5. Sarkar, S. & Soundararajan, P. Supervised learning of large perceptual organization: graph spectral partitioning and learning automata. IEEE Trans. Pattern Anal. Mach. Intell. 22, 504–525 (2000).
6. Rota Bulò, S., Pelillo, M. & Bomze, I. M. Graph-based quadratic optimization: A fast evolutionary approach. Comput. Vis. Image Und. 115, 984–995 (2011).
7. Levin, A., Rav Acha, A. & Lischinski, D. Spectral matting. IEEE Trans. Pattern Anal. Mach. Intell. 30, 1699–1712 (2008).
8. Liu, Y., Zhang, D., Lu, G. & Ma, W.-Y. A survey of content-based image retrieval with high-level semantics. Pattern Recogn. 40, 262–282 (2007).
9. Iqbal, Q. & Aggarwal, J. K. Retrieval by classification of images containing large manmade objects using perceptual grouping. Pattern Recogn. 35, 1463–1479 (2002).
10. Larsen, P., Rawlings, J. & Ferrier, N. Model-based object recognition to measure crystal size and shape distributions from in situ video images. Chem. Eng. Sci. 62, 1430–1441 (2007).
11. Michaelsen, E., Soergel, U. & Thoennessen, U. Perceptual grouping for automatic detection of man-made structures in high-resolution SAR data. Pattern Recog. Lett. 27, 218–225 (2006).
12. Parkin, S. S. et al. Giant tunnelling magnetoresistance at room temperature with MgO (100) tunnel barriers. Nature Mater. 3, 862–867 (2004).
13. Åkerman, J. Toward a universal memory. Science 308, 508–510 (2005).
14. Li, J., Ndai, P., Goel, A., Salahuddin, S. & Roy, K. Design paradigm for robust spin–torque transfer magnetic RAM (STT MRAM) from circuit/architecture perspective. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 18, 1710–1723 (2010).
15. Cowburn, R. P. & Welland, M. E. Room temperature magnetic quantum cellular automata. Science 287, 1466–1468 (2000).
16. Imre, A. et al. Majority logic gate for magnetic quantum-dot cellular automata. Science 311, 205–208 (2006).
17. Alam, M. et al. Experimental progress of and prospects for nanomagnet logic (NML). In Silicon Nanoelectronics Workshop (SNW) 1–2 (2010).
18. Karunaratne, D. K. & Bhanja, S. Study of single layer and multilayer nano-magnetic logic architectures. J. Appl. Phys. 111, 07A928 (2012).
19. Yilmaz, Y. & Mazumder, P. Nonvolatile nanopipelining logic using multiferroic single-domain nanomagnets. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 21, 1181–1188 (2013).
20. Vacca, M. et al. Magnetoelastic clock system for nanomagnet logic. IEEE Trans. Nanotechnol. 13, 963–973 (2014).
21. Anderson, N. G. & Bhanja, S. in Field-Coupled Nanocomputing: Paradigms, Progress, and Perspectives (eds Anderson, N. G. & Bhanja, S.) 8280 (Springer, 2014).
22. Behin-Aein, B., Datta, D., Salahuddin, S. & Datta, S. Proposal for an all-spin logic device with built-in memory. Nature Nanotech. 5, 266–270 (2010).
23. Behin-Aein, B. Computing multi-magnet based devices and methods for solution of optimization problems. US patent 8,698,517 (2014).
24. Behin-Aein, B., Sarkar, A. & Datta, S. Modeling circuits with spins and magnets for all-spin logic. In Proc. European Solid State Dev. Res. Conf. 36–40 (IEEE, 2012).
25. Csaba, G. & Porod, W. Computational study of spin–torque oscillator interactions for non-Boolean computing applications. IEEE Trans. Magn. 49, 4447–4451 (2013).
26. Desolneux, A., Moisan, L. & Morel, J.-M. in Seeing, Thinking and Knowing (ed. Carsetti, A.) 71–101 (Springer, 2004).
27. Boykov, Y., Veksler, O. & Zabih, R. Fast approximate energy minimization via graph cuts. IEEE Trans. Pattern Anal. Mach. Intell. 23, 1222–1239 (2001).
28. METSlib: an open source tabu search metaheuristic framework in modern C++; https://projects.coin-or.org/metslib
29. Kügel, A. Diplom-Informatiker Universität Ulm; http://www.maxsat.udl.cat/12/solvers/akmaxsat.pdf
30. IBM ILOG CPLEX Optimizer 12.6.1; http://www-01.ibm.com/software/integration/optimization/cplex-optimizer/
31. Sarkar, S. & Bhanja, S. Synthesizing energy minimizing quantum-dot cellular automata circuits for vision computing. In 5th IEEE Conf. Nanotechnol. 541–544 (IEEE, 2005).
32. Sarkar, S. & Bhanja, S. Direct quadratic minimization using magnetic field-based computing. In IEEE Int. Workshop on Design and Test of Nano Dev. Circuits Sys. 31–34 (IEEE, 2008).
33. Pulecio, J., Bhanja, S. & Sarkar, S. An experimental demonstration of the viability of energy minimizing computing using nano-magnets. In 11th IEEE Conf. Nanotechnol. 1038–1042 (IEEE, 2011).

34. Cox, T. F. & Cox, M. A. Multidimensional Scaling (CRC Press, 2010).
35. Kumari, A., Sarkar, S., Pulecio, J. F., Karunaratne, D. & Bhanja, S. Study of magnetization state transition in closely spaced nanomagnet two-dimensional array for computation. J. Appl. Phys. 109, 07E513 (2011).
36. Scheinfein, M. R. LLG micromagnetic simulator; http://llgmicro.home.mindspring.com/ (1997).
37. Das, J., Alam, S. M. & Bhanja, S. Non-destructive variability tolerant differential read for non-volatile logic. In IEEE 55th Int. Midwest Symp. Circuits Sys. 178–181 (IEEE, 2012).
38. Das, J., Alam, S. M. & Bhanja, S. Ultra-low power hybrid CMOS-magnetic logic architecture. IEEE Trans. Circuits Syst. I, Reg. Papers 59, 2008–2016 (2012).
39. Lambert, C.-H. et al. All-optical control of ferromagnetic thin films and nanostructures. Science 345, 1337–1340 (2014).
40. Liu, S. et al. Magnetic–electrical interface for nanomagnet logic. IEEE Trans. Nanotechnol. 10, 757–763 (2011).
41. Becherer, M., Kiermaier, J., Breitkreutz, S., Eichwald, I. & Schmitt-Landsiedel, D. Nanomagnetic logic gate and an electronic device. US patent 8,872,547 (2014).
42. Dong, X. et al. Circuit and microarchitecture evaluation of 3D stacking magnetic RAM (MRAM) as a universal memory replacement. In 45th ACM/IEEE Design Automation Conf. 554–559 (IEEE, 2008).
43. Hosaka, S. et al. Nano magnetic column arrays fabrication for patterned media in magnetic recording using EB lithography and ion milling. In IEEE Int. Conf. Electron Dev. Solid-State Circuits 140–143 (IEEE, 2009).

Acknowledgements
The authors thank the Nanotechnology Research and Education Center at the University of South Florida and the National Science Foundation. This work was supported in part by NSF CAREER grant no. 0639624, NSF (CRI) grant no. 0551621 and NSF (EMT) grant no. 0829838. The authors thank J. Pulecio and A. Kumari for their earlier research.

Author contributions
S.B. and S.S. proposed and supervised the project. The entire team was involved in the modelling and the analytical framework. R.P. implemented software code for the analytical and mathematical modelling of the nanomagnetic system under S.S.'s guidance. R.P. ran experiments for speed comparisons and prepared responses for the reviews under the supervision of S.B. and S.S. D.K. fabricated the nanomagnetic systems and performed fabrication experiments. D.K. and S.B. analysed the results of fabrication experiments. S.R. performed micromagnetic simulation experiments and programmability of the grid, with S.B.'s guidance. D.K. and R.P. prepared the initial draft of the paper. All authors discussed the results and modified the manuscript.

Additional information
Supplementary information is available in the online version of the paper. Reprints and permissions information is available online at www.nature.com/reprints. Correspondence and requests for materials should be addressed to S.B.

Competing financial interests
The authors declare no competing financial interests.


Methods
Fabrication process. The fabrication of the nanomagnetic devices was carried out in six steps (Supplementary Fig. 10). For the substrate we used an n-type, 〈100〉, 2-inch silicon wafer. In the first step, cleaning the silicon wafer, a standard Radio Corporation of America (RCA) cleaning procedure was used. The second step was to spin-coat the silicon wafer with poly(methyl methacrylate) (PMMA), and the third was to expose the silicon wafer with an electron beam to produce the desired patterns. The fourth step was to develop the sample in a developer solution, and in the fifth, a uniform layer of Permalloy was evaporated onto the wafer. The final step was a liftoff procedure to remove unwanted material from the silicon wafer, leaving behind the nanomagnetic logic devices. These fabrication steps are described in detail in the following sections.

Step 1: substrate cleaning. The RCA cleaning procedure removed all organic, ionic, oxide and heavy metal contaminants from the surface of the silicon wafer. The procedure used three cleaning solutions, the first containing ammonium hydroxide, hydrogen peroxide and water. This removed all organic contamination from the surface. The second solution, a hydrogen fluoride solution, removed metallic contamination and oxides from the surface. The final solution was a mixture of hydrochloric acid, hydrogen peroxide and water, and removed all ionic and heavy metal contamination from the surface. Between each solution, the wafer was dipped in deionized water to remove any residue from the previous solution. The wafer was then dried with nitrogen gas. A detailed description of each step in the RCA cleaning procedure is provided in Supplementary Table 2.

Step 2: resist coating. Spin-coating the silicon wafer with PMMA is a key step in the fabrication procedure. The thickness of the PMMA is crucial for the liftoff procedure. As a rule of thumb, the thickness of the PMMA should be more than three times the thickness of the nanomagnetic device. Because the nanomagnetic devices have a thickness of ∼10 nm, we anticipated the thickness of the PMMA to be ∼30–40 nm. To achieve this thickness the PMMA was dissolved in anisole before spin-coating onto the wafer, which was achieved using a Laurell Technologies WS-400A-8NPP/Lite spin processor. Once the resist was spun onto the silicon wafer, the wafer was soft-baked in a furnace at 170 °C for 30 min to remove any excess solvent. The procedure used for PMMA coating is described in Supplementary Table 3.

Step 3: electron beam lithography. To pattern the nanomagnetic devices on the resist we used a high-resolution electron-beam lithography technique using a Hitachi SU-70 SEM retrofitted with a nanometre pattern generation system (NPGS) by JC Nabity Lithography Systems. The pattern of the nanomagnetic devices was designed on DesignCAD 2000 NT and was saved in a format recognized by the NPGS. Once the patterns were designed and the silicon wafer coated with PMMA was mounted in the SEM chamber, the SEM was operated at 30 kV and the electron beam was aligned, stigmatized and focused to achieve optimum resolution. The optimum condition for writing the pattern was determined by growing contamination spots on the PMMA resist before patterning the nanomagnetic logic devices. Such a spot with a diameter of 20 nm is shown in the SEM image in Supplementary Fig. 11. Once the optimum conditions were reached the patterns were written on the resist. A detailed description of the electron-beam lithography procedure is provided in Supplementary Table 4.

Step 4: development procedure. The objective in this procedure was to remove the exposed PMMA from the silicon wafer. The difference between exposed and unexposed PMMA is the molecular weight of the polymer, with the exposed PMMA having a lower molecular weight and the unexposed PMMA a higher molecular weight. The developer solvent dissolves the polymer with the lower molecular weight. We used a mixture of methyl isobutyl ketone and isopropanol (IPA) in a 3:1 ratio as the developer solution. The sample was dipped in the developer solution for 60 s to dissolve the exposed PMMA. We optimized the concentration and duration of the procedure to attain perfect exclusion of the exposed PMMA, leaving the unexposed PMMA on the silicon wafer. The sample was taken out of the developer solution, rinsed with IPA, and dried with nitrogen gas. The complete procedure for this step is provided in Supplementary Table 5.

Step 5: ferromagnetic film deposition. Permalloy was used as the soft ferromagnetic material to fabricate the nanomagnetic devices. A uniform thin film of Permalloy was deposited on the sample using a Varian Model 980-2,462 electron beam evaporator. The sample was mounted in the chamber of the evaporator and the chamber was then pumped down to a pressure of 2 μtorr. The electron filament was switched on, and the Permalloy source was heated by the energy provided from the electron beam until it vaporized. Once the Permalloy source was uniformly heated, the shutter was opened and deposition took place. A 10 nm layer of Permalloy was evaporated onto the sample at a rate of 1–2 Å s–1.

Step 6: liftoff. In this final step, unwanted material (unexposed PMMA and the Permalloy on the unexposed PMMA) was selectively removed, while preserving the Permalloy deposited on the silicon surface. Acetone was used to selectively remove the organic resist and to preserve the ferromagnetic material on the sample. The unexposed PMMA dissolves easily in acetone, and the Permalloy on top is removed simultaneously. The sample was dipped in an acetone bath for 10 min while agitating the sample in a heated ultrasonic bath. Once the liftoff was completed, the sample was rinsed with IPA and dried in nitrogen gas. A detailed description of the liftoff procedure is provided in Supplementary Table 6.

Characterization process. The characterization of the nanomagnetic devices included topological and magnetic inspection. Topological inspection allowed defect-free nanomagnetic devices to be identified. The defect-free nanomagnetic devices were then subjected to magnetic inspection to analyse the magnetization states. Based on the magnetization states of the nanomagnetic devices, we could understand and implement computations.

For the topological inspection we used an SEM and a scanning probe microscope in AFM mode. A Hitachi SU-70 SEM operating at 30 kV was used to obtain the lateral measurements of the nanomagnetic devices. The working distance of the SEM was reduced to 5 mm to obtain high-resolution SEM images of the devices. To obtain thickness measurements of the devices, a VEECO DI300 scanning probe microscope was used in AFM mode. A silicon probe with an angle of 22° and an apex with a radius of 20 nm was mounted onto the scanning probe microscope. The AFM information was also used to measure the roughness of the surface of the nanomagnetic devices. Combining the SEM and AFM measurements we were able to identify defect-free nanomagnetic devices.

For magnetic inspection of the nanomagnetic devices we used a VEECO DI300 scanning probe microscope in MFM mode. A magnetic probe was mounted on the scanning probe microscope. There are three types of magnetic probe: standard moment (coercivity, 400 Oe; moment, 1e-13 electromagnetic units (EMU)), low moment (coercivity, <400 Oe; moment, 0.3e-13 EMU) and low coercivity (coercivity, <10 Oe; moment, <1e-13 EMU). A low-moment magnetic probe was used to obtain high-resolution magnetic images of the nanomagnetic logic devices. These MFM images were then used to identify defect-free magnetic systems and realize the computation.

Speed comparison with state-of-the-art techniques. Each nanomagnet has effective direct coupling with its four nearest neighbours, and at most with its four second-nearest neighbours in a grid. So, to be fair, we need to make a comparison with algorithms that exploit sparsity and can be parallelized. There are many quadratic optimization techniques, including METSLib tabu search28, branch and bound solver29, and quadratic programming30 by IBM ILOG CPLEX Optimizer (CPLEX). In a recent study, McGeoch44 and colleagues concluded that, for quadratic problems, IBM’s quadratic programming algorithms and heuristics outperform other conventional solvers. CPLEX is also a widely used commercial optimization tool. We thus compared our work with the results from CPLEX.

CPLEX Optimizer30,45 can solve large sparse matrices with a nonconvex quadratic objective function and with all variables constrained to be binary (0–1). The CPLEX search algorithm exploits parallelism in solving nodes of the branch-and-cut tree, but produces repeatable, invariant solution paths. By default, CPLEX applies as much parallelism as possible while still achieving deterministic results. As our perceptual organization vision problem has a nonconvex quadratic objective function with all variables constrained to be binary, we used the CPLEX function ‘cplexmiqp’, which solves mixed integer quadratic programming problems. Additionally, the solver uses the barrier (or interior point) algorithm to exploit the sparsity of large problems.

Experimental parameters. We ran experiments on sparse affinity matrices with 96% sparsity, such that each node had an average of eight neighbours, and with 98% sparsity, such that each node had an average of four neighbours. Our experimental data set consisted of 101 images with edge segments varying between 75 and 1,190. We used a PC with Intel(R) Xeon(R) E5-2670/E5-2630 (8/6 Core) @ 2.60 GHz/2.30 GHz with 32 GB RAM running Linux OS for all experiments. The running time was the time taken to sparsify the affinity matrix plus the time taken for the quadratic optimization algorithm.

Timing prediction of the proposed magnetic solver. The spatial arrangement of the nanomagnets was calculated using optimized MDS, as explained in Supplementary Section 4. The average time taken for layout generation on 101 images was 0.04 s. We borrowed from the existing literature46–48 to predict a magnetic initialization (writing) of 10 ns, a relaxation time of ≪1 ns and read times of 1 ns. Hence, the estimated running time for the magnetic coprocessor will be on the order of ∼0.04 s, as layout generation dominates the nanosecond-scale magnetic steps. Of course, technology constraints can change these estimates, but we see great potential for magnetic accelerators.

References
44. McGeoch, C. C. & Wang, C. Experimental evaluation of an adiabiatic quantum system for combinatorial optimization. In Proc. ACM Int. Conf. Computing Frontiers. 23 (ACM, 2013).
45. IBM ILOG CPLEX Optimizer performance benchmarks; http://www-01.ibm.com/software/commerce/optimization/cplex-performance/
46. Jog, A. et al. Cache revive: architecting volatile STT-RAM caches for enhanced performance in CMPs. In Proc. 49th Annu. Design Automation Conf. 243–252 (ACM, 2012).
47. Kultursay, E., Kandemir, M., Sivasubramaniam, A. & Mutlu, O. Evaluating STT-RAM as an energy-efficient main memory alternative. In 2013 IEEE Int. Symp. Performance Analysis Sys. Software 256–267 (IEEE, 2013).
48. Wang, K., Alzate, J. & Amiri, P. K. Low-power non-volatile spintronic memory: STT-RAM and beyond. J. Phys. D: Appl. Phys. 46, 074003 (2013).
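
Worked example: timing estimate. The timing prediction in the Methods can be checked with a short calculation. The Python sketch below is illustrative only and is not part of the authors' software; the function and constant names are hypothetical. It assumes that the stages run strictly in sequence and, because the relaxation time is quoted only as ≪1 ns, it uses 1 ns as a conservative upper bound.

```python
# Stage durations quoted in the Methods, in seconds.
LAYOUT_S = 0.04   # average MDS layout generation time over 101 images
WRITE_S = 10e-9   # magnetic initialization (writing)
RELAX_S = 1e-9    # ASSUMPTION: upper bound; the text gives only "<< 1 ns"
READ_S = 1e-9     # read time


def magnetic_stage_time(write_s=WRITE_S, relax_s=RELAX_S, read_s=READ_S):
    """Time spent in the magnet array itself (write, relax, read)."""
    return write_s + relax_s + read_s


def estimated_runtime(layout_s=LAYOUT_S):
    """Total estimate, ASSUMING the stages run strictly one after another."""
    return layout_s + magnetic_stage_time()


if __name__ == "__main__":
    magnetic = magnetic_stage_time()
    total = estimated_runtime()
    print(f"magnetic stages: {magnetic * 1e9:.0f} ns")
    print(f"total estimate:  {total:.6f} s")
    print(f"layout share:    {LAYOUT_S / total:.6%}")
```

Under these assumptions the magnetic write, relaxation and read steps contribute at most about 12 ns, so the 0.04 s estimate is set almost entirely by layout generation on the host computer.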
