1 Introduction
Overdetermined systems of equations generally have no exact solution, so a solution must be sought in a way that will best satisfy all of the equations. Since the inception of the least squares method by
Gauss [3], it has been applied in countless scientific, engineering, and numerical contexts. When
investigating physical systems, it is not uncommon for non-negativity constraints to be applied
in order for the system to be physically meaningful. Such constraints have been necessary
in the computational Earth sciences when using nuclear magnetic resonance to determine pore
structures [7], formulating petrologic mixing models [1, 21], and addressing seismologic inversion
problems [28]. This constrained least squares problem is the non-negative least squares (NNLS)
problem, which can be stated as minx ||Ax − b||2 such that x ≥ 0, where A is the m × n matrix
of the system of equations, b is the vector of measured data, and x is the vector of parameters
that minimizes the 2-norm of the residual.
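For concreteness, this problem statement can be exercised directly in Matlab with the built-in lsqnonneg solver (an implementation of the Lawson and Hanson method [14]); the toy system below is purely illustrative:

    A = [1 2; 3 4; 5 6];   % m-by-n system of equations (m = 3, n = 2)
    b = [7; 8; 9];         % vector of measured data
    x = lsqnonneg(A, b);   % minimizes ||A*x - b||_2 subject to x >= 0
    res = norm(A*x - b);   % 2-norm of the solution residual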
Recent problems in rock magnetism have required solving NNLS problems composed of tens
of thousands of variables [8, 26, 25]. Without the restrictions of conventional computer systems
and algorithms, these problems could expand to hundreds of thousands of variables. Direct
spatial solutions to these problems using standard NNLS algorithms have required months of
execution time. Frequency domain methods [15] have been developed as an alternative fast
method to obtain representative solutions, but can produce non-physical artifacts (e.g., Gibbs
phenomenon at sharp interfaces) that violate the non-negativity constraint.
We present TNT-NN, a new active set strategy for solving large NNLS problems.
TNT-NN improves upon prior efforts by incorporating a more aggressive strategy for identifying
the active set of constraints and by using an improved solver to address the unconstrained least
squares sub-problem. We show that TNT-NN dramatically outperforms the established Fast NNLS
(FNNLS) active set algorithm on a wide variety of test problems, and that it extends the maximum
tractable problem size to the point where previously prohibitive problems are now feasible.
Common strategies for solving the unconstrained least squares problem, minx ||Ax − b||2 ,
include the normal equations [9, 11], the pseudoinverse [3], and methods using QR decompo-
sition [23]. Using the normal equations introduces excessive numerical round-off error, because forming AᵀA squares the condition number of the system. The
pseudoinverse provides a conceptually simple and numerically accurate approach, but can be
computationally expensive. QR methods are often a good compromise between the normal
equations and pseudoinverse methods, providing acceptable accuracy and numerical efficiency.
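As a brief sketch of this trade-off, the three approaches can be written side by side in Matlab; the random A and b below stand in for an arbitrary overdetermined system with full column rank:

    A = randn(500, 100);          % placeholder overdetermined system (m > n)
    b = randn(500, 1);
    x_ne = (A' * A) \ (A' * b);   % normal equations: cheap, but cond(A'*A) = cond(A)^2
    x_pi = pinv(A) * b;           % pseudoinverse (SVD): accurate, but expensive
    [Q, R] = qr(A, 0);            % economy-size QR decomposition
    x_qr = R \ (Q' * b);          % QR: acceptable accuracy at moderate cost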
In TNT-NN, the solution is always initialized at the origin. Other active set methods have no such
constraint inherent to their design. This is a justifiable design decision given the physics of
real-world problems, where solutions corresponding to low energy states are often desired.
The TNT-NN Matlab implementation used in this paper is archived in a GitHub reposi-
tory [18].
4 Test methodology
TNT-NN is tested for performance and numerical accuracy using a Matlab implementation.
All tests include FNNLS and TNT-NN. The reference FNNLS Matlab function [5] is used
throughout these tests, which are performed using Matlab r2016a on a host with dual
6-core 2.93 GHz Intel Xeon X5670 CPUs and 384 GB of memory.
The first test spans systems with dimensions of 200×200 to 5000×5000, in increments of 200
for both rows and columns. A complete combination of dimensions, condition numbers (10², 10⁵,
and 10⁹), and percentages of active constraints (5%, 20%, and 50%) is tested. The reduction and
expansion constants used in the TNT-NN algorithm for these tests are 0.5 and 2, respectively.
Timing performance is reported as speedup of TNT-NN vs. FNNLS. Numerical accuracy is
reported as solution error, calculated as the 2-norm of the solution residual (||b − Ax||2 ).
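The generator used to produce these test systems is not listed here, but a matrix with a prescribed condition number and a target fraction of active constraints can be constructed along the following lines (a hedged sketch assuming an SVD-based construction; all parameter names are illustrative):

    n = 1000;  kappa = 1e9;  activeFrac = 0.5;  % assumed test parameters
    [U, ~] = qr(randn(n));                 % random orthogonal factor
    [V, ~] = qr(randn(n));                 % second random orthogonal factor
    s = logspace(0, -log10(kappa), n);     % singular values giving cond(A) = kappa
    A = U * diag(s) * V';
    xTrue = rand(n, 1);
    xTrue(rand(n, 1) < activeFrac) = 0;    % zeroed entries become active constraints
    b = A * xTrue;                         % consistent measured data
    % each solver's computed x is then scored as norm(b - A*x)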
The NNLS solver included with Matlab (lsqnonneg, which uses LH-NNLS) is not included
in these tests due to prohibitively slow performance. For example, to solve 2000×2000 and
3600×3600 systems, lsqnonneg required 2.5 and 25 hours, respectively, while TNT-NN required
only 2.5 and 14 minutes. The exclusion of LH-NNLS on the grounds of prohibitive performance
is consistent with other studies of this nature [13].
The convergence of the TNT-NN algorithm is analyzed by varying the constants that govern
the number of variables that are added to and removed from the active set using two tests: one
that varies the reduction constant (0.1 to 0.9 in increments of 0.1) and one that varies the
expansion constant (1.1 to 1.9 in increments of 0.1). Default values for the reduction constant
rc and the expansion constant ec are 0.5 and 1.5, respectively. These tests solve 1500×1500
systems with 50% active constraints and κ = 10⁹.
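The exact update rules live in the reference implementation [18]; purely as an illustrative assumption, rc and ec can be read as geometric factors on the batch of variables moved into and out of the active set in one inner pass:

    rc = 0.5;  ec = 1.5;   % default reduction and expansion constants
    batch = 64;            % hypothetical number of variables moved per pass
    batchReduced = max(1, floor(rc * batch));  % reduction: shrink the batch
    batchExpanded = ceil(ec * batch);          % expansion: enlarge the batch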
Using the convergence test results, we test the performance of TNT-NN for large n × n
matrices, where n runs from 10000 to 25000 in increments of 5000, using a high condition number
(κ = 10⁹) as well as 5% and 50% active constraints. Each of these tests is repeated five times,
and the arithmetic mean of the five execution times is used to report speedup over FNNLS.
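A hedged sketch of this timing harness is shown below; the reference fnnls.m [5] operates on the normal-equation products A'*A and A'*b, while the tntnn interface is an assumption based on the reference implementation [18]:

    A = randn(2000);  b = randn(2000, 1);  % one test system (placeholder)
    nReps = 5;
    tF = zeros(nReps, 1);  tT = zeros(nReps, 1);
    for k = 1:nReps
        tic;  xF = fnnls(A' * A, A' * b);  tF(k) = toc;  % reference FNNLS [5]
        tic;  xT = tntnn(A, b);            tT(k) = toc;  % TNT-NN [18]; interface assumed
    end
    speedup = mean(tF) / mean(tT);  % ratio of arithmetic means over five runs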
The final test compares the performance of FNNLS and TNT-NN when solving an extremely
large (by current standards) 45000 × 45000 system. This test performs a complete combination
of condition numbers (κ = 10² and 10⁹) and active non-negativity constraints (5% and 50%).
5 Discussion of results
5.1 Execution time
Speedup results for all tests are shown in Figure 1. These results show that TNT-NN out-
performs FNNLS for almost all of the tested matrices (speedup increasing with system size).
TNT-NN does not outperform FNNLS (speedup ≤ 1) under the following conditions: 1) the
system of equations is small (at least one dimension is on the order of 102 ) and 2) the system of
equations is very ill-conditioned and underdetermined. When the system of equations is small,
the time TNT-NN spends intelligently constructing the active set offsets the benefits provided
by its least squares solver, limiting TNT-NN to roughly the same level of performance as
FNNLS. A system of equations
that is underdetermined can have an infinite number of valid solutions, and ill-conditioning
imparts additional difficulty. TNT-NN then spends more time evaluating the solution space
to find a valid solution because the gradient does not always provide an accurate prediction.
This additional time spent evaluating the underdetermined solution space reduces the relative
performance of TNT-NN compared to FNNLS.
Figure 1: Speedup of TNT-NN over FNNLS for all tests (panels (a)–(c)).
The tests with high values of κ do not exhibit speedup growth curves that are as steep as
the curves for tests using smaller values of κ. High condition numbers make it difficult for
TNT-NN to accurately predict what variables should be constrained. This property leads to
damped performance, particularly in the over- and well-determined regions, where TNT-NN
would receive the most benefit from prediction.
Note that the relative performance of TNT-NN increases with the size of the system it is
solving. While the difference in performance for current problems might be considered small for
some of these tests (speedups of less than five in some instances), the large problems that will
need to be addressed in the near future will benefit from the enhanced performance of TNT-NN.
5.2 Error
Histograms of log10(||b − Ax||2) for all tests are shown in Figure 2. Two features are
notable: 1) there are minimal differences between the FNNLS and TNT-NN error distributions
for over- and well-determined tests, and 2) the error in the underdetermined region is typically
lower than the error in the over- and well-determined regions for FNNLS tests.
Figure 2: Histograms of solution error (as log10 (|error|)) for (A) over- and well-determined
tests and (B) underdetermined tests. The TNT-NN error is never greater than the FNNLS error
for over- and well-determined tests, and it is lower on average for underdetermined tests.
Feature 1 is not unexpected, as over- and well-determined systems have unique solutions;
differences between TNT-NN and FNNLS solutions are attributable to machine round-off error.
Feature 2 is likewise not unexpected as TNT-NN produces solutions with small norms by default.
When an infinite number of solutions exist, as in underdetermined systems, it is customary to
choose a solution that has a small norm since those solutions can also be described as solutions
with low energy states. This design decision allows TNT-NN to have a lower error distribution
than FNNLS in all but one percent of the underdetermined tests.
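A toy underdetermined example (ignoring the non-negativity constraint for clarity) makes the small-norm preference concrete; both vectors below satisfy the system exactly, but the minimum-norm solution corresponds to the low energy state described above:

    A = [1 2 3; 4 5 6];  b = [1; 2];  % 2 equations, 3 unknowns
    x_min = pinv(A) * b;              % minimum 2-norm solution
    x_alt = x_min + 0.5 * null(A);    % another exact solution with a larger norm
    norm(A * x_min - b)               % ~0: fits the data
    norm(A * x_alt - b)               % ~0: also fits the data
    [norm(x_min), norm(x_alt)]        % the pinv solution has the smaller norm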
5.3 Convergence
Figure 3: Total unconstrained TNT-NN iterations to convergence due to variations in (A) the
reduction constant (ec fixed at 1.5) and (B) the expansion constant (rc fixed at 0.5).
The reduction constant has the greatest effect on total inner loop iterations to convergence,
spanning more than 80 iterations (Figure 3A). The expansion constant has a smaller effect,
exhibiting a range of just over 5 total inner loop iterations to convergence (Figure
3B). Although these are simple tests and analyses, they indicate that an acceptable value for
the reduction constant lies in the range of [0.1, 0.5] and that the expansion constant does not
have much effect in the range [1.1, 1.9]. To reduce the total number of inner loop iterations to
convergence, reasonable initial values for the reduction and expansion constants would be 0.2
and 1.2, respectively.
Figure 4: Performance trends of speedup beyond the original 5000 × 5000 tests (dashed lines)
are shown for square systems from 10000 × 10000 to 25000 × 25000 in increments of 5000 (solid
line with circle markers), where κ = 10⁹, using (A) 5% and (B) 50% active constraints.
The speedups provided by using TNT-NN over FNNLS for n × n matrices beyond the initial
5000 × 5000 space, where κ = 10⁹ and the active (non-negativity) constraints are 5% and 50%, are
shown in Figure 4. These results show consistent positive trends in performance with the size
of the square matrices. It is important to note that this trend comes from sparsely sampled
data. A densely sampled (but more time intensive) test suite may produce results that reveal
more details of this behavioral trend.
When solving extremely large (45000 × 45000) systems, TNT-NN outperforms FNNLS
by at least 40× (Figure 5). When FNNLS was used to solve these large systems, the mean
execution time was 6.63 days. The mean TNT-NN execution time for the same tests was 2.45
hours (a speedup of 65×).
Figure 5: Execution times (A) and speedups (B) are shown for large (45000 × 45000) systems
where κ = 10² and κ = 10⁹. Non-negativity constraints of 5% and 50% are applied for both
condition numbers. For these large tests the TNT-NN algorithm always outperforms the FNNLS
algorithm, by more than 50× in three cases and with a maximum speedup of 157×.
Looking beyond the mean performance of these algorithms, the relative performance gain
of TNT-NN decreases as the condition number increases. Again, high condition numbers make
it difficult for the TNT-NN algorithm to accurately make predictions about the active set. It
is important to note that appropriate reduction and expansion constants are problem-specific;
better performance might be obtained by tuning these constants to the problem at hand.
Overall, these results represent a drastic shift in NNLS solver performance: from FNNLS,
which requires more than a work-week to solve such large systems, to TNT-NN, with which
multiple “same workday” results are possible. Stated slightly differently, for large systems of this
scale, TNT-NN could potentially accomplish in a single day what might require more than a
month to accomplish with FNNLS. These trends hold promise for the types of large science and
engineering problems that may be addressed using TNT-NN.
6 Conclusions
While these tests are limited to 5000 × 5000 systems, TNT-NN should provide additional
performance gains as problem sizes grow much larger in the near future. The error of
TNT-NN solutions is less than or equal to the FNNLS error in all but 1% of the tests. We ap-
ply the TNT-NN algorithm to the rock magnetism inversion problem [25] where it outperforms
FNNLS with a speedup of 70× and produces a smaller error for the small low-resolution sample
tested. We expect that this difference in performance will be more pronounced for studies using
high-resolution data sets of natural samples. Such studies would benefit from this enhanced
performance accordingly.
7 Future work
We plan to continue developing TNT-NN to incorporate regularization methods and techniques
for solving sparse systems. Additional work will examine methods for enhancing the
current active set strategy by providing it with higher quality “guesses” for the active set, par-
ticularly at high condition numbers. A more in-depth analysis of the interplay between the
reduction and expansion constants governing the active set is also warranted. At present, these
heuristic constants should be considered problem-specific tuning parameters. Alternative av-
enues for enhancing performance will also be investigated. Native implementations of the linear
algebra routines used in TNT-NN should be analyzed for potential performance enhancement.
We also plan to investigate the performance benefit of implementing TNT-NN using parallel
and accelerator hardware, such as GPU or Intel Xeon Phi technology.
Finally, we plan to apply TNT-NN to real-world problems that have, until now, required
prohibitively large systems; the rock magnetism inversion problem [25] is of particular interest.
8 Acknowledgments
This work was supported in part by National Science Foundation (NSF) Grants EAR-0941666
and CCF-1438286. Any opinions, findings, and conclusions or recommendations expressed in
this material are those of the authors and do not necessarily reflect the views of the NSF. We
also thank Bill Tuohy, Joshua Feinberg, Ioan Lascu, Ben Weiss, and Eduardo Lima for helpful
discussions, and the Institute for Rock Magnetism and the Minnesota Supercomputing Institute
for computational resources.
References
[1] F. Albarede and A. Provost. Petrological and geochemical mass-balance equations: an algorithm
for least-square fitting and general error analysis. Comput. Geosci., 3(2):309–326, 1977.
[2] Richard Aster, Brian Borchers, and Clifford Thurber. Parameter Estimation and Inverse Problems
(International Geophysics). Academic Press, 1st edition, January 2005.
[3] Åke Björck. Numerical Methods for Least Squares Problems. SIAM, Philadelphia, 1996.
[4] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[5] R. Bro and L. Shure. Reference FNNLS Matlab implementation, 1996.
http://www.ub.edu/mcr/als/fnnls.m.
[6] Rasmus Bro and Sijmen De Jong. A fast non-negativity-constrained least squares algorithm.
Journal of Chemometrics, 11(5):393–401, 1997.
[7] David P. Gallegos and Douglas M. Smith. A NMR technique for the analysis of pore structure:
Determination of continuous pore size distributions. Journal of Colloid and Interface Science,
122(1):143–153, 1988.
[8] J. Gattacceca, M. Boustie, E. Lima, B.P. Weiss, T. De Resseguier, and J.P. Cuq-Lelandais. Un-
raveling the simultaneous shock magnetization and demagnetization of rocks. Physics of the Earth
and Planetary Interiors, 182(1):42–49, 2010.
[9] Gene H. Golub and Charles F. Van Loan. Matrix Computations. The Johns Hopkins University
Press, 3rd edition, October 1996.
[10] Karen H. Haskell and Richard J. Hanson. An algorithm for linear least squares problems
with equality and nonnegativity constraints. Mathematical Programming, 21:98–118, 1981.
doi:10.1007/BF01584232.
[11] M.T. Heath. Numerical methods for large sparse linear least squares problems. SIAM J. Sci. Stat.
Comput., 5:497–513, 1984.
[12] E. Jones, T. Oliphant, P. Peterson, et al. SciPy: Open source scientific tools for Python, 2001–.
[13] Dongmin Kim, Suvrit Sra, and Inderjit S Dhillon. Tackling box-constrained optimization via a
new projected quasi-newton approach. SIAM J. Sci. Comput, 32(6):3548–3563, 2010.
[14] C. L. Lawson and R. J. Hanson. Solving Least Squares Problems. SIAM, 2nd edition, 1995.
[15] E.A. Lima, B.P. Weiss, L. Baratchart, D.P. Hardin, and E.B. Saff. Fast inversion of magnetic
field maps of unidirectional planar geological magnetization. J. of Geophys. Res. B: Solid Earth,
118(6):2723–2752, 2013.
[16] Yuancheng Luo and Ramani Duraiswami. Efficient parallel nonnegative least squares on multicore
architectures. SIAM J. Sci. Comput., 33(5):2848–2863, October 2011.
[17] Katherine M. Mullen and Ivo H.M. van Stokkum. The Lawson-Hanson algorithm for non-negative
least squares (NNLS). Technical report, 2012.
[18] J.M. Myre, E. Frahm, D.J. Lilja, and M.O. Saar. TNT-NN reference implementation:
http://dx.doi.org/10.5281/zenodo.438158, 2017.
[19] J.M. Myre, E. Frahm, D.J. Lilja, and M.O. Saar. TNT: A preconditioned method for applying
conjugate gradient to dense least-squares problems. In Prep.
[20] R. L. Parker. Geophysical Inverse Theory. Princeton University Press, 1994.
[21] M.J. Reid, A.J. Gancarz, and A.L. Albee. Constrained least-squares analysis of petrologic problems
with an application to lunar sample 12040. Earth and Planet. Sci. Lett., 17(2):433–445, 1973.
[22] Yousef Saad. Iterative Methods for Sparse Linear Systems. SIAM, 2nd edition, 2003.
[23] Lloyd N. Trefethen and David Bau III. Numerical Linear Algebra, volume 50. SIAM, 1997.
[24] Mark H. Van Benthem and Michael R. Keenan. Fast algorithm for the solution of large-scale non-
negativity-constrained least squares problems. Journal of Chemometrics, 18(10):441–450, 2004.
[25] B. P. Weiss, E. A. Lima, L. E. Fong, and F. J. Baudenbacher. Paleomagnetic analysis using SQUID
microscopy. J. Geophys. Res., 112:B09105, 2007.
[26] B.P. Weiss, E.A. Lima, L.E. Fong, and F.J. Baudenbacher. Paleointensity of the Earth's magnetic
field using SQUID microscopy. Earth Planet. Sci. Lett., 264(1):61–71, 2007.
[27] Zonghou Xiong and Andreas Kirsch. Three-dimensional earth conductivity inversion. Journal of
Computational and Applied Mathematics, 42(1):109–121, 1992.
[28] Y. Yagi. Source rupture process of the 2003 Tokachi-oki earthquake determined by joint inversion
of teleseismic body wave and strong ground motion data. Earth, Planets Space, 56:311–316, March
2004.