132 Handbook of Big Data
8.4 Numerical Performance
A simple MATLAB code, irlbar, that implements IRLBA is given in the Appendix. For
a full MATLAB implementation of IRLBA, see [2]. IRLBA is fast, as the following
straightforward example shows. Consider the 1,977,885 × 109,900
Rucci1 matrix in the Florida Sparse Matrix Collection [8]. The matrix group Rucci contains
matrices for least-squares problems and can be downloaded using the University of Florida's
Sparse Matrix Collection program UFget. The MATLAB code irlbar was more than three
times as fast as MATLAB's internal svds at computing the six largest singular triplets of
the Rucci1 matrix to the same tolerance, $10^{-5}$. Speed was measured with MATLAB's
tic and toc in version 8.3.0.532 (R2014a) on an iMac with 32 GB of memory and a 3.5 GHz
Intel processor.
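The same kind of measurement can be sketched in Python. This is a minimal sketch under stated assumptions: it times SciPy's `scipy.sparse.linalg.svds`, which uses ARPACK by default and is therefore a baseline rather than IRLBA, and it uses `time.perf_counter` in place of MATLAB's tic and toc. Because Rucci1 has to be downloaded separately, the script substitutes a hypothetical randomly generated sparse matrix, so its timings say nothing about the results reported above. The helper name `time_largest_triplets` is illustrative only.

```python
import time

import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import svds


def time_largest_triplets(A, k=6, tol=1e-5):
    """Time the computation of the k largest singular triplets of A.

    Uses SciPy's svds (ARPACK by default), not IRLBA; perf_counter plays
    the role of MATLAB's tic/toc.
    """
    start = time.perf_counter()            # like tic
    u, s, vt = svds(A, k=k, tol=tol)
    elapsed = time.perf_counter() - start  # like toc
    order = np.argsort(s)[::-1]            # largest singular value first
    return u[:, order], s[order], vt[order], elapsed


if __name__ == "__main__":
    # Hypothetical stand-in for Rucci1: a tall, thin random sparse matrix.
    A = sp.random(20_000, 2_000, density=0.001, format="csr", random_state=0)
    u, s, vt, elapsed = time_largest_triplets(A)
    print("six largest singular values:", s)
    print(f"elapsed: {elapsed:.3f} s")
```

Timing two solvers on the same matrix, with the same `k` and `tol`, reproduces the shape of the comparison described above.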
Bryan Lewis [23] provided anecdotal evidence of the performance of IRLBA in the statistical
programming language R. Lewis used IRLBA to compute the five largest singular triplets of
the Netflix training dataset (a 480,189 × 17,770 matrix) in a few minutes on a laptop;
see http://illposed.net/irlba.html for details.
Software implementations of IRLBA have existed for some time in both
MATLAB [2] and R [23]. Recently, Kane and Lewis [17] created the irlbpy package
for Python, a pip-installable open-source implementation of IRLBA that is available
from GitHub at https://github.com/bwlewis/irlbpy. The irlbpy package is compatible
with dense and sparse data, accepting either numpy 2D arrays or matrices, or scipy
sparse matrices as input. The performance of irlbpy, the IRLBA augmented with Ritz
vectors, is shown in the graphs in Figure 8.1. The benchmarks in Figure 8.1 were
performed on a MacBook Pro with a quad-core 2.7 GHz Intel Core i7 and 16 GB of
1600 MHz DDR3 RAM running Python version 2.7.3, NumPy version 1.7.0, and SciPy
version 0.12.0. All matrices were square and randomly generated, and the matrix sizes
shown in the graphs in Figure 8.1 are the orders of these square matrices. These graphs
show that, in practice, when searching for the largest singular triplets, IRLBA scales
linearly with the size of the data, consistent with each Lanczos bidiagonalization step
requiring only one product with $A$ and one with $A^T$. This gives IRLBA a tremendous
advantage over traditional dense SVD methods, whose cost for an $m \times n$ matrix grows
as $O(mn\min(m,n))$. IRLBA is particularly well suited to problems in which m and n are
of similar size, which are often the most computationally challenging ones. All examples
in Figure 8.1 can be found at
https://github.com/bwlewis/irlbpy.
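To make the technique concrete without installing irlbpy, here is a minimal NumPy sketch of a Lanczos bidiagonalization that is restarted by augmenting with the k Ritz vectors and the normalized residual, in the spirit of the method of Baglama and Reichel [1]. It is not the irlbar or irlbpy code. The function name `irlba_sketch`, the default subspace size `k + 10`, the stopping rule (every residual norm at most `tol` times the largest Ritz value), and the use of full reorthogonalization of both sets of Lanczos vectors are assumptions chosen for simplicity, not taken from the published implementations. Like irlbpy, it accepts a NumPy 2D array or a SciPy sparse matrix, because it touches A only through the products `A @ x` and `A.T @ y`.

```python
import numpy as np


def irlba_sketch(A, k, work=None, tol=1e-5, max_restarts=200, seed=0):
    """Approximate the k largest singular triplets (U, s, V) of A.

    Sketch of an augmented implicitly restarted Lanczos bidiagonalization:
    A is used only through A @ x and A.T @ y. No breakdown (zero norm) handling.
    """
    m, n = A.shape
    if work is None:
        work = min(k + 10, m, n)  # hypothetical default Krylov subspace size
    if not k < work <= min(m, n):
        raise ValueError("need k < work <= min(m, n)")
    rng = np.random.default_rng(seed)
    P = np.zeros((n, work), order="F")  # right Lanczos vectors
    Q = np.zeros((m, work), order="F")  # left Lanczos vectors
    B = np.zeros((work, work))          # projected matrix: A @ P = Q @ B
    p = rng.standard_normal(n)
    P[:, 0] = p / np.linalg.norm(p)
    start = 0                           # number of Ritz vectors kept from the last cycle
    for _ in range(max_restarts):
        # New left vector; B[:start, start] couples it to the kept Ritz vectors.
        q = A @ P[:, start] - Q[:, :start] @ B[:start, start]
        q -= Q[:, :start] @ (Q[:, :start].T @ q)
        B[start, start] = np.linalg.norm(q)
        Q[:, start] = q / B[start, start]
        for j in range(start, work):
            # Golub-Kahan-Lanczos step with full reorthogonalization.
            r = A.T @ Q[:, j] - B[j, j] * P[:, j]
            r -= P[:, : j + 1] @ (P[:, : j + 1].T @ r)
            beta = np.linalg.norm(r)
            if j + 1 < work:
                P[:, j + 1] = r / beta
                B[j, j + 1] = beta
                q = A @ P[:, j + 1] - beta * Q[:, j]
                q -= Q[:, : j + 1] @ (Q[:, : j + 1].T @ q)
                B[j + 1, j + 1] = np.linalg.norm(q)
                Q[:, j + 1] = q / B[j + 1, j + 1]
        Ub, s, Vbt = np.linalg.svd(B)
        U, V = Q @ Ub[:, :k], P @ Vbt[:k].T  # Ritz vectors
        # ||A.T @ U[:, i] - s[i] * V[:, i]|| equals beta * |Ub[-1, i]|.
        if np.all(beta * np.abs(Ub[-1, :k]) <= tol * s[0]):
            break
        # Restart: keep the k Ritz vectors and append the normalized residual.
        P[:, :k], Q[:, :k], P[:, k] = V, U, r / beta
        B[:] = 0.0
        B[:k, :k] = np.diag(s[:k])
        B[:k, k] = beta * Ub[-1, :k]
        start = k
    return U, s[:k], V
```

For example, `irlba_sketch(A, 6, tol=1e-5)` mirrors the six-triplet, $10^{-5}$ setting used for Rucci1 above. Restarting with Ritz vectors keeps the memory footprint fixed at `work` vectors on each side, which is what allows the method to handle very large matrices.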
8.5 Conclusion
This chapter provides an overview of IRLBA augmented with Ritz vectors, developed in 2005
by Baglama and Reichel [1], and shows that the method is well suited
for finding the largest singular triplets of very large matrices. The method can easily be
implemented in a variety of programming languages and is computationally fast when
compared to similar methods. We thank Michael Kane and Bryan Lewis for all of their
work on this book and on the software implementations of IRLBA.