GotoBLAS

Original author(s)	Kazushige Goto
Stable release	2-1.13 / February 5, 2010
Type	Linear algebra library; implementation of BLAS
License	BSD License
Website	www.tacc.utexas.edu/tacc-software/gotoblas2

In scientific computing, GotoBLAS and GotoBLAS2 are open source implementations of the BLAS (Basic Linear Algebra Subprograms) API with many hand-crafted optimizations for specific processor types. GotoBLAS was developed by Kazushige Goto at the Texas Advanced Computing Center. As of 2003, it was used in seven of the world's ten fastest supercomputers.^[1]

GotoBLAS remains available, but development ceased with a final version touting optimal performance on Intel's Nehalem architecture (contemporary in 2008).^[2] OpenBLAS is an actively maintained fork of GotoBLAS, developed at the Lab of Parallel Software and Computational Science, ISCAS.

GotoBLAS was written by Goto during his sabbatical leave from the Japan Patent Office in 2002. It was initially optimized for the Pentium 4 processor and immediately boosted the performance of a supercomputer based on that CPU from 1.5 TFLOPS to 2 TFLOPS.^[1] As of 2005, the library was available at no cost for noncommercial use.^[1] A later open source version was released under the terms of the BSD license.

GotoBLAS's matrix-matrix multiplication routine, called GEMM in BLAS terms, is highly tuned for the x86 and AMD64 processor architectures by means of handcrafted assembly code.^[3] Like other BLAS implementations, it decomposes the multiplication into smaller "kernel" routines, but where earlier implementations streamed data from the L1 processor cache, GotoBLAS uses the L2 cache.^[3] The kernel used for GEMM is a routine called GEBP, for "General block-times-panel multiply",^[4] which was experimentally found to be "inherently superior" to several other kernels that were considered in the design.^[3]
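
The block-times-panel decomposition described above can be sketched in a few lines of Python. This is a minimal illustration and not GotoBLAS's actual code: the function names `gebp` and `gemm_blocked`, the block sizes `mc` and `kc`, and the use of plain nested lists are all assumptions made for readability. The real library writes the kernel in hand-tuned assembly and picks block sizes per processor so that the block of A stays resident in cache.

```python
def gebp(a_block, b_panel, c_panel):
    """Accumulate c_panel += a_block @ b_panel.

    a_block: mc x kc block of A (the operand meant to stay cache-resident)
    b_panel: kc x n row panel of B
    c_panel: mc x n row panel of C, updated in place
    """
    for i, a_row in enumerate(a_block):
        c_row = c_panel[i]
        for p, a_ip in enumerate(a_row):
            b_row = b_panel[p]
            for j in range(len(b_row)):
                c_row[j] += a_ip * b_row[j]


def gemm_blocked(a, b, mc=2, kc=2):
    """Return C = A @ B by splitting the work into GEBP calls.

    mc and kc are illustrative block sizes, not tuned values.
    """
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for p0 in range(0, k, kc):        # walk over row panels of B
        b_panel = b[p0:p0 + kc]
        for i0 in range(0, m, mc):    # walk over blocks of A
            # Copy the block out of A, loosely mimicking a packing step.
            a_block = [row[p0:p0 + kc] for row in a[i0:i0 + mc]]
            # The slice shares row objects with c, so updates land in c.
            gebp(a_block, b_panel, c[i0:i0 + mc])
    return c
```

Calling `gemm_blocked(a, b)` on two nested-list matrices returns their product as a new nested list. Each call to `gebp` touches only one small block of A, which is the property a tuned implementation exploits when it sizes that block to fit a particular cache level.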

Several other BLAS routines are, as is customary in BLAS libraries, implemented in terms of GEMM.^[4]
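
As a hedged illustration of what "implemented in terms of GEMM" can mean, the sketch below expresses a symmetric matrix multiply (the job of the level-3 BLAS routine SYMM) by rebuilding the full matrix from its stored triangle and handing the arithmetic to a general multiply. The article does not say which routines GotoBLAS treats this way or how, so the choice of SYMM, the helper names `gemm` and `symm_lower`, and the expand-then-multiply strategy are all assumptions; the `gemm` here is a plain reference implementation standing in for an optimized one.

```python
def gemm(a, b):
    """Reference C = A @ B on nested lists (stand-in for an optimized GEMM)."""
    return [[sum(a_ip * b[p][j] for p, a_ip in enumerate(row))
             for j in range(len(b[0]))]
            for row in a]


def symm_lower(a_lower, b):
    """Return C = A @ B where A is symmetric and only its lower triangle is valid.

    Entries of a_lower above the diagonal are ignored.
    """
    n = len(a_lower)
    # Rebuild the full symmetric matrix from the lower triangle...
    a_full = [[a_lower[i][j] if j <= i else a_lower[j][i] for j in range(n)]
              for i in range(n)]
    # ...then let the general multiply do all the arithmetic.
    return gemm(a_full, b)
```

The appeal of this layering is that tuning effort spent on the general multiply carries over to the routines built on top of it.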

References

[1] John Markoff (28 November 2005). "Writing the fastest code, by hand, for fun". New York Times.
[2] "GotoBlas2". Retrieved 28 August 2013.
[3] Goto, Kazushige; van de Geijn, Robert A. (2008). "Anatomy of High-Performance Matrix Multiplication". ACM Transactions on Mathematical Software. 34 (3): Article 12, 25 pages. CiteSeerX 10.1.1.111.3873. doi:10.1145/1356052.1356053.
[4] Goto, Kazushige; van de Geijn, Robert A. (2008). "High-performance implementation of the level-3 BLAS". ACM Transactions on Mathematical Software. 35 (1): 1–14. doi:10.1145/1377603.1377607.
