Basic Linear Algebra Subprograms (BLAS) and BLAS-like Library Instantiation Software (BLIS) are libraries that can accelerate dense linear algebra operations on current CPU microarchitectures.
BLIS was introduced as part of the FLAME project to serve the dense linear algebra software stack. The framework is designed to isolate the essential computational kernels that, once optimized, immediately enable optimized implementations of most of its commonly used and computationally intensive operations. BLIS also offers improved performance for matrix multiplications with small operands, and it supports both single-threaded and multi-threaded operation.
Oracle has optimized the BLIS libraries for the Ampere Altra family of processors. The steps below show how to build the optimized library and what performance gain to expect.
Step 1: Getting BLIS Sources
git clone https://github.com/flame/blis.git
cd blis
git checkout ampere
Step 2: Building BLIS
./QuickStart.sh altramax
# To build for Ampere Altra processors instead, change ./QuickStart.sh altramax to ./QuickStart.sh altra
source ./blis_build_altramax.sh
source blis_setenv.sh
export LD_LIBRARY_PATH=<INSTALL_PATH_FROM_STEP1>/lib/altramax
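Before running anything against the new build, it can help to confirm that the library directory really is on the loader path. The following is a minimal sketch: path_contains is a hypothetical helper, and BLIS_LIB_DIR is an assumed variable name that should point at the lib/altramax directory exported above.

```shell
# Hypothetical helper: succeeds if directory $1 is an entry in the
# colon-separated list $2 (for example, the value of LD_LIBRARY_PATH).
path_contains() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# BLIS_LIB_DIR is an assumed name; adjust it to your install location.
BLIS_LIB_DIR="${BLIS_LIB_DIR:-$PWD/lib/altramax}"

if path_contains "$BLIS_LIB_DIR" "${LD_LIBRARY_PATH:-}"; then
    echo "BLIS library directory is on LD_LIBRARY_PATH"
else
    echo "BLIS library directory is missing from LD_LIBRARY_PATH"
fi
```

Wrapping both strings in colons makes the match exact, so /opt/blis does not accidentally match /opt/blis/lib.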
Note: BLIS can be built with OpenMP (the default) or pthreads for multithreading. Details can be found in the documentation/tutorial.
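With a threaded build, the number of threads is chosen at run time. The sketch below assumes this branch honors the BLIS_NUM_THREADS environment variable documented by upstream BLIS. It defaults to one thread per online core unless a value is already set.

```shell
# Count online cores. nproc ships with GNU coreutils; getconf is a fallback.
cores="$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)"

# Keep any value the caller already exported; otherwise use every core.
export BLIS_NUM_THREADS="${BLIS_NUM_THREADS:-$cores}"
echo "BLIS will use $BLIS_NUM_THREADS threads"
```

For a single-threaded run, export BLIS_NUM_THREADS=1 before launching the benchmark.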
Step 3: Performance Experiments
For our test, we use HPL 2.3, the High-Performance Linpack benchmark that is commonly used to evaluate systems. We compare the performance of the Oracle-optimized BLIS library with OpenBLAS and Arm Performance Libraries (Arm PL).
System Config:
OS: Ubuntu 22.04
Kernel: 5.19.0-46-generic
Toolchain: gcc (GCC) 12.3.0
Memory: 16 x 32 GB (512 GB total)
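HPL results depend heavily on the problem size N, which is not stated here. A common rule of thumb is to size the N x N matrix of 8-byte doubles to fill roughly 80% of memory, rounded down to a multiple of the block size NB. The sketch below applies that rule. hpl_n is a hypothetical helper, and the 0.8 fraction, NB=256, and reading GB as GiB are all assumptions rather than settings from these runs.

```shell
# hpl_n MEM_GIB NB [FRACTION]: largest multiple of NB whose N x N matrix of
# 8-byte doubles fits in FRACTION (default 0.8) of MEM_GIB GiB of memory.
hpl_n() {
    LC_ALL=C awk -v gib="$1" -v nb="$2" -v frac="${3:-0.8}" 'BEGIN {
        bytes = gib * 1024 * 1024 * 1024 * frac
        n = int(sqrt(bytes / 8))
        printf "%d\n", int(n / nb) * nb
    }'
}

# 512 GiB corresponds to the 16 x 32 GB configuration; NB=256 is an assumed block size.
hpl_n 512 256
```

The printed value is only a starting point for the N entry in HPL.dat. The best N and NB for a given system are usually found by experiment.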
Results:
For HPL, the Oracle-optimized BLIS library provides a 1.2x boost in performance.
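A speedup figure like this is the ratio of the Gflops that HPL reports with the new library to the Gflops it reports with the baseline library. The sketch below shows the arithmetic. speedup is a hypothetical helper, and the input values are placeholders chosen for illustration, not measurements from these experiments.

```shell
# speedup NEW BASELINE: prints NEW / BASELINE to two decimal places.
speedup() {
    LC_ALL=C awk -v new="$1" -v base="$2" 'BEGIN { printf "%.2f\n", new / base }'
}

# Placeholder Gflops values for illustration only.
speedup 1200 1000
```

With these placeholder inputs the helper prints 1.20. Substitute the Gflops values from your own HPL output to compare libraries on your system.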