
Arm Performance Libraries with scikit-learn

I am trying to use Arm Performance Libraries (ArmPL) with scikit-learn to benefit from the cases where ArmPL performs better, e.g. dgemm.

I linked armpl_mp with NumPy and SciPy and then built scikit-learn. I can see that dgemm from the ArmPL-linked NumPy and SciPy performs better than dgemm from the default OpenBLAS-linked NumPy and SciPy.

I then tried one of the scikit-learn algorithms, DBSCAN, with ArmPL, because with the default OpenBLAS build it calls dgemm and spends a lot of its time there.

But with ArmPL it got much worse: the run took about 400 times longer, and 99.96% of the time is spent in "armpl::clag::bcms".

Can anyone explain what this function does and help me understand why it takes so long here?

+ 99.75% 0.00% python3 libarmpl_mp.so [.] armpl::clag::parallel<armpl::
+ 99.75% 0.00% python3 libarmpl_mp.so [.] armpl::clag::parallelise_2d<t
- 99.75% 99.50% python3 libarmpl_mp.so [.] armpl::clag::bcms<(armpl::cla
96.37% thread_start
start_thread
0xffff9c1bb80c
- armpl::clag::parallel<armpl::clag::parallelise_2d<true, true, armpl::clag::resident<(armpl::clag::which_matrix)1
- 96.37% armpl::clag::parallelise_2d<true, true, armpl::clag::resident<(armpl::clag::which_matrix)1, armpl::cla
armpl::clag::bcms<(armpl::clag::which_matrix)1, double, armpl::clag::convert<double const, double, armpl::
+ 96.75% 0.00% python3 libc.so.6 [.] thread_start
+ 96.75% 0.00% python3 libc.so.6 [.] start_thread
+ 96.61% 0.00% python3 libgomp.so.1.0.0 [.] 0x0000ffff9c1bb80c
+ 3.24% 0.00% python3 libomp.so [.] __kmp_invoke_microtask
+ 3.24% 0.00% python3 _base.cpython-310-aarch64-linux-gnu.so [.] .omp_outlined..145
+ 3.16% 0.02% python3 _radius_neighbors.cpython-310-aarch64-linux-gnu.so [.] __pyx_f_7sklearn_7metrics_29_
+ 3.14% 0.00% python3 _middle_term_computer.cpython-310-aarch64-linux-gnu.so [.] __pyx_f_7sklearn_7metrics_29_
+ 3.14% 0.00% python3 _cython_blas.cpython-310-aarch64-linux-gnu.so [.] __pyx_fuse_1__pyx_f_7sklearn_
+ 3.14% 0.00% python3 libarmpl_mp.so [.] armpl::clag::gemm<true, int, 
+ 3.14% 0.00% python3 libarmpl_mp.so [.] _ZZZN5armpl4clag4gemmIdLNS0_4
+ 3.14% 0.00% python3 libgomp.so.1.0.0 [.] GOMP_parallel
0.17% 0.10% python3 libomp.so [.] kmp_flag_64<false, true>::wai
0.15% 0.15% python3 libarmpl_mp.so [.] dgemm_sve_big
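
The profile above contains frames from both libgomp.so.1.0.0 and libomp.so, so it may be worth confirming which OpenMP and BLAS runtimes are actually loaded in the Python process. The following is a minimal sketch, assuming Linux and its /proc/self/maps file. The helper name `find_runtime_libs` and the list of library name patterns are my own assumptions for illustration, not part of ArmPL or scikit-learn.

```python
import os
import re

# Shared-library name prefixes to look for (an assumed, non-exhaustive list).
RUNTIME_PATTERN = re.compile(r"lib(gomp|omp|iomp5|armpl\w*|openblas\w*)[.-]")


def find_runtime_libs(maps_text):
    """Return sorted basenames of OpenMP/BLAS libraries found in /proc/<pid>/maps text."""
    found = set()
    for line in maps_text.splitlines():
        fields = line.split()
        # File-backed mappings carry the path as the sixth field;
        # anonymous mappings have only five fields and are skipped.
        if len(fields) < 6:
            continue
        name = os.path.basename(fields[5])
        if RUNTIME_PATTERN.match(name):
            found.add(name)
    return sorted(found)


if __name__ == "__main__":
    # In a real check, import numpy, scipy and sklearn.cluster before this point
    # so that their shared libraries are already mapped into the process.
    maps_path = "/proc/self/maps"
    if os.path.exists(maps_path):
        with open(maps_path) as handle:
            for lib in find_runtime_libs(handle.read()):
                print(lib)
```

If both libgomp and libomp show up in the output, two separate OpenMP thread pools are active in the same process, which would be useful information for anyone answering.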

Machine used: AWS Graviton3

Script used (DBSCAN):
#
# Imports
#
from sklearn.cluster import DBSCAN
from timeit import default_timer as timer
from sklearn.datasets import make_blobs
import numpy as np

#
# Generate dataset: 50,000 samples, 100 features, 50 cluster centers
#
X, y = make_blobs(
    n_samples=50000,
    n_features=100,
    centers=50,
    center_box=(-32, 32),
    shuffle=True,
    random_state=0,
)

#
# Main
#
start = timer()
# n_jobs=-1 uses all available cores; y is ignored by DBSCAN.fit
y_pred = DBSCAN(n_jobs=-1).fit(X, y)
elapsed = timer() - start
print(f"Total Time Taken for Execution of DBSCAN Fit Function: {elapsed} sec/s")
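
Since the script uses n_jobs=-1 and armpl_mp is itself multithreaded, one experiment that might help narrow this down is rerunning the script with the OpenMP thread count capped and comparing timings. The following is a minimal sketch using the standard OMP_NUM_THREADS environment variable. The helper name `run_with_thread_cap` is hypothetical, and whether this changes the result for this workload is an assumption that needs to be tested.

```python
import os
import subprocess
import sys


def run_with_thread_cap(cmd, num_threads):
    """Run cmd in a child process with OMP_NUM_THREADS set, and return its stdout."""
    # Copy the current environment and override only the OpenMP thread count.
    env = dict(os.environ, OMP_NUM_THREADS=str(num_threads))
    result = subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Probe command that only echoes the variable, to show that the cap reaches the child.
    probe = [sys.executable, "-c", "import os; print(os.environ['OMP_NUM_THREADS'])"]
    print(run_with_thread_cap(probe, 1).strip())
```

To apply it to the benchmark, the probe command would be replaced by the path to the DBSCAN script saved as a file, for example `[sys.executable, "dbscan_script.py"]` (a hypothetical filename), and the printed times compared for a few thread counts.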