ARM Performance library with scikit-learn

I am trying to use Arm Performance Libraries (ArmPL) with scikit-learn to benefit from the routines where ArmPL performs better, e.g. dgemm.

I linked NumPy and SciPy against armpl_mp and then built scikit-learn. With the ArmPL-linked NumPy and SciPy, dgemm performs better than with the default OpenBLAS builds of NumPy and SciPy.

I then tried one scikit-learn algorithm, DBSCAN, with ArmPL, because with the default OpenBLAS build it calls dgemm and spends a lot of its time there.

But with ArmPL it got worse: it took 400 times longer, and 99.96% of the time is spent in "armpl::clag::bcms".

Can anyone explain what this function does and why it takes so much longer here?

+ 99.75% 0.00% python3 libarmpl_mp.so [.] armpl::clag::parallel<armpl::
+ 99.75% 0.00% python3 libarmpl_mp.so [.] armpl::clag::parallelise_2d<t
- 99.75% 99.50% python3 libarmpl_mp.so [.] armpl::clag::bcms<(armpl::cla
96.37% thread_start
start_thread
0xffff9c1bb80c
- armpl::clag::parallel<armpl::clag::parallelise_2d<true, true, armpl::clag::resident<(armpl::clag::which_matrix)1
- 96.37% armpl::clag::parallelise_2d<true, true, armpl::clag::resident<(armpl::clag::which_matrix)1, armpl::cla
armpl::clag::bcms<(armpl::clag::which_matrix)1, double, armpl::clag::convert<double const, double, armpl::
+ 96.75% 0.00% python3 libc.so.6 [.] thread_start
+ 96.75% 0.00% python3 libc.so.6 [.] start_thread
+ 96.61% 0.00% python3 libgomp.so.1.0.0 [.] 0x0000ffff9c1bb80c
+ 3.24% 0.00% python3 libomp.so [.] __kmp_invoke_microtask
+ 3.24% 0.00% python3 _base.cpython-310-aarch64-linux-gnu.so [.] .omp_outlined..145
+ 3.16% 0.02% python3 _radius_neighbors.cpython-310-aarch64-linux-gnu.so [.] __pyx_f_7sklearn_7metrics_29_
+ 3.14% 0.00% python3 _middle_term_computer.cpython-310-aarch64-linux-gnu.so [.] __pyx_f_7sklearn_7metrics_29_
+ 3.14% 0.00% python3 _cython_blas.cpython-310-aarch64-linux-gnu.so [.] __pyx_fuse_1__pyx_f_7sklearn_
+ 3.14% 0.00% python3 libarmpl_mp.so [.] armpl::clag::gemm<true, int,
+ 3.14% 0.00% python3 libarmpl_mp.so [.] _ZZZN5armpl4clag4gemmIdLNS0_4
+ 3.14% 0.00% python3 libgomp.so.1.0.0 [.] GOMP_parallel
0.17% 0.10% python3 libomp.so [.] kmp_flag_64<false, true>::wai
0.15% 0.15% python3 libarmpl_mp.so [.] dgemm_sve_big
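
The profile above lists both libgomp.so.1.0.0 and libomp.so, so two OpenMP runtimes appear to be loaded in the same process. A minimal sketch for confirming which threading and BLAS libraries are actually mapped, assuming Linux (it reads /proc/self/maps). The helper names `threading_libs` and `loaded_threading_libs` and the keyword list are illustrative and not part of any library:

```python
import os

# Substrings of interest in library file names; an illustrative list.
KEYWORDS = ("armpl", "openblas", "gomp", "libomp", "iomp")

def threading_libs(maps_text, keywords=KEYWORDS):
    """Return the sorted paths of mapped shared objects whose file name
    contains one of the keywords."""
    found = set()
    for line in maps_text.splitlines():
        # Format: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no path
        path = fields[5].strip()
        name = os.path.basename(path).lower()
        if ".so" in name and any(k in name for k in keywords):
            found.add(path)
    return sorted(found)

def loaded_threading_libs():
    """Inspect the current process (Linux only). Call this after importing
    numpy / scipy / sklearn so their libraries are already mapped."""
    with open("/proc/self/maps") as fh:
        return threading_libs(fh.read())
```

Calling `print("\n".join(loaded_threading_libs()))` right after `from sklearn.cluster import DBSCAN` should list the ArmPL library alongside whichever OpenMP runtimes were pulled in.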

Machine used: Graviton3

Script used (DBSCAN):
#
# Imports
#
from sklearn.cluster import DBSCAN
from timeit import default_timer as timer
from sklearn.datasets import make_blobs
import numpy as np

#
# Generate dataset
#
X, y = make_blobs(n_samples=50000,
                  n_features=100,
                  centers=50,
                  center_box=(-32, 32),
                  shuffle=True,
                  random_state=0)

#
# Main
#
start = timer()
# DBSCAN is unsupervised; fit() accepts y but ignores it.
y_pred = DBSCAN(n_jobs=-1).fit(X, y)
elapsed = timer() - start
print(f"Total Time Taken for Execution of DBSCAN Fit Function: {elapsed} sec/s")
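
In the profile, gemm is reached from inside an OpenMP region (.omp_outlined) and then enters GOMP_parallel, which suggests nested threading might be involved. That is a hypothesis, not a confirmed diagnosis. One coarse way to test it is to rerun the script with the OpenMP thread count capped. A minimal sketch, assuming the script is saved as dbscan_bench.py (a hypothetical file name) and relying on the standard OMP_NUM_THREADS environment variable, which both libgomp and libomp read. The helper names are illustrative:

```python
import os
import subprocess
import sys
import time

def env_with_threads(n_threads, base_env=None):
    """Copy the environment and cap OpenMP threads via OMP_NUM_THREADS."""
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_NUM_THREADS"] = str(n_threads)
    return env

def time_command(cmd, n_threads):
    """Run cmd in a fresh process with the capped thread count and
    return (wall-clock seconds, return code)."""
    start = time.perf_counter()
    proc = subprocess.run(cmd, env=env_with_threads(n_threads))
    return time.perf_counter() - start, proc.returncode

def compare(script, thread_counts):
    """Time the same script once per thread count and print the results."""
    for n in thread_counts:
        seconds, rc = time_command([sys.executable, script], n)
        print(f"OMP_NUM_THREADS={n}: {seconds:.1f} s (exit code {rc})")
```

For example, `compare("dbscan_bench.py", [1, 4, os.cpu_count() or 1])`. Note that OMP_NUM_THREADS caps every OpenMP runtime in the process, so it limits scikit-learn's own parallel loops as well as ArmPL's. If the run with 1 thread is much faster than the default, thread oversubscription is a plausible contributor.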

  • Hi.

    Glad to hear that it was simple to fix!

    For reference, if you already have armflang installed you have a compatible ArmPL installed as part of that package, as well as a GCC-compatible one.  In fact using "armflang -armpl" would do all the necessary linking as well for you.

    Nice to know you see an improvement over OpenBLAS as well!  Do let us know if you run into any other oddities.

    Chris
