ARM Performance library with scikit-learn

I am trying to use Arm Performance Libraries (ArmPL) with scikit-learn to benefit from the cases where ArmPL performs better, e.g. dgemm.

I linked armpl_mp with NumPy and SciPy and built scikit-learn. I can see that dgemm from the ArmPL-linked NumPy and SciPy performs better than dgemm from the default OpenBLAS-linked NumPy and SciPy.

I then tried one scikit-learn algorithm, DBSCAN, with ArmPL, because with the default OpenBLAS build it calls dgemm and spends a lot of its time there.

But with ArmPL it got much worse: it took 400 times longer, and 99.96% of the time is spent in "armpl::clag::bcms".

Can anyone explain what this function does and why it takes so long here?

+ 99.75% 0.00% python3 libarmpl_mp.so [.] armpl::clag::parallel<armpl::
+ 99.75% 0.00% python3 libarmpl_mp.so [.] armpl::clag::parallelise_2d<t
- 99.75% 99.50% python3 libarmpl_mp.so [.] armpl::clag::bcms<(armpl::cla
96.37% thread_start
start_thread
0xffff9c1bb80c
- armpl::clag::parallel<armpl::clag::parallelise_2d<true, true, armpl::clag::resident<(armpl::clag::which_matrix)1
- 96.37% armpl::clag::parallelise_2d<true, true, armpl::clag::resident<(armpl::clag::which_matrix)1, armpl::cla
armpl::clag::bcms<(armpl::clag::which_matrix)1, double, armpl::clag::convert<double const, double, armpl::
+ 96.75% 0.00% python3 libc.so.6 [.] thread_start
+ 96.75% 0.00% python3 libc.so.6 [.] start_thread
+ 96.61% 0.00% python3 libgomp.so.1.0.0 [.] 0x0000ffff9c1bb80c
+ 3.24% 0.00% python3 libomp.so [.] __kmp_invoke_microtask
+ 3.24% 0.00% python3 _base.cpython-310-aarch64-linux-gnu.so [.] .omp_outlined..145
+ 3.16% 0.02% python3 _radius_neighbors.cpython-310-aarch64-linux-gnu.so [.] __pyx_f_7sklearn_7metrics_29_
+ 3.14% 0.00% python3 _middle_term_computer.cpython-310-aarch64-linux-gnu.so [.] __pyx_f_7sklearn_7metrics_29_
+ 3.14% 0.00% python3 _cython_blas.cpython-310-aarch64-linux-gnu.so [.] __pyx_fuse_1__pyx_f_7sklearn_
+ 3.14% 0.00% python3 libarmpl_mp.so [.] armpl::clag::gemm<true, int,
+ 3.14% 0.00% python3 libarmpl_mp.so [.] _ZZZN5armpl4clag4gemmIdLNS0_4
+ 3.14% 0.00% python3 libgomp.so.1.0.0 [.] GOMP_parallel
0.17% 0.10% python3 libomp.so [.] kmp_flag_64<false, true>::wai
0.15% 0.15% python3 libarmpl_mp.so [.] dgemm_sve_big

Machine used: Graviton 3

Script used (DBSCAN):

#
# Imports
#
from sklearn.cluster import DBSCAN
from timeit import default_timer as timer
from sklearn.datasets import make_blobs
import numpy as np

#
# Generate dataset
#
X, y = make_blobs(n_samples=50000,
                  n_features=100,
                  centers=50,
                  center_box=(-32, 32),
                  shuffle=True,
                  random_state=0)

#
# Main
#
start = timer()
# fit() returns the fitted estimator; y is accepted but ignored by DBSCAN
y_pred = DBSCAN(n_jobs=-1).fit(X, y)
elapsed = timer() - start
print(f"Total Time Taken for Execution of DBSCAN Fit Function: {elapsed} sec/s")


Darshan412, over 2 years ago, in reply to Chris Goodyer

Thanks for your comments, they gave me the idea I needed.

Your questions helped identify the issue.

I initially used Clang to build NumPy, SciPy and scikit-learn, but the ArmPL libraries I used are the default GCC-compatible build. I rebuilt all the packages with GCC, which fixed the threading issue, and it now takes less time than with OpenBLAS.

Thank you for your support.
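
For anyone hitting the same symptom: the perf output above shows both libgomp.so (GCC's OpenMP runtime) and libomp.so (LLVM's) in the same process. A minimal sketch of one way to check for that, assuming Linux and using only the standard library, is to scan /proc/self/maps after importing the packages. The function names and the table of runtime names below are illustrative, not part of ArmPL or scikit-learn.

```python
from pathlib import Path

# Assumed table of common OpenMP runtime library names (illustrative, not exhaustive).
OPENMP_RUNTIMES = {
    "libgomp": "GNU (GCC)",
    "libomp": "LLVM (Clang)",
    "libiomp5": "Intel",
}


def find_openmp_runtimes(maps_text):
    """Return {runtime name: path} for OpenMP runtimes listed in /proc/<pid>/maps text."""
    found = {}
    for line in maps_text.splitlines():
        # Format: address perms offset dev inode pathname
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no pathname
        path = fields[5].strip()
        name = path.rsplit("/", 1)[-1]
        if ".so" not in name:
            continue
        prefix = name.split(".so", 1)[0]  # "libgomp.so.1.0.0" -> "libgomp"
        if prefix in OPENMP_RUNTIMES:
            found[prefix] = path
    return found


def loaded_openmp_runtimes():
    """Inspect the current process (Linux only); empty dict elsewhere."""
    maps = Path("/proc/self/maps")
    return find_openmp_runtimes(maps.read_text()) if maps.exists() else {}


if __name__ == "__main__":
    try:
        # Import first so the shared libraries are actually loaded.
        import sklearn.cluster  # noqa: F401
    except ImportError:
        pass
    runtimes = loaded_openmp_runtimes()
    for prefix, path in runtimes.items():
        print(f"{OPENMP_RUNTIMES[prefix]} OpenMP runtime: {path}")
    if len(runtimes) > 1:
        print("More than one OpenMP runtime is loaded in this process.")
```

If this reports more than one runtime, the packages and the BLAS library were probably built against different OpenMP implementations, which matches the Clang versus GCC mismatch described above.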

Chris Goodyer, over 2 years ago, in reply to Darshan412

Hi.

Glad to hear that it was simple to fix!

For reference, if you already have armflang installed, that package includes an ArmPL build compatible with it, as well as a GCC-compatible one. In fact, using "armflang -armpl" would also do all the necessary linking for you.

Nice to know you see an improvement over OpenBLAS as well! Do let us know if you run into any other oddities.

Chris